geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

Allow deploying a Datahub instance (GeoNetwork-UI) from the administration interface #8021

Open jahow opened 6 months ago

jahow commented 6 months ago

This proposal aims at integrating the Datahub application provided by GeoNetwork-UI into all standard deployments of GeoNetwork.

The integration will be done as follows:

Administration interface

Main catalog

A new option will be added in the Administration > Settings > System settings page. This option will be composed of:


Subportals

An "enable Datahub" option similar to the one above will be added for each source in the Administration > Settings > Sources page.

Theme Editor

Both options described above will offer a "theme editor" button. The Datahub Theme Editor will be a separate application provided by GeoNetwork-UI which will allow users to edit in real-time a configuration file for the datahub (for instance URLs, theme colors, fonts, map options etc.).


The Theme Editor application will be packaged as a web component and, when clicking "Save", the resulting configuration will simply be transferred into a hidden text field in the settings form.

Using the Theme Editor will be optional; if left untouched, the default configuration will be used.

Where are configurations stored?

The main catalog Datahub configuration will be stored alongside other system settings as a large text field.

Configurations for subportals will be stored in the sources table as a large text field.
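
The storage and fallback rules described above can be sketched roughly as follows. This is only an illustration in Python, not the actual GeoNetwork implementation: the `DatahubSettings` class, its field names and the `DEFAULT_CONFIG` placeholder are all hypothetical, and the configuration is treated as opaque text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical placeholder text; the real default would ship with the Datahub build.
DEFAULT_CONFIG = "# default Datahub configuration"


@dataclass
class DatahubSettings:
    """In-memory stand-in for the two storage locations named in the proposal."""

    # Large text field stored alongside the system settings (main catalog).
    main_config: Optional[str] = None
    # Large text field per row of the sources table, keyed by subportal name.
    source_configs: Dict[str, Optional[str]] = field(default_factory=dict)

    def effective_config(self, subportal: Optional[str] = None) -> str:
        """Return the saved configuration text, or the default when none was saved."""
        if subportal is None:
            stored = self.main_config
        else:
            stored = self.source_configs.get(subportal)
        # An untouched (missing or blank) field falls back to the default.
        if stored is None or not stored.strip():
            return DEFAULT_CONFIG
        return stored
```

With this sketch, a subportal whose configuration was never edited in the Theme Editor gets the default, while an edited one gets its own saved text.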

How will the Datahub be deployed alongside GeoNetwork?

The Datahub application, once compiled, is a collection of static files (HTML, JS, CSS) that can be served as is. These files (2 MB in total) will be bundled inside the GeoNetwork WAR package.

Bundling the application

A Maven task will be added to:

  1. clone the geonetwork-ui repository at a specific version
  2. build the Datahub application
  3. copy the resulting files into the static resources of the WAR package

The version of GeoNetwork-UI used will be set in the Maven properties. Most likely it will be bumped alongside GeoNetwork versions. We do not expect significant breaking changes to the configuration format, but if that happens, the Theme Editor can probably assist the user in migrating their configurations.
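
To make the three bundling steps more concrete, here is a rough sketch that only assembles the commands such a task might run. It is written in Python for brevity and is not the Maven task itself. Everything specific is an assumption: the version tag, the build command, the build output directory and the target directory are hypothetical values passed in by the caller.

```python
from pathlib import PurePosixPath
from typing import List


def bundle_steps(
    repo_url: str,
    version: str,
    build_command: List[str],
    workdir: str,
    dist_dir: str,
    war_static_dir: str,
) -> List[List[str]]:
    """Return the clone, build and copy commands, in order, without running them."""
    checkout = PurePosixPath(workdir) / "geonetwork-ui"
    return [
        # 1. clone the repository at a pinned tag or branch
        ["git", "clone", "--depth", "1", "--branch", version, repo_url, str(checkout)],
        # 2. build the Datahub application (the command is supplied by the caller)
        list(build_command),
        # 3. copy the compiled static files into the WAR's static resources
        ["cp", "-r", str(checkout / dist_dir), war_static_dir],
    ]


# Hypothetical values, for illustration only.
steps = bundle_steps(
    repo_url="https://github.com/geonetwork/geonetwork-ui.git",
    version="vX.Y.Z",                       # assumed: taken from a Maven property
    build_command=["npm", "run", "build"],  # assumed: the real build command may differ
    workdir="/tmp/datahub-build",
    dist_dir="dist/apps/datahub",           # assumed output location
    war_static_dir="web/target/datahub",    # assumed target location
)
```

Pinning the version as a single parameter mirrors the idea of setting it once in the Maven properties.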

Serving the application

Because the Datahub application can be accessed in several ways, a Java service will have to be developed and will handle incoming requests to the Datahub.

Main catalog Datahub

Once enabled, the main Datahub will be accessible on /geonetwork/srv/datahub. The configuration defined in the System settings will be used.

Subportals Datahub

Once enabled, a subportal Datahub can be accessed with /geonetwork/subportal-name/datahub. The configuration defined in the subportal settings will be used.
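
The two access paths above could be told apart by a small piece of routing logic. The sketch below is in Python purely for illustration (the actual service is planned in Java), and the function and type names are hypothetical. It assumes the `/geonetwork` context path used in the examples and that a request for the Datahub root serves `index.html`.

```python
from typing import NamedTuple, Optional


class DatahubTarget(NamedTuple):
    portal: str    # "srv" for the main catalog, otherwise the subportal name
    resource: str  # static file requested inside the Datahub bundle


def resolve_datahub_request(path: str, context: str = "/geonetwork") -> Optional[DatahubTarget]:
    """Map a request path to a portal and a static resource, or None if it is not a Datahub URL."""
    prefix = context.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None
    parts = path[len(prefix):].split("/")
    # Expected shape: <portal>/datahub[/<resource path>]
    if len(parts) < 2 or not parts[0] or parts[1] != "datahub":
        return None
    rest = [p for p in parts[2:] if p]
    # Refuse path traversal out of the bundled static files.
    if ".." in rest:
        return None
    return DatahubTarget(portal=parts[0], resource="/".join(rest) or "index.html")
```

The returned portal name would then be used to pick the matching configuration: the System settings one for `srv`, or the subportal's own otherwise.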

Impacts

This proposal is expected to have many positive impacts:

The technical impacts on the GeoNetwork project are:

Update on 12.07.2024

The following things related to the Datahub will not be part of the main build of GeoNetwork. Instead, when using the -Drelease flag (similar to https://github.com/geonetwork/core-geonetwork/pull/7302), a geonetwork-datahub-integration.jar file will be produced. This file can then be put in the lib folder of the main webapp to enable these functionalities.

Things that will still be part of the main GeoNetwork build are:

Voting

PSC Support:

ticheler commented 6 months ago

Hi @jahow, Thanks for a great proposal and for your hard work on this!

I would very much appreciate organising a session with the PSC where you demonstrate a working prototype of such integration for everyone to have a better understanding. Could you plan for that?

Another thing I feel is important to talk about at this stage is about building on and contributing to the GN-UI project. The mono repository could be something of concern in that respect, but I am also not sure what the current status of that is. Maybe that requirement has already been mitigated?

Cheers! Jeroen

jahow commented 6 months ago

Thanks for the feedback @ticheler 🙂 yes, showing a working prototype sounds like an excellent idea. We can probably organize that for the next PSC meeting if that's ok?

I'm not certain I see what you mean by concerns regarding the monorepo. I remember we had discussions about the project complexity being an obstacle to contributing more, is that it?

Looking forward to seeing that come to life!

edevosc2c commented 6 months ago

Hello, I'm writing this message about the technical side of these proposals.

First of all, I actually deployed "subportal" datahubs for Geocat.ch in the past. In simple terms, I deployed multiple containers linked to different datahub configurations.
Each datahub was accessible from a different "subpath": /datahub/thurgau, /datahub/viageo. My main problem was that the docker image wasn't made for this kind of use case, so I had to use workarounds for it to work.

But it wasn't too far from a proper implementation of this feature. It might need some tweaks to have something that is properly production ready.

About "main catalog" feature

My main concern is that if we go in the direction of including datahub in the geonetwork "program", this is yet another component integrated into geonetwork. This goes against the standard of today's world, where we try to have microservices in order to improve the scalability of our programs and make them more resilient to failures.

Actually, in geOrchestra, we have a hard time integrating geonetwork in this microservice architecture. It's not possible to have multiple instances of geonetwork for redundancy, and it's hard to move geonetwork around multiple servers in case one server fails.

That's why we, at geOrchestra, liked the ability to have datahub as a separate program. It's a "stateless" application, so it is much easier to manage.

The perks are:

I'm fine with optionally being able to deploy datahub from geonetwork, but I would like to request to still have the ability to deploy it separately.

About "subportals" feature

Like I said in my first paragraph, technically deploying subportals of datahub is possible today. It's just that it's cumbersome and "hacky".

I would be interested in improving the current Docker image for datahub in order to have a proper deployment of separate datahubs that count as multiple subportals. Obviously, only the system administrator would be able to deploy subportals in this case.

I'm not fond of letting users deploy their own subportals by themselves because it creates additional architectural difficulties: unknown possible CORS issues (if using an external geonetwork), not being in control of what kind of datahub is created, and having to persist on the filesystem the user configuration of all the different subportals. But this is closely related to my not liking the idea of having datahub inside geonetwork.

About "Theme Editor" feature

Same issue as having datahub packaged into geonetwork: this component will have difficulties being scalable and will have to be tied to the same "filesystem" as geonetwork.

Final notes about all the proposals

For normal users, local installations or low-traffic instances of geonetwork, what I said previously is a non-issue. They are most likely deploying geonetwork on one server, and it's perfectly ok to have these features in order to improve their workflow.

But I wanted to give my opinion for the people that deploy it on platforms that receive a lot of traffic and need to be "always available".

jahow commented 6 months ago

Thanks @edevosc2c for the shared knowledge.

> My main problem was that the docker image wasn't made for this kind of use case, I had to use workarounds for it to work.

Let's keep in mind that we're not specifically talking here about a docker context. Actually one of the motivations of this proposal is to let people using GeoNetwork as a standalone WAR also benefit from GeoNetwork-UI.

> I'm fine with optionally being able to deploy datahub from geonetwork, but I would like to request to still have the ability to deploy it separately.

This proposal will probably change almost nothing on GeoNetwork-UI side. Maybe a few adaptations to make deployment more flexible if necessary, but that's it.

> unknown possible CORS issues (if usage external geonetwork), not being in control of what kind of datahub is created and having to persist in the filesystem the user configuration of all the different subportals configuration.

There should be no CORS concerns here since the Datahub will be accessed on the same host as GeoNetwork (e.g. http://localhost:8080/geonetwork/srv/datahub). As for configurations, they will be modifiable by hand by the user, but the theme editor should offer some kind of fool-proofing to make sure that configurations are still valid.

ticheler commented 5 months ago

> Actually, in geOrchestra, we have a hard time integrating geonetwork in this microservice architecture. It's not possible to have multiple instances of geonetwork for redundancy, and it's hard to move geonetwork around multiples servers in case one server fail.

Thanks for all the feedback you provided. I'm curious: what would be required, from your perspective, to make GeoNetwork easier to deploy? (I know this does not strictly relate to this topic)

edevosc2c commented 5 months ago

@ticheler Thanks for your comment, I have created a dedicated GitHub discussion for that: https://github.com/orgs/geonetwork/discussions/8167

jahow commented 4 months ago

The body of the proposal was just updated to reflect the approach suggested by the PSC, which mimics the one used for the S3 and CMIS file storage providers in https://github.com/geonetwork/core-geonetwork/pull/7302.

jahow commented 4 months ago

@geonetwork/project-steering-committee would it be possible to get some concrete feedback on this proposal? It's been around for 2.5 months now and a definitive answer would be very much appreciated from all parties involved.

josegar74 commented 4 months ago

@jahow sorry about the late feedback. Probably most people are on holidays or finishing stuff before going on holidays.

Thanks for updating the proposal as discussed in the last PSC meeting. It looks good to me.

jahow commented 4 months ago

@geonetwork/project-steering-committee hi again! I really would like to have at least some kind of feedback on this proposal now. It's been in limbo for so many weeks and we simply cannot expect our financial sponsors to stay in complete uncertainty about this forever. I understand that it's holiday time now but this proposal was written at the beginning of May, so not exactly summer time.

According to https://github.com/geonetwork/core-geonetwork/wiki/Project-Steering-Committee a proposal is supposed to be reviewed in 2 working days.

Again, could the PSC please let us know their stance on this proposal? And could @ticheler clarify what we, as Camptocamp, can tell our sponsors about this proposal? Thank you very much to all involved.

fgravin commented 3 months ago

Dear @jahow

Thanks for updating the proposal, as required by the PSC during the last PSC meeting. The PSC prefers to have this option as a plugin for now, and your changes reflect that consideration.

Many GeoNetwork instances already use the Datahub, and I think it will benefit many other instances to integrate this plugin, so it's definitely a +1 for me. This feature is highly expected by a large part of the community, it's great work and it's very promising.

+1

ticheler commented 3 months ago

Dear @jahow, Indeed thanks for updating the proposal and related implementation! I was on holidays, so sorry for the late response. I've given a +1 on the proposal. Jose did the same but it was not persisted it looks like. Cheers, Jeroen

josegar74 commented 3 months ago

@jahow +1 for me, I see I didn't mention that explicitly in https://github.com/geonetwork/core-geonetwork/issues/8021#issuecomment-2244347190

jahow commented 3 months ago

wonderful, thank you both so much!

fxprunayre commented 3 months ago

+0.

> Where are configurations stored? The main catalog Datahub configuration will be stored alongside other system settings as a large text field. Configurations for subportals will be stored in the sources table as a large text field.

The source table always contains a line with the main catalogue, which is flagged with type=portal, so maybe it can make sense to have all configs stored in the same place.

jahow commented 3 months ago

> +0.
>
> > Where are configurations stored? The main catalog Datahub configuration will be stored alongside other system settings as a large text field. Configurations for subportals will be stored in the sources table as a large text field.
>
> The source table always contains a line with the main catalogue which is flagged with type=portal so maybe can make sense to have all config stored in the same place.

Thank you for your feedback! Yes indeed, it could be simpler to have all datahub configs showing up in the Sources admin UI. Good point!

jodygarnett commented 3 weeks ago

I wonder if we can make something extensible:

  1. So that a bundle of static content can be included in the war as a build option
  2. Configuration added to the admin console
  3. Enable / disable the content

If we had a general mechanism it could handle the option to bundle geonetwork-ui and embedded online help (which is presently a build option).

Chances are the enable/disable would need to:

jahow commented 3 weeks ago

That would be an interesting development and a nice goal for GN5. Making something more generic will make things more complicated though: the POC that was done for this proposal had to make a few assumptions about the frontend app it was serving, especially the configuration file.