GeoNode / geonode

GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.

https://geonode.org/

Other

1.45k stars 1.13k forks source link

GNIP 97: New metadata editor #11511

Open etj opened 1 year ago

etj commented 1 year ago

GNIP - New metadata editor

Overview

The metadata editor still relies on a legacy template engine. It should be migrated to use newer libraries

Proposed By

### Assigned to Release This proposal is for GeoNode ???. ### State * [x] Under Discussion * [ ] In Progress * [ ] Completed * [ ] Rejected * [ ] Deferred ### Motivation - Remove old unmaintained vulnerable js libraries - Allow easier customization ## Proposal *TODO* ### Backwards Compatibility *TODO* ## Future evolution *TODO* ## Feedback *TODO* ## Voting Project Steering Committee: * Alessio Fabiani: * Francesco Bartoli: * Giovanni Allegri: * Simone Dalmasso: * Toni Schoenbuchner: * Florian Hoedt: ## Links Remove unused links below. * [Email Discussion]() * [Pull Request]() * [Mail Discussion]() * [Linked Issue]()

etj commented 1 year ago

Some comments from another issue referencing this topic:

https://github.com/GeoNode/geonode/issues/11494#issuecomment-1723335481 by @giohappy

@gannebamm at the moment the idea is just to implement the simple relationship model. We can extend it to include additional properties in the future, but this will require also extending the editor. This is something I wouldn't do until the metadata editor has not been migrated to the new client. I have almost completed the design of a proposal for the new metadata editor, but no timeline at the moment since.
https://github.com/GeoNode/geonode/issues/11494#issuecomment-1725079088 by @gannebamm

Thünen is currently looking at ISO-19115 metadata fields not yet covered in the GeoNode metadata schema. We will need to extend the wizard. Also, #10290 // #10367 will require changes in the wizard. If possible, we could try to work together on the new metadata editor, and you could get some feedback on its developer experience. From our side, for example, @ahmdthr and @ridoo are part of the development team. What do you think?

mwallschlaeger commented 1 year ago

Thanks for the proposal @gannebamm . I just wanted to add some of my thoughts on this topic to the discussion. First of all I want to give some impression on the current interaction with metadata inside of geonode. As geonode ist a project that is growing for long time it has a long history of internal APIs. In the current implementation the metadata values are accessed through Django methods like Querrysets, and Forms and since some time also by the django-rest-api including their serializers and Viewsets. Therefore when adding new fields to the metadata model the following parts need editing:

rest v2 API (serializer, Viewset)
Django Forms need to be created in order to edit the metadata in the metadata editor
and if somewhere else metadata values got accessed they can use querrysets
django templates

This different methods also require different validation strategies which require some maintenance when editing the metadata model.

Therefore, as we now have a more or less robust rest API, I would suggest to only use the API for interaction with the metadata model. Additionally the base.forms and templates could be removed, and a lot of code needs to be edited to make metadata interaction generally API based. But afterwards, there would be place for a new metadata editor only using rest v2 to interact with geonode django-rest-api backend. This would also allow wider possibilities for geonode metadata editors and would make space for geonode project allow different templates for the metadata editor like (slim, medium, ISO19115:2003).

Still if somebody requires more or other fields for metadata, she need to change the database model, rest API and UI and something not mentioned yet, the pyCSW interaction. For the pyCSW interaction I go definitely with the issue https://github.com/GeoNode/geonode/issues/9147 which would make it easier to bring changes to metadata model to the CSW API.

giohappy commented 1 year ago

@gannebamm @mwallschlaeger The high level design I'm planning for the rewrite of the metadata editor is the following:

implement a model for the configuration of the metadata editor panels that the client will use to build the new editor. This model will include at least the following:
- fieldsets (that can be rendered as panels, blocks of widgets, tabs, etc.)
- fieldsets:
- fields: list of field configurations
- name: name of the fieldset
- field configuration:
- label
- name
- widget type (select, textarea, text input, date, etc.)
- enums: either a fixed list of values or the API endpoint from where the list can be retrieved dynamically (e.g. categories, regions, etc.)
- cardinality: 0..n, 1, 1..n, etc.
- advanced: true/false

As you will notice, this configuration is very similar to what we have for the new filters and facets API.

This configuration could be initially serialized (JSON) and served to the client statically. It could be overridden inside a project, similar to the geocode config overrides. A second step could be to make the configuration build dynamically and define an interface for modules to register additional fieldsets, fields, and field configurations dynamically.

The goal is to:

make configurations, APIs, and the client for the metadata editor converge on what we already do for filters, avoiding duplicated models and configuration schemas where possible
make it generic enough to be able to implement other custom forms, beyond the metadata editor
merge the Wizard / Advanced visualizations into a single editor, with fieldsets organized in collapsible panels (for example) and with togglable advanced fields

Ok, this is a very rough description. I wanted to take more time to write it down, but given the WIP GNIP has been created already I think it was useful to share it right away.

mwallschlaeger commented 1 year ago

@giohappy thanks for sharing this idea with us. I was spinning my head around this for some time the recent days. If i understand you correct this is only related to the UI part, so the JSON would not configure any API endpoints or database columns behind the API right?

I think generally your idea could be a good approach to make it easier to edit metadata editor in geonode. But for some projects i was working on in the past on which I applied the same approach, I ended up with blowing and extending the JSON definition more and more because the definitions are not fine grained enough. And when I think about my latest experiences with the widgets and forms in geonode I guess it's going to be very hard to implement all the aspects of this into a simple JSON model. I still like the idea and approach to make it more easy to edit the metadata editor view.

gannebamm commented 1 year ago

Just a quick mention of some work done by @t-book in that regard: https://github.com/csgis/custom_metadata

t-book commented 1 year ago

This would also mean implementing the metadata editor with React as well, right? If so, I am rather skeptical in terms of complexity. ( Based on experiences with the development with mapstore2 ). But I would wait for the final GNIP.

gannebamm commented 1 year ago

@t-book What other alternatives would be maintainable? One of the goals of @giohappy is

make configurations, APIs, and the client for the metadata editor converge on what we already do for filters, avoiding duplicated models and configuration schemas where possible

Which implies using mapstore2 as the framework.

Even though I also see mapstore2 developing as a daunting and complex task. However, I am no front-end developer; therefore, this is heavily biased and likely due to inexperience.

ridoo commented 1 year ago

@giohappy as far I can see, the API you have in mind would provide client instructions how to render widgets based on metadata which is might be configurable on the backend side.

Do you have any reasons to include such a coupling between REST API and client? When it comes to metadata, I would have expected the client to make sense of the metadata for its own scope (a client does not always render things).

You already have an OpenAPI description available. Did you accounted this already, and why not to extend metadata this way? Even JSON schema might be an option (but I have to admit not to thought about this option deeper, yet).

If "coupling" gets necessary, maybe configurable layouting on the client side can be an option.

giohappy commented 1 year ago

Do you have any reasons to include such a coupling between REST API and client? When it comes to metadata, I would have expected the client to make sense of the metadata for its own scope (a client does not always render things).

It depends on what you mean by coupling. My proposal is to have a way to tell the client what we want to edit, and how to lay it out (given a list of available widgets/components). This configuration must be dynamic because we want projects to extend and/or modify it. Or even, in the future, create custom forms beyond the metadata editor.

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

You already have an OpenAPI description available. Did you accounted this already, and why not to extend metadata this way? Even JSON schema might be an option (but I have to admit not to thought about this option deeper, yet).

Here we're talking of something different. We want a model to describe forms, I don't see how this can be covered by the current API.

This would also mean implementing the metadata editor with React as well, right? If so, I am rather skeptical in terms of complexity. ( Based on experiences with the development with mapstore2 ). But I would wait for the final GNIP.

React doesn't imply MapStore2. We can see if MapStore2's routing and state management can help or not here, but we're not assuming that we MUST use MapStore2. On the other hand, we promote React2 to avoid using vanilla Javascript for such an important and complex component and to harmonize the client components. Of course, this can be discussed with the people that will work on this. What we really care is avoid Frankesteins :)

We're still fighting with the plethora of approaches in the legacy templates. Parts written with Angular, others with jQuery + dozens of unmaintained libraries.

ridoo commented 1 year ago

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

Seems, that I thought you wanted to add this to the GeoNode API (/api/v2). In general, I agree, that geonode-mapstore-client has lot of coupling with the UI .. it actually provides the UI (with some GeoNode integration). Here, such API would be better located than in the GeoNode API of course.

However, until now I have thought of the backend implementation of geonode-mapstore-client is a temporary approach on the path to completely separate frontend from backend. Adding REST API functionality in the contrib module here, feels a bit like cheating :smile:, but I may not see all the parts playing a role here. Do you have more strategic insights how geonode-mapstore-client will evolve (a link would also be fine)?

gannebamm commented 1 year ago

@t-book

This would also mean implementing the metadata editor with React as well, right? If so, I am rather skeptical in terms of complexity. ( Based on experiences with the development with mapstore2 ).

@giohappy

React doesn't imply MapStore2. We can see if MapStore2's routing and state management can help or not here, but we're not assuming that we MUST use MapStore2. On the other hand, we promote React2 to avoid using vanilla Javascript for such an important and complex component and to harmonize the client components. Of course, this can be discussed with the people that will work on this. What we really care is avoid Frankesteins :)

@giohappy I completely understand the need to converge on one main frontend framework and to get rid of old loose ties (like angular stuff). Would you @t-book think that a 'plain' React-based metadata editor would be easier to extend/develop for, or is this still to much overhead in your opinion?

@giohappy

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

@ridoo

[...] Here, such API would be better located than in the GeoNode API of course.

However, until now I have thought of the backend implementation of geonode-mapstore-client is a temporary approach on the path to completely separate frontend from backend. Adding REST API functionality in the contrib module here, feels a bit like cheating 😄, but I may not see all the parts playing a role here.

What is the worst case in using the geonode-mapstore-client django app to enable that API? I understand your resentments and that thus API would put pressure on the geonode-mapstore-client application and, therefore, make it harder to completely decouple GeoNode core from its UI components. On the other hand you could use the geonode-mapstore-client django app as example to eg build a geonode-vue-client app, right?

ridoo commented 1 year ago

What is the worst case in using the geonode-mapstore-client django app to enable that API? I understand your resentments and that thus API would put pressure on the geonode-mapstore-client application and, therefore, make it harder to completely decouple GeoNode core from its UI components. On the other hand you could use the geonode-mapstore-client django app as example to eg build a geonode-vue-client app, right?

I would not speak from "worst case" here. As said, my impression was there will be two separate things in the end (of a path I cannot oversee currently):

having a headless GeoNode and
a pure frontend client which is decoupled from the backend

At the moment the geonode-mapstore-client is a django module shipping client side code. To be honest, I did not dig deep enough into it to reason about why this is actually necessary. Ok, there is some route mapping, but also (which led me think the current status is an intermediary solution) still a lot of Django templates which then include the JS.

However, I cannot rate about decisions, in particular because I can't oversee the big picture. To me, however, a completely decoupled client makes a good testing candidate for the headless backend: "Is it possible to create a client based on the REST API available". Of course this may make it harder to realize things, especially with regard to the issue we actually try to discuss here. In any sense, it puts focus on alternate, but generic solutions, i.e. in this case separating "what" (backend) from "how" (frontend).

This however, leaves me with the question, why the client not just uses the metadata to render forms and widgets? Actually, the data is described already by its schema (more or less), isn't it? Perhaps, I do miss something really important here :see_no_evil:

EDIT: Keep in mind, I do not bring such a big Django dev history.

ridoo commented 1 year ago

@giohappy I forgot to reply on this one

You already have an OpenAPI description available. Did you accounted this already, and why not to extend metadata this way? Even JSON schema might be an option (but I have to admit not to thought about this option deeper, yet).

Here we're talking of something different. We want a model to describe forms, I don't see how this can be covered by the current API.

What I mean is this: there is an OpenAPI description available (which takes quite long for whatever reason) .. however, dependent how this gets generated you may think of a client which takes those metadata descriptions to render widgets, right? For example a dataset is described here:

"Dataset": {
  "type": "object",
  "description": "DREST-compatible model-based serializer.",
  "properties": {
    "pk": {
      "type": "string",
      "readOnly": true
    },
    "uuid": {
      "type": "string",
      "readOnly": true
    },
    "name": {
      "type": "string",

  ...
}

Of course, the schema needs more love and would have to be improved here. But a proper OpenAPI schema would be a benefit for more use cases than a custom API within the geonode-mapstore-client IMHO.

There are also libraries which support you rendering things from json schema (combined with templating).

kilichenko-pixida commented 1 year ago

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

It seems to me rendering the component in React from the the configuration it gets from the mapstore-client isn't going to help with the coupling with the core - instead it would either mean maintataining the correspondance between the GeoNode core metadata models and the mapstore-client configurations for React rendering or alternatively providing at least some endpoint in the GeoNode API that would expose the necessary information on the metadata model (which is then used in the mapstore-client to make a configuration palatable for React).

The former apporoach seems like a recipe for errors when metadata models get updated or extended and in the later case why not just expose the metadata schema in a way that would be directly easily palatable and sufficient for rendering?

I naively think it could be done in a way that it's not just hard-coded JSONs but is derived from the models (perhaps also reusing the code for serialization?). And it seems like both a custom API like @giohappy suggested and an approach with a standardized schema description @ridoo mentions would work, just with different trade-offs. But I might be just making many wrong assumtions, please correct me if I am thinking in the wrong direction.

Also, I am not sure I understand the general direction here, please help me out. Is Django app meant to stay within the mapstore client? What functions will it perform that are currently done by Django in the GeoNode core? What is its current scope of responsibilities?

kilichenko-pixida commented 1 year ago

And when I think about my latest experiences with the widgets and forms in geonode I guess it's going to be very hard to implement all the aspects of this into a simple JSON model. I still like the idea and approach to make it more easy to edit the metadata editor view.

@mwallschlaeger what do you think are going to be the the hard aspects? Are there particular parts of the metadata models you're worried about?

gannebamm commented 11 months ago

@kilichenko-pixida will create a simple POC using react-jsonschema-form to test its capabilities and how it could interact with the OpenAPI schema. This should help to follow up on this long discussion thread and have some implementation tests to criticise (good or bad).

kilichenko-pixida commented 9 months ago

@giohappy As @gannebamm mentioned above, we've worked on a POC for the approach @ridoo suggested.

To recap, the idea is to use an existing React library to render the form to edit the metadata based on the schema, and, as much as possible, to reuse the automatically generated OpenAPI schema we already have. Here is the stand-alone React draft implementation.

The key relevant findings so far:

The suggested library appears to be suitable for the purpose and could be extended as needed. Custom widgets for editing more complex fields are possible and the appearance is fully customizable with CSS.
OpenAPI schema structure is such, that theoretically, it can provide all the necessary info to the React client to render the form.
Static helper strings and encoded options are necessary for the form generation. They are currently not in the schema, but in principle, the schema structure has the properties to support them. Whether and how they should be integrated into the schema is up for discussion.
The current state of the OpenAPI schema is insufficient for the purpose. With my limited background, it's hard to say how much effort is needed to fix it.

The problems with the currently generated schema are manifold: 4.1. The generation is slow - it takes around 5s in the dev environment. This is probably solvable with some form of caching but it hasn't been tried yet. 4.2. The drf_spectacular library used for the introspection into the model and generating the schema seems to currently be poorly integrated. It produces multiple runtime errors (see the log in the linked repo) and as a result, many fields are missing from the schema or are represented incorrectly. It can be fixed manually, but the amount of both upfront and maintenance effort required might defeat the purpose of relying on automatic schema generation. 4.3. There is a bidirectional mismatch between what is marked as readOnly in the API (in the serializers and therefore in the schema) and what is actually editable through the advanced metadata editor.

For a more comprehensive overview of the findings, including further considerations about the schema, please see a prior lengthier post on the matter.

Given the points above, at least several key questions come up: A) What is the general direction of the OpenAPI schema development? Is it meant to be looked into any time soon? What purpose should it eventually serve?
B) Is the poor integration of the drf_spectacular something that's already known about? Are we missing something important about it?

ridoo commented 9 months ago

Thanks @kilichenko-pixida for summarizing your findings.

I just wanted to emphasize why we evaluated this path:

A good OpenAPI schema gives a structured view how an API is structured. Clients can "understand" how to use it, and how the actual model/payload looks like
Using this schema as kind of a contract, a client could use it to generate widgets from it.
Furthermore, it is able to generate payload matching the schema, which in turn is acceptable API payload
Using the introspection API might also take into account customized metadata (more or less automatically)

Other advantages on that approach would hopefully:

Further improve the API (which might be necessary anyway, when moving away from the Django templates)
Add machine readable description of the exposed GeoNode API
A map-store-client not needing to add another API for internal use (which have to be maintained as well)
Smoothen the path for alternate clients to act upon the GeoNode API

giohappy commented 9 months ago

It took some time to read everything and connect the many comments :)

In general, I agree with the approach of basing the form client component / app (whatever it is) on the OpenAPI schema. Theoretically, it's the best and cleanest approach, although with a list of shortcomings that I will list later. Problems are:

as you also noticed, the current introspection solution based on drf_spectacular is (very) far from perfect. It's been introduced as a quick and dirty thing by @afabiani in the middle of the refactoring to GN 4, but it's never been part of a dedicated development task. It was something to start, but that wasn't developed further. I've spent some time on it in the past, and I haven't found easy solutions to make it generate a correct and complete schema without heavy manual work. I think one of the problems is the dynamic rest framework, which adds quite a lot of complexity to DRF itself.
there's no plan for the OpenAPI at the moment on our side

So, yes, basing the forms on the OpenAPI would be nice, but in reality, we don't have it now. This doesn't mean we are against its development. I think we SHOULD have a correct and extensive OpenAPI for GeoNode

Basing the form only on the OpenAPI has its limitations by the way.

there are many information that need to be adapted to the current user, project, etc. (labeling, validation, permission on fields, etc.). Generating the OpenAPI on the fly is heavy, as you already noticed. Probably we can use faster tools, but I suspect it will always be heavy. Most of the APIs generate static schemas, also for this reason.
not all the "metainformation" for a form can be easily described with an OpenAPI schema (grouping of fields, for example?)

S, we don't have a properly generated OpenAPI yet, and we're not sure that we can express with it everything that is needed for a complex form.

This is why, in the end, we have considered to have a dedicated API. Yes, there will be some duplication, and we must take care of keeping this API aligned with the resource models, but this gives us complete freedom and flexibility to arrange it for the specific purposes of a form, also in terms of performance and dynamicity.

If any of you have a solution for the OpenAPI approach, and a concrete implementation to support the form requirements, I'm fully open to it!

giohappy commented 4 months ago

FYI we're moving this activity forward. Soon we will submit a tech proposal for the refactoring of the metadata services and frontend (editor).

We are keeping into account the proposal to make the metadata editor leverage on the OpenAPI and/or the JSON Schema specifications, but we're finding cases that fall outside the scope of these schemas, that would require custom schemas and protocols in any case. For example, the most important thing that isn't covered well (at least from my findings) is telling how to retrieve dynamic values / enums for selectors. For example, how do you tell the client where and how to retrieve the list of users for selecting owner, poc, etc.?

We don't want to fight against the specification or have to build complex schemas to circumvent any limitations. For this reason, we'll push forward the idea of using an API dedicated to the description of the metadata schema, along with a new API for their management (CRUD ops).

Another pillar for refactoring is building the metadata management API on top of multiple (configurable) model fields (concert or virtual). We want to detach the metadata structure from the Django models and the DB and let any model contribute its fields to the metadata, even extending them for custom purposes.

The goal is to reduce gain flexibility and reduce the number of places where customizations are required to implement custom metadata models (at the moment it means integrating your own models with the Django models through signaling, override the many metadata templates, etc.).

And finally, this goes hand in hand with the upgrade to PyCSW 3, which will probably be part of this activity.

gannebamm commented 4 months ago

Hi @giohappy Nice to see there is some ongoing work here! We had some discussions about the whole concept of metadata in GeoNode. One of the major issues we discovered is that metadata entries for publications (in terms of research) are static and must be static by design. The dynamic approach of GeoNode which saves user-objects to contacts is therefore against that principle. Here is an example to make this more clear:

Scientist sci-A works currently for organisation unit org-A. If sci-A uploads dataset data-A he/she will be listed as owner with his/her other contact infos like affiliation to org-A. Later sci-A switches to a new organisation unit like org-B. Due to the dynamic nature of GeoNodes metadata it will result in changed metadata if retrieved in the future for data-A.

Even though this is the prefered way to e.g. contact sci-A, this shall not happen from our point of view. It would for example change the list of datasets from org-A even though data-A was created while sci-A was part of it.

I hope this makes our issue clear?

We are aware of the edge-case-iness from our research institutes point of view. We are also aware that our edge-case will not change how the broader community will develop this feature and may think about other ways to solve our issue. One briefly drafted idea was to use the new #12124 to save static metadata files per dataset.

@mwallschlaeger what about ZALF? Have you also discussed this issue?

giohappy commented 4 months ago

Hi @gannebamm. I understand your point, The dynamic nature of many metadata fields, and their relations with other model fields, works well in many cases, but far less in others. Yours is an example, but we also have clients that were forced to use the "Metadata upload preserver" option to keep the original uploaded metadata file, and to avoid it being regenerated by GeoNode.

I don't think there's a one-fits-all solution. There could be more advanced options to opt-out from dynamic fields, have metadata versioning, and so on, but all this goes beyond the plans for this refactoring.

giohappy commented 3 months ago

We have started investigating the various options, including react-jsonschema-form. We have also looked at the PoC from @kilichenko-pixida.

This is our current plan:

Investigate which technology to use for the new metadata API. drf-spectacular (which we use in GeoNode) should already support generating JSON Schema. We need to check how far we can get with it. Another option is adopting Django Ninja for the metadata API but I don't know if it is worth it.
Investigate if the UI Schema from react-jsonschema-form covers all our needs, including the most specific one: telling the client where and how to retrieve the values for a field (external or internal API endpoint)
What approach should be adopted to generate and provide the UI schema to the client? It should be generated dynamically from fields, rather than being a static configuration. This is more or less the same as having a custom metadata editor API @ridoo, but adopting an already defined schema (UI Schema)
How to implement the editor. A pure React app would complicate things a lot, because we need to integrate security, CRSF tokens, etc. We will need to depend on the GeoNode MapStore client
For extensibility, we envision (for the future) exposing an API to let external code provide custom widgets and fields. This is just an idea, but we have no plan to implement it for now

We will keep you updated about our experiments....