GeoNode / geonode

GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.
https://geonode.org/

GNIP 86 - metadata parsing and storing #7263

Closed etj closed 3 years ago

etj commented 3 years ago

GNIP 86 - metadata parsing and storing

Overview

Metadata parsing is not easily customizable.

The proposed changes make it possible to:

Other tasks included:

Proposed By

@etj (Emanuele Tajariol)

Assigned to Release

This proposal is for GeoNode .

State

Motivation

I want to be able to provide one or more metadata parsers in order to override a mapping implemented by the default parser. Examples:

In this latter case (model extension), we may want to save the extra data parsed, so we also need a way to call some logic that knows how to deal with the custom parsed data. If the customization has extended the base model, the parser will put the added values into the vals dict, and the default logic will save the new fields along with the "official" ones. If the customization added a 1:1 relationship to another table, we'll provide all of the parsed values to the custom logic. We need to call the custom logic explicitly to deal with other DB objects; we cannot rely on signals, because in this latter case there would be no way to pass them the parsed data.

Proposal

Metadata parsing

In settings, a new variable METADATA_PARSERS will be added. It's a dict whose keys are MD_Metadata, metadata and Record. These values are taken from metadata.py and correspond to the root element names of the supported metadata formats. Since the proposed implementation makes the lookup dynamic, you will be able to define a parser for a brand new metadata format just by implementing the function and declaring it in this setting. The value associated with each key in the dict will be a list containing:

Parser functions must be implemented so that they return a 5-element tuple:

The parser function should take as params:

The parser function can alter (refine / improve) the content of any of the params, and then returns them.
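As a rough illustration of this contract, a custom parser might look like the sketch below. It assumes that exml is an xml.etree.ElementTree element (the proposal does not state its type); the function name abstract_override_parser, the <abstract> element and the 'abstract' and 'parsed_by' keys are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET


def abstract_override_parser(exml, uuid, vals, regions, keywords, custom):
    """Hypothetical parser: override the 'abstract' mapping set by an earlier parser.

    It receives the values accumulated so far, refines them, and returns
    the same five items so the next parser in the list can continue.
    """
    # Assumption: the document carries the abstract in a plain <abstract> element.
    node = exml.find(".//abstract")
    if node is not None and node.text:
        vals["abstract"] = node.text.strip()

    # Values the base model does not know about go into `custom`.
    custom["parsed_by"] = "abstract_override_parser"
    return uuid, vals, regions, keywords, custom


exml = ET.fromstring("<metadata><abstract>  Custom text </abstract></metadata>")
result = abstract_override_parser(exml, "abc-123", {"abstract": "default"}, [], [], {})
```

Here the value produced by a previous parser ("default") is replaced, while uuid, regions and keywords pass through untouched.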

In pseudocode, the defined functions would be called like this (excluding error checking and default assignments):

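# Parsers registered for the root element of the metadata document
parsers = config['METADATA_PARSERS'][root_el]
# Initial values, refined by each parser in turn
uuid = None
vals = {}
regions = []
keywords = []
custom = {}
for f in parsers:
    # Each parser receives the current values and returns the updated ones
    uuid, vals, regions, keywords, custom = f(exml, uuid, vals, regions, keywords, custom)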
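A self-contained version of this loop could look like the following sketch. It is not the proposed implementation: it assumes that the list entries are plain callables and that the root element name is the tag of an xml.etree.ElementTree element; default_parser, keyword_parser, parse_metadata and the <id>, <title> and <kw> elements are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET


def default_parser(exml, uuid, vals, regions, keywords, custom):
    # Stand-in for the default parser: reads the identifier and the title.
    uuid = exml.findtext("id")
    vals["title"] = exml.findtext("title")
    return uuid, vals, regions, keywords, custom


def keyword_parser(exml, uuid, vals, regions, keywords, custom):
    # Hypothetical custom parser: adds the keywords found in <kw> elements.
    keywords = keywords + [kw.text for kw in exml.findall("kw")]
    return uuid, vals, regions, keywords, custom


# Assumption: entries are callables, keyed by the root element name.
METADATA_PARSERS = {"metadata": [default_parser, keyword_parser]}


def parse_metadata(xml_text, parsers_by_root=METADATA_PARSERS):
    exml = ET.fromstring(xml_text)
    parsers = parsers_by_root[exml.tag]
    uuid, vals, regions, keywords, custom = None, {}, [], [], {}
    # Run the parsers in order; each one refines the output of the previous one.
    for f in parsers:
        uuid, vals, regions, keywords, custom = f(exml, uuid, vals, regions, keywords, custom)
    return uuid, vals, regions, keywords, custom


parsed = parse_metadata(
    "<metadata><id>42</id><title>Roads</title><kw>transport</kw></metadata>"
)
```

Because the parsers run in list order, a later entry can override whatever an earlier one produced.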

The current parsing is called, for instance, in https://github.com/GeoNode/geonode/blob/3.1/geonode/upload/upload.py#L845

layer_uuid, vals, regions, keywords = set_metadata( ...xml file ...)

Storing

At the end of final_step(), we'll be providing the layer and all the parsed info to any function defined in settings.

storer_list = config['METADATA_STORERS']
for s in storer_list:
    s(layer, uuid, vals, regions, keywords, custom)

As an example, we may have a parser which extracts all the "process steps" from the metadata and stores them into custom['processes']. A storer function will then use the Layer info and the custom['processes'] info to create new DB records, each referencing the layer through a foreign key and holding the process step details in other text fields.
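The process steps example could be sketched roughly as follows. To stay runnable without a database, the sketch uses a plain list in place of a DB table and dataclasses in place of models; ProcessStep, FakeLayer, PROCESS_STEP_TABLE, the <processStep> element and both function names are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class FakeLayer:
    # Stand-in for the Layer instance handed to the storers.
    id: int


@dataclass
class ProcessStep:
    # Stand-in for a DB record referencing the layer through a foreign key.
    layer_id: int
    description: str


PROCESS_STEP_TABLE = []  # stand-in for a DB table


def process_step_parser(exml, uuid, vals, regions, keywords, custom):
    # Collect the text of every <processStep> element into custom['processes'].
    custom["processes"] = [
        el.text.strip() for el in exml.iter("processStep") if el.text
    ]
    return uuid, vals, regions, keywords, custom


def process_step_storer(layer, uuid, vals, regions, keywords, custom):
    # Create one record per parsed process step, linked to the layer.
    for description in custom.get("processes", []):
        PROCESS_STEP_TABLE.append(
            ProcessStep(layer_id=layer.id, description=description)
        )


METADATA_STORERS = [process_step_storer]

exml = ET.fromstring(
    "<metadata><processStep>Reprojected</processStep>"
    "<processStep>Clipped</processStep></metadata>"
)
parsed = process_step_parser(exml, None, {}, [], [], {})
for s in METADATA_STORERS:
    s(FakeLayer(id=7), *parsed)
```

The storer only reads from custom, so the parser and the storer stay decoupled from the default parsing and saving logic.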

As an alternative to having so many params in the storer functions, they could take only the layer and custom parameters, since all the other values have already been stored in the Layer instance.

Sub-issues

While considering this GNIP, other topics came up; they have been moved to standalone issues:

Backwards Compatibility

The logic will not be implemented for the synchronous uploader, i.e.

UPLOADER = {
    'BACKEND': os.getenv('DEFAULT_BACKEND_UPLOADER', 'geonode.rest'),

since it's going to be deprecated.

Future evolution

Explain which could be future evolutions.

Feedback

Update this section with relevant feedback, if any.

Voting

Project Steering Committee:

Links

Remove unused links below.

gannebamm commented 3 years ago

@stefmec Please take a look at this proposal and check if it would solve some of our current issues with metadata ingestion.

afabiani commented 3 years ago

+1