GNIP 86 - metadata parsing and storing

Overview

Metadata parsing is not easily customizable.

The proposed changes make it possible to:
- define one or more custom parsers, in order to handle metadata documents following specific profiles not handled by owslib;
- add a hook to handle custom DB objects on layer creation.

Other tasks included:
- fix keyword parsing
Proposed By
@etj (Emanuele Tajariol)
Assigned to Release
This proposal is for GeoNode.
State
[x] Under Discussion
[x] In Progress
[x] Completed
[ ] Rejected
[ ] Deferred
Motivation
I want to be able to provide one or more metadata parsers, in order to override the mapping implemented by the default parser.
Examples:
- extend the owslib XML mapping: there are ISO19115/19139 profiles that store thesaurus keywords not in gco:CharacterString but in gmx:Anchor elements, which the current owslib parser cannot handle. We could of course fix owslib, but that would be a much longer process, and a given profile may not be of general interest, so we need to fine-tune the parsing in GeoNode itself.
- fix the XML mapping: ISO19115 constraints should be grouped at the gmd:MD_LegalConstraints level, while owslib loops over gmd:MD_RestrictionCode and gmd:otherConstraints elements separately.
- extend the GeoNode model: if a GeoNode customization needs to store some more metadata info, it will need a custom parser.
In this latter case (model extension), we may want to save the extra parsed data. That's why we also need a way to call some logic that knows how to deal with the custom parsed data.
In case the customization has extended the base model, the parser will put the added values into the vals dict and the default logic will save the new fields along with the "official" ones.
In case the customization added a 1:1 relationship to another table, we'll provide all of the parsed values to the custom logic.
We need to explicitly call the custom logic that deals with other DB objects; we cannot rely on signals, because they give us no way to pass the parsed data along.
Proposal
Metadata parsing
In settings, a new variable METADATA_PARSERS will be added.
It's a dict whose keys are MD_Metadata, metadata, Record -- these values are taken from metadata.py and correspond to the root element names of the related metadata formats. Since this proposed implementation makes the lookup dynamic, you will be able to define a parser for a brand new metadata format just by implementing the function and declaring it in this setting.
The value related to a key in the dict will be a list containing:
- "__DEFAULT__", a fixed string indicating the default parser (the existing one, if already defined for that type of metadata), allowed only as an optional first element;
- references to parser functions.
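For illustration only, a minimal sketch of how this setting might look; the module path and function name are hypothetical, and whether entries are dotted paths or direct callables is an implementation detail left to the proposal:

```python
# Hypothetical example: 'my_geonode.parsers.parse_profile_x' is a placeholder
# for a project-specific parser function; '__DEFAULT__' keeps the built-in
# owslib-based parser as the first step of the chain.
METADATA_PARSERS = {
    'MD_Metadata': ['__DEFAULT__', 'my_geonode.parsers.parse_profile_x'],
    'Record': ['__DEFAULT__'],
}
```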
Parser functions must be implemented so that they return a 5-element tuple:
- uuid, as it is now;
- vals, as it is now, a dict holding ResourceBase fields;
- regions, as it is now;
- keywords, as it is now;
- custom, a dict whose keys are ids related to the parsers themselves and whose values are dicts of parsed values.
The parser function should take as params:
- exml, the input document object;
- uuid, the uuid produced in the previous step;
- vals, the vals produced in the previous step;
- regions, the regions produced in the previous step;
- keywords, the keywords produced in the previous step;
- custom, the custom dict produced in the previous step.
The parser function can alter (refine or improve) the content of each of these params, and then return them.
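A minimal sketch of what such a parser function could look like, assuming exml is the parsed XML element; the function name, the namespace handling, and the 'gmx_anchors' key are hypothetical and only illustrate the contract described above:

```python
# Hypothetical custom parser: also collect thesaurus keywords stored in
# gmx:Anchor elements, which the default owslib-based parsing skips.
def parse_gmx_anchors(exml, uuid, vals, regions, keywords, custom):
    ns = {'gmx': 'http://www.isotc211.org/2005/gmx'}
    for anchor in exml.findall('.//gmx:Anchor', ns):
        text = (anchor.text or '').strip()
        if text and text not in keywords:
            keywords.append(text)
    # stash any extra parsed info for custom storers under this parser's id
    custom['gmx_anchors'] = {'keyword_count': len(keywords)}
    return uuid, vals, regions, keywords, custom
```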
In meta-code, the defined functions will be called like this (error checking and default assignments omitted):

```python
parsers = config['METADATA_PARSERS'][root_el]

uuid = None
vals = {}
regions = []
keywords = []
custom = {}

for f in parsers:
    uuid, vals, regions, keywords, custom = f(exml, uuid, vals, regions, keywords, custom)
```

The current parsing is called, for instance, in https://github.com/GeoNode/geonode/blob/3.1/geonode/upload/upload.py#L845
Storing

At the end of final_step(), we'll provide the layer and all the parsed info to any function defined in settings:

```python
storer_list = config['METADATA_STORERS']

for s in storer_list:
    s(layer, uuid, vals, regions, keywords, custom)
```
As an example, we may have a parser which extracts all "process steps" from the metadata and stores them into custom['processes'].
A storer function will then use the Layer info and the custom['processes'] info to create new DB records that reference the layer with a foreign key and hold the process step details in other text fields.
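A minimal sketch of such a storer, assuming a hypothetical ProcessStep model with a foreign key to Layer and a METADATA_STORERS entry pointing at it; all names here are illustrative, not part of the proposal:

```python
# settings.py (hypothetical): register the storer alongside the parsers
METADATA_STORERS = ['my_geonode.storers.store_process_steps']

# my_geonode/storers.py (hypothetical)
from my_geonode.models import ProcessStep  # assumed model: FK to Layer + text field

def store_process_steps(layer, uuid, vals, regions, keywords, custom):
    # create one DB record per process step collected by the custom parser
    for step in custom.get('processes', []):
        ProcessStep.objects.create(layer=layer, description=step)
```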
As an alternative to passing so many params to the storer functions, they may take only the layer and custom parameters, since all the other values will have already been stored in the Layer instance.
Sub-issues
While working on this GNIP, other related topics came up; they have been moved to standalone issues:
- #7279: Keyword parsing improvements
- #7288: Layer saving cleanup
Backwards Compatibility
The logic will not be implemented for the synch uploader, since it's going to be deprecated.
Future evolution
Feedback
Voting
Project Steering Committee: