geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0
410 stars 486 forks

INSPIRE / Add support to load codelists from Registry #2500

Closed fxprunayre closed 6 years ago

fxprunayre commented 6 years ago

image

A form is displayed with:

image

image

When submitting the form, the registry items are processed by the catalogue.

Processing to build a thesaurus from the codelist:

image

Dev branch: https://github.com/titellus/core-geonetwork/tree/feature/inspire-registry

Tests made with:

image

df-git commented 6 years ago

You can find my answers inline:

fxprunayre commented 6 years ago

Thanks @df-git for the recommendations.

fxprunayre commented 6 years ago

some information (like the parent / child relation) which is available in the Re3gistry XML

@df-git, could you point us to a codelist using this type of relation so that we can add support for it? Thanks.

df-git commented 6 years ago

This is an example: http://inspire.ec.europa.eu/metadata-codelist/PriorityDataset
If you open the Re3gistry XML, you can see the hierarchy, e.g.:

...
<parents>
  <parent id="http://inspire.ec.europa.eu/metadata-codelist/PriorityDataset/dir-2002-49">
    <label xml:lang="en">Directive 2002/49/EC</label>
  </parent>
</parents>
...

fxprunayre commented 6 years ago

Thanks @df-git. Are there any elements that could help identify top concepts?

Another question: I suppose we should concentrate only on items with status http://inspire.ec.europa.eu/registry/status/valid and ignore the others, e.g. Superseded?

df-git commented 6 years ago

The top concepts have no parents. There isn't a specific element to identify them.

Yes, the ones to be considered are the "Valid" elements.
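
The two rules above (top concepts are the items without parents, and only "Valid" items are kept) could be sketched roughly as follows. This is only an illustration: the `<parents>`/`<parent id>` and `<status id>` elements come from the snippets in this thread, while the `<codelist>`/`<value>` wrappers, the `example-child` and `example-old` items, and the function names are assumptions. Real Re3gistry responses may use namespaces and a different layout.

```python
import xml.etree.ElementTree as ET

VALID_STATUS = "http://inspire.ec.europa.eu/registry/status/valid"
BASE = "http://inspire.ec.europa.eu/metadata-codelist/PriorityDataset"

# Hypothetical, simplified sample. The <codelist>/<value> wrappers and the
# "example-child" / "example-old" items are assumed for illustration only.
SAMPLE = f"""
<codelist id="{BASE}">
  <value id="{BASE}/dir-2002-49">
    <status id="{VALID_STATUS}"/>
  </value>
  <value id="{BASE}/example-child">
    <status id="{VALID_STATUS}"/>
    <parents>
      <parent id="{BASE}/dir-2002-49"/>
    </parents>
  </value>
  <value id="{BASE}/example-old">
    <status id="http://inspire.ec.europa.eu/registry/status/superseded"/>
  </value>
</codelist>
"""


def read_concepts(xml_text, valid_status=VALID_STATUS):
    """Return {item id: [parent ids]} for items whose status id is valid."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for value in root.iter("value"):
        status = value.find("status")
        # Skip items without a status or with a non-valid one (e.g. superseded).
        if status is None or status.get("id") != valid_status:
            continue
        parents = [p.get("id") for p in value.findall("parents/parent")]
        concepts[value.get("id")] = parents
    return concepts


def top_concepts(concepts):
    """Top concepts are the items that declare no parent."""
    return sorted(cid for cid, parents in concepts.items() if not parents)


concepts = read_concepts(SAMPLE)
print(top_concepts(concepts))
```

In this sketch the superseded item is dropped before the hierarchy is built, so it can never appear as a top concept.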

fxprunayre commented 6 years ago

Some new questions @df-git

fxprunayre commented 6 years ago

Another "issue" related to valid items is that each Registry installation has its own valid status identifier, e.g. http://registre.geocatalogue.fr/codelist/drillingmethod/drillingmethod.en.xml

<status id="http://registre.geocatalogue.fr/registry/status/valid">
<label xml:lang="en">Valid</label>

The valid status id depends on the registry base url.
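
One possible way to handle this is to derive the valid status id from the URL of the codelist being loaded. The sketch below assumes the status id always has the form `<scheme>://<host>/registry/status/valid`, which matches the two registries mentioned in this thread but may not hold for a registry deployed under a sub-path. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def valid_status_for(item_url):
    """Guess the 'valid' status id of the registry serving item_url.

    Assumption: the registry root is the scheme and host of the item URL.
    """
    parts = urlsplit(item_url)
    return f"{parts.scheme}://{parts.netloc}/registry/status/valid"


print(valid_status_for(
    "http://registre.geocatalogue.fr/codelist/drillingmethod/drillingmethod.en.xml"
))
```

The returned id can then be compared with the `id` attribute of each item's `<status>` element instead of a hard-coded INSPIRE value.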

df-git commented 6 years ago

fxprunayre commented 5 years ago

I noticed an issue related to the codelist date. For example, the INSPIRE themes were published on 2008-06-01, e.g. https://github.com/geonetwork/util-gemet/blob/master/thesauri/inspire-theme.rdf#L13

But when creating the thesaurus from the registry, we are currently setting the date to "now".

Using the validator, we end up with:

The content "2018-05-25" of element <DateOfPublication> does not match the required simple type. Value "2018-05-25" contravenes the enumeration facet "2008-06-01" of the type of element DateOfPublication at column 66, line 10

@df-git, is there any element indicating the date of publication in Registry responses? E.g. http://inspire.ec.europa.eu/theme/theme.en.xml