INSPIRE-MIF / helpdesk

Community discussion for generic INSPIRE related topics

Problem with community defined code list (gmd:organisationName) #29

Closed enricoboldrini closed 2 years ago

enricoboldrini commented 4 years ago

Error returned: Each contact information in a metadata record must have a non-empty organisationName. The metadata record does not fulfill this requirement.

I'm validating a SeaDataNet XML document that follows a community profile of ISO 19115 metadata (https://www.seadatanet.org/Standards/Metadata-formats/CDI), drafted according to the official ISO 19115 rules. Most likely, the test expects a free-text value in gmd:organisationName rather than sdn:SDN_EDMOCode, which is a community-defined code list.

Test Report: http://inspire.ec.europa.eu/validator/v2/TestRuns/EID78510388-5466-4487-adce-6e460b23fe57.html
Assertion URI: http://inspire.ec.europa.eu/validator/v2/TestRuns/EID78510388-5466-4487-adce-6e460b23fe57.html?lang=en#EIDf2b9ad7f-995d-4b1c-91b2-0645f3ea78e0
Referenced file: https://www.seadatanet.org/content/download/4534/file/CDI_ISO19139_full_example_12.2.0.xml

danielnavarrogeo commented 4 years ago

Dear @enricoboldrini

According to TG Requirement C.1, the metadata shall be valid against at least one of a predefined set of schemas.

[Screenshot: TG Requirement C.1]

Your metadata does not pass the schema validation because you are using the schema https://schemas.seadatanet.org/Standards-Software/Metadata-formats/SDN_CDI_ISO19139_12.2.0.xsd which is not one of the valid schemas.

Regards

enricoboldrini commented 4 years ago

Dear @danielnavarrogeo,

as I understand from the note just below the above screenshot, it is possible for profiles to define new elements:

NOTE These guidelines extensively use XPath expressions in the requirements and recommendations. If profiles conformant to [ISO 19139] are being used to encode INSPIRE metadata records, these XPath expressions may need to be adapted to match the profile.

The profile new elements will be defined in a community schema which in turn will import the official ISO schema, like the one I'm using in the document being tested.

Please let me know if I'm missing something, thank you in advance, Enrico

enricoboldrini commented 4 years ago

Dear @danielnavarrogeo ,

do you have any updates on this? Are community profiles currently allowed by MD TG 2.0? Those profiles will be defined by community schemas, importing the official ISO schemas.

Thank you in advance for any info on the status of this, Enrico

danielnavarrogeo commented 4 years ago

Dear @enricoboldrini

We are discussing it internally. We will keep you posted.

Regards

enricoboldrini commented 4 years ago

Nice, thank you @danielnavarrogeo Enrico

fabiovin commented 4 years ago

Dear @enricoboldrini,

even if the MD TG 2.0 validator checked the metadata against the schema declared in the schemaLocation attribute (i.e. your community profile), your metadata would still not pass validation, because TG Requirement C.6 states that:

The name of the responsible organisation shall be provided as the value of gmd:organisationName element with a Non-empty Free Text Element content.

Regarding this, the TG Requirement C.4 states how free text elements of type gco:CharacterString_PropertyType shall be expressed. According to these requirements, the encoding of the element gmd:organisationName in your metadata is not correct.

IMHO, a metadata profile should not change the dataType of an element, in order to maintain compliance with the original schemas. Your "profile" looks more like an "extension" because it changes the dataType of some elements.

You could use the gmx:Anchor child element to encode elements of this kind, for which you want to define a codelist.

So, for instance, the element gmd:organisationName, encoded in your metadata as:

         <gmd:organisationName>
            <sdn:SDN_EDMOCode
               codeList="https://edmo.seadatanet.org/isocodelists/edmo-edmerp-codelists.xml#SDN_EDMOCode"
               codeSpace="SeaDataNet" codeListValue="1">University of Birmingham, Department of
               Geological Sciences</sdn:SDN_EDMOCode>
         </gmd:organisationName>

could be encoded in this way:

         <gmd:organisationName>
            <gmx:Anchor xlink:href="https://edmo.seadatanet.org/isocodelists/edmo-edmerp-codelists.xml#SDN_EDMOCode_1">University of Birmingham, Department of Geological Sciences</gmx:Anchor>
         </gmd:organisationName>

Using this encoding your metadata will maintain compliance with the original ISO schemas (See validation report of the modified xml: https://inspire.ec.europa.eu/validator/v2/TestRuns/EID2973bb95-525e-4087-9d6c-b17d0cd88e6d.html).
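The re-encoding suggested above could be automated for existing records. The following is a minimal sketch, assuming the `sdn` namespace URI shown in the code and assuming that the anchor URI is always built as `<codeList>_<codeListValue>`, a pattern inferred only from the single example in this comment. The function name `edmo_code_to_anchor` is hypothetical.

```python
import xml.etree.ElementTree as ET

# gmd, gmx and xlink are the usual ISO 19139 / XLink namespace URIs.
# The sdn URI is an assumption made for this sketch.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
    "sdn": "http://www.seadatanet.org",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def edmo_code_to_anchor(root):
    """Replace sdn:SDN_EDMOCode children of gmd:organisationName with gmx:Anchor.

    Returns the number of elements replaced.
    """
    replaced = 0
    # Materialise the list first so the tree is not modified while iterating.
    for org_name in list(root.iter(f"{{{NS['gmd']}}}organisationName")):
        code = org_name.find("sdn:SDN_EDMOCode", NS)
        if code is None:
            continue
        anchor = ET.Element(f"{{{NS['gmx']}}}Anchor")
        # Assumed URI pattern: <codeList>_<codeListValue>
        href = f"{code.get('codeList')}_{code.get('codeListValue')}"
        anchor.set(f"{{{NS['xlink']}}}href", href)
        # Collapse line breaks and indentation inside the label.
        anchor.text = " ".join((code.text or "").split())
        anchor.tail = code.tail
        position = list(org_name).index(code)
        org_name.remove(code)
        org_name.insert(position, anchor)
        replaced += 1
    return replaced


SAMPLE = """<gmd:CI_ResponsibleParty
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:sdn="http://www.seadatanet.org">
  <gmd:organisationName>
    <sdn:SDN_EDMOCode
       codeList="https://edmo.seadatanet.org/isocodelists/edmo-edmerp-codelists.xml#SDN_EDMOCode"
       codeSpace="SeaDataNet" codeListValue="1">University of Birmingham, Department of
       Geological Sciences</sdn:SDN_EDMOCode>
  </gmd:organisationName>
</gmd:CI_ResponsibleParty>"""

root = ET.fromstring(SAMPLE)
edmo_code_to_anchor(root)
print(ET.tostring(root, encoding="unicode"))
```

The sketch only rewrites the organisation name; other elements that a profile restricts with code lists would need the same treatment.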

Fabio

enricoboldrini commented 4 years ago

Dear Fabio,

thank you very much for the info. As you surely know, ISO 19115 provides well-defined rules for metadata extension that community profiles must follow; you can find them in Annex F. In particular, the SeaDataNet profile mostly makes use of rule F.4 (Definition of a new metadata codelist) to restrict a "free text" domain to a set of terms. In this case the free text domain of the element "organisation name" is replaced with a code list containing a fixed set of marine organisation names. The new element is defined in a schema importing the official ISO 19139 schema, and a metadata codelist catalogue is published online according to ISO 19139 section 8.5.5 (CodeList encodings). So yes, it is possible to change the dataType of an element, if done strictly in accordance with the ISO 19115 methodology (moreover, here we are restricting the domain: from free text to a list of terms).

To support free text domain restriction, the XPath in the test could be changed from:
gmd:pointOfContact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString
to something like:
gmd:pointOfContact/gmd:CI_ResponsibleParty/gmd:organisationName/*[1]/text()
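The difference between the two expressions can be illustrated with a small script. This is a minimal sketch, not the validator's implementation: Python's `xml.etree.ElementTree` supports only a subset of XPath (no `text()`), so the second expression is emulated by reading the text of the first child element. The `sdn` namespace URI and the function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def name_strict(party):
    """Emulates .../gmd:organisationName/gco:CharacterString."""
    node = party.find("gmd:organisationName/gco:CharacterString", NS)
    return " ".join((node.text or "").split()) if node is not None else ""


def name_relaxed(party):
    """Emulates .../gmd:organisationName/*[1]/text()."""
    org_name = party.find("gmd:organisationName", NS)
    if org_name is None or len(org_name) == 0:
        return ""
    first_child = org_name[0]  # any element, whatever its name
    return " ".join((first_child.text or "").split())


# Three encodings of the same name. The sdn namespace URI is an assumption.
DOC = """<root xmlns:gmd="http://www.isotc211.org/2005/gmd"
      xmlns:gco="http://www.isotc211.org/2005/gco"
      xmlns:gmx="http://www.isotc211.org/2005/gmx"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:sdn="http://www.seadatanet.org">
  <gmd:CI_ResponsibleParty>
    <gmd:organisationName>
      <gco:CharacterString>University of Birmingham, Department of Geological Sciences</gco:CharacterString>
    </gmd:organisationName>
  </gmd:CI_ResponsibleParty>
  <gmd:CI_ResponsibleParty>
    <gmd:organisationName>
      <gmx:Anchor xlink:href="https://edmo.seadatanet.org/isocodelists/edmo-edmerp-codelists.xml#SDN_EDMOCode_1">University of Birmingham, Department of Geological Sciences</gmx:Anchor>
    </gmd:organisationName>
  </gmd:CI_ResponsibleParty>
  <gmd:CI_ResponsibleParty>
    <gmd:organisationName>
      <sdn:SDN_EDMOCode codeSpace="SeaDataNet" codeListValue="1">University of Birmingham, Department of Geological Sciences</sdn:SDN_EDMOCode>
    </gmd:organisationName>
  </gmd:CI_ResponsibleParty>
</root>"""

parties = ET.fromstring(DOC).findall("gmd:CI_ResponsibleParty", NS)
for party in parties:
    print(repr(name_strict(party)), "|", repr(name_relaxed(party)))
```

In this sketch only the first encoding yields a non-empty name under the strict expression, while all three do under the relaxed one.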

If you also need to check the definition of the element, it is sufficient to verify that it is in a substitution group with gco:CharacterString (just like gmx:Anchor). But this further check is probably out of scope and can be safely skipped.
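Such a substitution group check could look roughly like the sketch below. The schema fragment is hypothetical and written only for this example (the type names are placeholders, not copied from the gmx or SeaDataNet schemas), and the comparison is done on the literal prefixed string; a real check would resolve the prefix to its namespace URI.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment for illustration only.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <xs:element name="Anchor" type="Anchor_Type"
      substitutionGroup="gco:CharacterString"/>
  <xs:element name="SDN_EDMOCode" type="CodeListValue_Type"
      substitutionGroup="gco:CharacterString"/>
  <xs:element name="SomethingElse" type="xs:string"/>
</xs:schema>"""


def substitutes_character_string(schema_root, element_name):
    """True if the named top-level element declares
    substitutionGroup="gco:CharacterString" (literal prefix comparison)."""
    for decl in schema_root.findall(f"{{{XS}}}element"):
        if decl.get("name") == element_name:
            return decl.get("substitutionGroup") == "gco:CharacterString"
    return False


schema_root = ET.fromstring(SCHEMA)
for name in ("Anchor", "SDN_EDMOCode", "SomethingElse"):
    print(name, substitutes_character_string(schema_root, name))
```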

You will agree that restricting the free text with an ISO 19115 code list is a more powerful mechanism than using a gmx:Anchor (which provides only a link to a web resource). SeaDataNet considered anchors during the drafting of the community profile, but opted to make full use of ISO 19115 code lists, to the benefit of internal use, but also to improve interoperability with international communities that are fully compliant with ISO 19115.

Polymorphism, as you surely know as well, is extensively considered in ISO 19139. I'm reporting only a brief excerpt, which implies that it is just about the norm for user community profiles:

It is even possible to have an XML file containing a metadata set without containing a single MD_Metadata XCGE element. This is a consequence of the polymorphism, which may imply that an XCGE of a MD_Metadata subclass, potentially defined in a user community profile, occurs instead of the MD_Metadata XCGE element.

So, I really hope that MD TG 2.0 test suite will implement support for polymorphism soon, as it's a key feature already considered in the MD TG 2.0 spec and used by almost all community profiles of ISO 19115.

Moreover, support for community profiles was previously achieved successfully in the MD TG 1.3 test suite. In the last 2 years CNR-IIA and the SeaDataNet community have actively contributed to enabling it by reporting the following issues, all of which were successfully solved by the INSPIRE team:
https://github.com/inspire-eu-validation/community/issues/134
https://github.com/inspire-eu-validation/community/issues/45
https://github.com/inspire-eu-validation/ets-repository/issues/183
https://github.com/inspire-eu-validation/ets-repository/issues/184
https://github.com/inspire-eu-validation/ets-repository/issues/185

I'm available to continue to support the implementation by reporting such community profile related issues also for MD TG 2.0, understanding that it may take some time to modify the tests. Please let me know if you agree on this path, thank you very much in advance!

Kind regards, Enrico

enricoboldrini commented 4 years ago

Dear @fabiovin , @danielnavarrogeo ,

any news on this issue? Is it expected that community profiles having ISO 19115 compliant extensions will be supported by the MD TG 2.0 validator?

Thank you very much for any hint on this.

Enrico

AntoRot commented 4 years ago

I agree that the metadata record tested, including metadata elements using code lists instead of free text, should pass the validation for the reasons explained by @enricoboldrini (i.e. the note below TG Requirement C.1, the conformance to the extension rules of ISO 19115 and TS 19139, and the substitutionGroup attribute being equal to "gco:CharacterString" for the code lists, as for the Anchor element). The solution is provided by Enrico, i.e. taking into account the XPath gmd:pointOfContact/gmd:CI_ResponsibleParty/gmd:organisationName/*[1]/text(). Obviously, unlike what is indicated for the encoding of the code list values in section 2.1.1 of the TG (i.e. that the textual content of the metadata element is purely informative), in this case the textual content shall not be empty. Additionally, this update should be applied to all metadata elements encoded as free text. Consequently, this will imply a revision of TG Requirement C.1.

With reference to the issue INSPIRE-MIF/helpdesk-validator#298, I took a look at the SDN community profile, where the class sdn:SDN_DataIdentification (a sub-class of MD_DataIdentification) is justified by the need to add the new metadata element additionalDocumentation. I was wondering whether the ISO metadata element gmd:otherCitationDetails could be suitable instead of adding that new metadata element. If not (I'm sure it has already been checked whether that ISO metadata element meets the community requirements), then the solution for the validator is provided by the conformance requirements for extensions in point 4) of section A.3 of ISO TS 19139, i.e. "looking for either ISO/TS 19139 elements or elements whose isoType attribute contains an ISO class name" (i.e. looking for either an instance of MD_DataIdentification or a sub-class with the isoType attribute equal to the name of that ISO class, as in the case of the SDN profile).
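The lookup quoted from section A.3 could be sketched as follows. This is a minimal illustration, assuming that the attribute is `gco:isoType` in the ISO 19139 `gco` namespace; the `sdn` namespace URI and the attribute value used in the sample are assumptions, and `find_iso_class` is a hypothetical helper name. The match is a plain substring test, mirroring the word "contains" in the quoted requirement.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def find_iso_class(root, class_name):
    """Return elements that are gmd:<class_name>, or whose gco:isoType
    attribute contains <class_name>."""
    matches = []
    for elem in root.iter():
        if elem.tag == f"{{{GMD}}}{class_name}":
            matches.append(elem)
        elif class_name in elem.get(f"{{{GCO}}}isoType", ""):
            matches.append(elem)
    return matches


# Sample with a profile-defined sub-class. The sdn namespace URI and the
# isoType value are assumptions made for this sketch.
DOC = """<gmd:identificationInfo
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:sdn="http://www.seadatanet.org">
  <sdn:SDN_DataIdentification gco:isoType="MD_DataIdentification_Type"/>
</gmd:identificationInfo>"""

root = ET.fromstring(DOC)
for elem in find_iso_class(root, "MD_DataIdentification"):
    print(elem.tag)
```

A record using the plain gmd:MD_DataIdentification element is matched by the first branch, so both encodings are found by the same call.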

iuriemaxim commented 4 years ago

I disagree with both @AntoRot and @enricoboldrini and I agree with @fabiovin, as the INSPIRE TGs impose different requirements than ISO. In ISO the organisation name is optional, while in INSPIRE this element is mandatory. And there was a reason to make this element mandatory.

The INSPIRE validator should not be turned into a tool that validates metadata against ISO. Tests should not be relaxed to such a degree that INSPIRE requirements are no longer fulfilled.

I noticed a lot of such requests posted by @enricoboldrini and endorsed by @AntoRot, and I am afraid that the INSPIRE validator is already not taking the TGs into account.

Sometimes INSPIRE is more demanding, sometimes ISO is more demanding. The image below is from Metadata TG 1.3:

[Image: excerpt from Metadata TG 1.3]

If there are aspects that should be modified, then the first step is to modify them in the TGs. The validator should follow the TGs and not ISO. I suggest marking all these issues as discussion and analysing all the issues that were already closed, in order to see whether the TG requirements are implemented in the validator or the tests were relaxed for no reason:

Can be marked as discussion: INSPIRE-MIF/helpdesk-validator#296 INSPIRE-MIF/helpdesk-validator#297 INSPIRE-MIF/helpdesk-validator#298

To be verified if they are against the TGs: INSPIRE-MIF/helpdesk-validator#134 INSPIRE-MIF/helpdesk-validator#45 https://github.com/inspire-eu-validation/ets-repository/issues/183 inspire-eu-validation/ets-repository#184 inspire-eu-validation/ets-repository#185

enricoboldrini commented 4 years ago

Dear @iuriemaxim, there is probably a misunderstanding. "Organisation name" is mandatory for INSPIRE and nobody has asked to make it optional during validation. The fact is that some ISO profiles oblige their users to document "organisation name" from a list of fixed texts (called a code list in ISO), so these profiles are even more demanding than INSPIRE (not less!). The current validator doesn't recognize community code lists, or profiles in general, although the TG states the following on page 7: NOTE These guidelines extensively use XPath expressions in the requirements and recommendations. If profiles conformant to [ISO 19139] are being used to encode INSPIRE metadata records, these XPath expressions may need to be adapted to match the profile. So, in my opinion, there is no need to change the TG in this case; just modify the XPath in the validator. Otherwise, many documents that are legally valid under the directive (as they surely have "organisation name" inside!) will be flagged as not valid by the validator, which does not seem good for the validator itself in the first place. I reiterate my offer of support and collaboration in trying to solve the issue.

iuriemaxim commented 4 years ago

@enricoboldrini Thank you for clarifying that there is a misunderstanding. My comment is a general one, as I saw that a lot of requests were made to change the INSPIRE validator, and usually they ask for more relaxed tests so that a certain metadata profile can pass. And some, if not all, of the issues you raised are actually of this kind.

I think that the role of INSPIRE is to have harmonised data and metadata, not to have one user encode some elements in data or metadata in one way, another user in another way and a third one in yet another way. If that happens, interoperability will not be ensured. So I am not in favour of these changes.

I just wonder how a European portal would be able to cope with metadata written in so many ways, and how a user would be able to find data based on the metadata. If organisation name, or any other element, were allowed to be encoded in many ways, it would be a huge effort for a development team to store all those metadata files in a database and to index the data based on various fields. Currently organisations are indexed in the INSPIRE Geoportal, so a user can easily find all metadata from a certain organisation. But if the requested modification is allowed, most probably it will not be possible to index this field, or it will be done only with significant effort from the development team.

The proxy browser can be accessed at https://inspire-geoportal.ec.europa.eu/proxybrowser, and a user can instantly see how many resources were provided by a certain organisation and can filter in order to retrieve those resources. If the proposed change is made, most probably this will no longer work, or the programmers will have to struggle a lot to interpret each custom metadata profile.

Parsing the information from a custom metadata profile in order to show the metadata in a human-readable format will also be a challenge.

If things can be simple, and @fabiovin has already explained how the metadata files should be modified, why not keep this simple and standard? Why complicate things by allowing any kind of data profile a user might ask for? There are a lot of ETL tools to change both data and metadata, and this is actually what INSPIRE requires.

Even the changes in the Geoportal needed to switch from metadata version 1.3 to metadata version 2.0 are taking much longer than expected, and I am almost sure that it is for a similar reason: any change in the metadata profile involves a lot of programming.

For data providers it is much easier to transform their data and metadata to agreed formats than it is to load data and metadata in so many formats and structures and expect IT systems to harmonise them.

enricoboldrini commented 4 years ago

Dear @iuriemaxim, I understand your point of view; however, hoping that base ISO 19115 will be sufficient for all communities isn't a reasonable position. Otherwise ISO profiles wouldn't have existed in the first place (profiles are a very important part of ISO 19115 and are described in great detail there). By cutting off ISO profiles, the INSPIRE validator is missing big sources of metadata that are in general rich in content and also compliant with the INSPIRE directive. Moreover, ISO sets strict rules for creating profiles, so that it is easy for a validator to recognize extensions (e.g. the isoType attribute mechanism or the ISO code lists mechanism... sorry, but profile creators can't actually invent very much while making an ISO profile). So, for both indexing and displaying, I honestly don't see major effort (just the need to change some XPaths, in line with what is also written and recommended in the INSPIRE Technical Guidelines at page 7).

iuriemaxim commented 4 years ago

@enricoboldrini I just hope that it is quite clear that the INSPIRE TG for metadata sets different rules than ISO 19115. It is clearly written in the post above: in some cases ISO 19115 is more demanding, in other cases INSPIRE is more demanding. ISO 19115 was the basis for TG Metadata version 1.3, which has no longer been in force since 19 December 2019, after a three-year transition period.

The current TG for INSPIRE Metadata, version 2.0, is based on ISO/TS 19139, but as with version 1.3, sometimes INSPIRE is more demanding, sometimes ISO is more demanding. Please consult at least the Foreword section of the Metadata TG version 2.0 at: https://inspire.ec.europa.eu/id/document/tg/metadata-iso19139

Some print screens from the TG:

[Screenshots from the Metadata TG 2.0]

Regarding the indication of page 7 in the TG, I do not see anything mentioned about community defined profiles.

If the metadata is not compliant with INSPIRE, are the datasets served by view and download services compliant with the INSPIRE TGs? Are they made according to the specifications of the INSPIRE data themes? If so, then the principle that data and metadata should be changed through ETL is understood. We also provide datasets and metadata to different organisations based on their requirements. For the World Meteorological Organization (WMO) the data is required according to one data model. In INSPIRE it is required according to another data model. We are asking neither WMO nor INSPIRE to change their standards so that we can provide the data in only one format, as we understand that each organisation has different needs.

I hope this helps to explain why the proposed requests are not in line with INSPIRE and why metadata in that format should be transformed through ETL. The INSPIRE Metadata Profile in GeoNetwork can be used to see what INSPIRE metadata version 2.0 could look like.

Best regards, Iurie Maxim

The idea of INSPIRE was to have a European standard.

enricoboldrini commented 4 years ago

[Screenshot: inspire-screen]

Dear @iuriemaxim, probably the misunderstanding is because of page numbering... the foreword you reported is at page 7 of pdf, but is actually page VII. Please go to page 28 of pdf (page 7 of the document).

Kind regards, Enrico

iuriemaxim commented 4 years ago

I am reading the entire content.

[Screenshot from the Metadata TG]

I read the Note as follows: These guidelines ...., these XPath expressions may need to be adapted to match the INSPIRE profile.

And I am reading the entire content of the guideline for a proper understanding.

[Screenshot from the Metadata TG]

As a general rule a requirement or a note will not be in contradiction with another requirement.

iuriemaxim commented 4 years ago

In connection with the topic, these issues can be consulted as well in conjunction with C.1 requirement, as APISO schema is still not repaired due to ISO changes:

https://github.com/inspire-eu-validation/community/issues/245 https://github.com/inspire-eu-validation/community/issues/246 https://github.com/opengeospatial/ets-19139/issues/20

These could help to understand the entire text of the section 2.1.

iuriemaxim commented 3 years ago

Just an update on this thread: APISO schema was fixed.

MarcoMinghini commented 3 years ago

Dear all, I am moving this issue from the Validator helpdesk to the general helpdesk, given that this is a more general discussion that is not only about validation.