agiorguk / gemini

Resources relating to the UK Gemini metadata profile

Current GEMINI encoding of dataset language and metadata language is not valid in INSPIRE #68

Open Sgaff opened 2 years ago

Sgaff commented 2 years ago

Hi,

The current guidance on the GEMINI pages for metadata language and for dataset language states that the codelist string that users should quote for the ISO language codes is

http://www.loc.gov/standards/iso639-2/php/code_list.php

However, if you attempt to run a full XML file through the INSPIRE validator with this encoding in it, it fails on the language element. After some playing around, and looking in inspire-tg-metadata-sio19139-2.0.1.pdf, I identified the problem INSPIRE had as being the presence of the /php/code_list.php part of the string.

I re-built my XML so the language portion was as follows

English

and the INSPIRE validator accepted this without issue. I propose that we change our guidance on the website to reflect this subtle difference. Does this mean that we also need to change our Schematron checks?

Cheers

Sean

nmtoken commented 2 years ago

Schematron only checks there is a 3 letter language code in the codeListValue; no check of codelist URI.
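As a rough illustration of what a check of that kind amounts to, here is a minimal Python sketch. It is not the GEMINI Schematron rule itself: the function name `language_code_ok`, the sample fragment, and the lowercase three-letter pattern are all assumptions made for this example. Only the `gmd` namespace and the `codeList` / `codeListValue` attribute names come from the ISO 19139 encoding.

```python
import re
import xml.etree.ElementTree as ET

GMD_NS = "http://www.isotc211.org/2005/gmd"  # ISO 19139 gmd namespace

# Hypothetical fragment for illustration only; the codeList value is the
# URL quoted in the GEMINI guidance discussed in this issue.
SAMPLE = f"""
<gmd:language xmlns:gmd="{GMD_NS}">
  <gmd:LanguageCode
      codeList="http://www.loc.gov/standards/iso639-2/php/code_list.php"
      codeListValue="eng">English</gmd:LanguageCode>
</gmd:language>
""".strip()


def language_code_ok(xml_text: str) -> bool:
    """Return True if every gmd:LanguageCode has a three-letter codeListValue.

    The codeList URI is deliberately not inspected, mirroring the
    behaviour described above. The lowercase-only pattern is an assumption.
    """
    root = ET.fromstring(xml_text)
    codes = root.findall(f".//{{{GMD_NS}}}LanguageCode")
    return bool(codes) and all(
        re.fullmatch(r"[a-z]{3}", code.get("codeListValue", "")) is not None
        for code in codes
    )


if __name__ == "__main__":
    print(language_code_ok(SAMPLE))
```

Because only `codeListValue` is inspected, a check like this passes whether the `codeList` URI ends in `/php/code_list.php` or in `/iso639-2/`, which is why a change to the guidance would not need a matching change to the check.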

Sgaff commented 2 years ago

That's good then from the point of view of the change, as it would purely be edits in the website.

nmtoken commented 2 years ago

If you go to page http://www.loc.gov/standards/iso639-2/ you can see that there is a link to ISO 639-2 Code List from it, so http://www.loc.gov/standards/iso639-2/ is not a link to the code list, and if the INSPIRE validator expects it, then that must surely be an error.

Valid URLs to the code list are:

A link to the code eng in the code list is:

https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?code_ID=130

Basically, I think the INSPIRE validator is wrong to reject URLs with php in them.

Sgaff commented 2 years ago

Can we feed this back to INSPIRE so they can issue a corrigendum? https://www.loc.gov/standards/iso639-2/ is the example encoding in the TG, so that would need to be changed as well.

Sean

Sgaff commented 2 years ago

I'll also take the view, based on James' comments, that the GEMINI interpretation is correct, and will leave it this way for the imminent MEDIN release.

PeterParslow commented 2 years ago

@Sgaff : could you raise it as a new issue against the INSPIRE TG, at https://github.com/INSPIRE-MIF/technical-guidelines/issues? Or if it's more an issue with their validator than their text, then raise it at: https://github.com/INSPIRE-MIF/helpdesk-validator

And then close it here.

PeterParslow commented 2 years ago

Related issue / pull request at INSPIRE MIF: https://github.com/inspire-eu-validation/metadata/pull/175.

Note: this is the validator making its current implementation more tolerant, but it does not take into account James' view here that they should be targeting something that returns a value.