AGIuk / Schematron

The Schematron files to support GEMINI 2.3 validation

URN-encoded Coordinate Reference System fails validation #6

Closed archaeogeek closed 3 years ago

archaeogeek commented 3 years ago

A URN-coded CRS of the form in the attached xml (zipped) fails validation. It would appear that an additional rule in MI-17 (https://github.com/AGIGemini/Schematron/blob/master/GEMINI_2.3_Schematron_Schema-v1.0.sch#L419-L481) is required to address this.

epsg4258_urn.zip

nmtoken commented 3 years ago

Can you point out the rule(s) in the TG that you think aren't being checked to allow a URN?

For example

TG Requirement 2.1: metadata/2.0/req/isdss/crs

The coordinate reference system(s) used in the described data set or data set series shall be given using element gmd:referenceSystemInfo/gmd:MD_ReferenceSystem/gmd:referenceSystemIdentifier/gmd:RS_Identifier. The multiplicity of this element is 1..*.

So one must be given

TG Requirement 2.2: metadata/2.0/req/isdss/crs-id

If the coordinate reference system is listed in the table Default Coordinate Reference System Identifiers in Annex D.4, the value of the HTTP URI Identifier column shall be used as the value of gmd:referenceSystemInfo/gmd:MD_ReferenceSystem/ gmd:referenceSystemIdentifier/gmd:RS_Identifier/gmd:code element.

So if that one is in Annex D4, it must be an HTTP URI and not a URN.
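As a rough illustration of how Requirement 2.2 reads, here is a minimal Python sketch. It is not taken from the schematron: the function names and regular expressions are hypothetical, and the lookup table holds only the one Annex D.4 entry quoted in this thread, so a real check would need the full table from the TG.

```python
import re

# Assumption: only the single Annex D.4 entry quoted in this thread is listed;
# the real table in the TG has more rows.
D4_HTTP_URIS = {"4258": "http://www.opengis.net/def/crs/EPSG/0/4258"}

# Hypothetical patterns for common ways of writing an EPSG code.
_EPSG_PATTERNS = [
    re.compile(r"^urn:ogc:def:crs:EPSG:[^:]*:(\d+)$", re.IGNORECASE),
    re.compile(r"^https?://www\.opengis\.net/def/crs/EPSG/0/(\d+)$", re.IGNORECASE),
    re.compile(r"^EPSG:{1,2}(\d+)$", re.IGNORECASE),
]


def epsg_number(code):
    """Return the EPSG number embedded in an identifier string, or None."""
    for pattern in _EPSG_PATTERNS:
        match = pattern.match(code.strip())
        if match:
            return match.group(1)
    return None


def satisfies_req_2_2(code):
    """True unless the CRS is in the D.4 table but not written as its HTTP URI."""
    number = epsg_number(code)
    if number not in D4_HTTP_URIS:
        # Requirement 2.2 only constrains CRSs that appear in Annex D.4.
        return True
    return code.strip() == D4_HTTP_URIS[number]
```

Under these assumptions the URN form of 4258 fails and the HTTP URI form passes, while an identifier for a CRS outside the table is left alone.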

archaeogeek commented 3 years ago

In all honesty, @PeterParslow asked me to submit this so I'll defer to his better knowledge!

nmtoken commented 3 years ago

Hmm, in fact that file doesn't fail on any Metadata Item 17 - Spatial Reference System rules...

The record passes the TG Requirement 2.1: metadata/2.0/req/isdss/crs rule (MI-17a)

and rules MI-17b, c and d don't test for TG Requirement 2.2: metadata/2.0/req/isdss/crs-id

So

<gmd:referenceSystemInfo>
      <gmd:MD_ReferenceSystem>
         <gmd:referenceSystemIdentifier>
            <gmd:RS_Identifier>
               <gmd:code>
                  <gco:CharacterString>urn:ogc:def:crs:EPSG::4258</gco:CharacterString>
               </gmd:code>
            </gmd:RS_Identifier>
         </gmd:referenceSystemIdentifier>
      </gmd:MD_ReferenceSystem>
</gmd:referenceSystemInfo>

should validate against the required schematron, but in the supplemental schematron you get a warning because urn:ogc:def:crs:EPSG::4258 is not an identifier for one of the default Coordinate reference systems (which is correct).

For ETRS89-GRS80 you should use http://www.opengis.net/def/crs/EPSG/0/4258

Now I'm wondering whether we need a stricter rule in the Metadata Item 17 - Spatial Reference System rules to fail such a record.
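To make the idea of a stricter check concrete, here is a small Python sketch (not Schematron, and not part of the existing rules) that pulls the gmd:code values out of a record and flags URN-encoded identifiers for a default CRS. The function names are hypothetical, the default list contains only EPSG 4258 as an assumption, and only gco:CharacterString values are read.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces used by the fragment above.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

CODE_PATH = (
    ".//gmd:referenceSystemInfo/gmd:MD_ReferenceSystem"
    "/gmd:referenceSystemIdentifier/gmd:RS_Identifier"
    "/gmd:code/gco:CharacterString"
)


def crs_codes(xml_text):
    """Return every CRS code string found in the record."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.findall(CODE_PATH, NS)]


def urn_encoded_default_crs(xml_text, default_epsg_numbers=("4258",)):
    """Return the codes that name a default CRS but use the URN encoding."""
    flagged = []
    for code in crs_codes(xml_text):
        is_urn = code.lower().startswith("urn:ogc:def:crs:epsg:")
        if is_urn and code.rsplit(":", 1)[-1] in default_epsg_numbers:
            flagged.append(code)
    return flagged


# The fragment from this comment, wrapped in a root element for parsing.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:referenceSystemInfo>
    <gmd:MD_ReferenceSystem>
      <gmd:referenceSystemIdentifier>
        <gmd:RS_Identifier>
          <gmd:code>
            <gco:CharacterString>urn:ogc:def:crs:EPSG::4258</gco:CharacterString>
          </gmd:code>
        </gmd:RS_Identifier>
      </gmd:referenceSystemIdentifier>
    </gmd:MD_ReferenceSystem>
  </gmd:referenceSystemInfo>
</gmd:MD_Metadata>"""

print(urn_encoded_default_crs(SAMPLE))
```

A stricter MI-17 rule would presumably express the same test in XPath inside the schematron; the Python version is only meant to show what would be flagged.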

nmtoken commented 3 years ago

Incidentally, the example file is not valid; it fails on multiple 'Free text elements should not be empty' errors:

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 19:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 557:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 567:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 577:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 587:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 597:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 699:0

System ID: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Main validation file: C:\DOWNLOADS\Op\epsg4258_urn\epsg4258_urn.xml
Scenario name: GEMINI23
Engine name: ISO Schematron
Severity: error
Description: AT-6: Free text elements should not be empty
Start location: 778:0

archaeogeek commented 3 years ago

It was never checked for validation against everything; it was just rigged up to demo a number of CRS scenarios for @PeterParslow. Try the attached one, which has the empty elements removed: epsg4258_urnv2.zip

There's a wider question in the other open issue: whether only the first CRS should be checked to see if it's one of the recommended ones. The others shouldn't fail (or even warn, really) if the first one is OK.

PeterParslow commented 3 years ago

Here's what I said to Jo about this particular example (I haven't actually checked the uploaded one):

The URN encoding fails INSPIRE TG Requirement 2.2 - we need an extra rule in the MI-17 series. Would you like to raise an issue on GitHub?

What I'm suggesting is what James has come to: I think this needs a stricter MI-17. Perhaps that's "an extra rule" or perhaps a tweak to one of the current ones.

The message returned is also misleading, saying that EPSG::4258 isn't an INSPIRE default CRS, when actually it's the encoding of that into the metadata that's wrong (if it's e.g. code & codespace, or URN).

nmtoken commented 3 years ago

I'm not such a fan of using the first listed element, for a couple of reasons: it adds a constraint that isn't in the requirements, and it can make it more difficult for clients to build/manage the metadata.

In this case "first" (implying the rule applies to one) is also not correct; the rule should apply to all CRSs that are listed in Annex D4.

For the supplemental schematron warnings (SP-4a/SP-4b), would a better message be that EPSG::4258 is not a valid identifier for a default CRS as specified in Annex D4? I accept, though, that if we have another rule in Metadata Item 17 - Spatial Reference System, the supplemental rule may not be required.
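One way to picture the difference between the two messages is the Python sketch below. It is illustrative only: `crs_warning` is a hypothetical name, the table holds just the one Annex D4 entry quoted in this thread, and the trailing-digits match is a crude stand-in for proper identifier parsing.

```python
import re

# Assumption: a one-row stand-in for the Annex D4 table.
D4_HTTP_URIS = {"4258": "http://www.opengis.net/def/crs/EPSG/0/4258"}


def crs_warning(code):
    """Return a warning string for one gmd:code value, or None if it is fine."""
    if code in D4_HTTP_URIS.values():
        return None
    # Crude heuristic: treat the trailing digits as the EPSG number.
    digits = re.search(r"(\d+)$", code)
    number = digits.group(1) if digits else None
    if number in D4_HTTP_URIS:
        # The CRS is a default one; only its encoding is wrong.
        return (
            f"{code} is not a valid identifier for a default CRS as specified "
            f"in Annex D4; use {D4_HTTP_URIS[number]}"
        )
    return f"{code} does not identify one of the default CRSs in Annex D4"
```

With this split, a URN for 4258 gets the "wrong encoding" wording, while something like EPSG:27700 gets the "not a default CRS" wording.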

archaeogeek commented 3 years ago

There are two issues here. The first is that the documentation at https://www.agi.org.uk/gemini/40-gemini/1062-gemini-datasets-and-data-series#17 and the validation rules don't agree. The second is that, if the rule insists on checking every entry, it's not possible to have a valid Gemini record that includes British National Grid as a CRS (in any encoding), because it's not in the INSPIRE list.

nmtoken commented 3 years ago

I think strictly the validation rules (what you must pass, as opposed to warnings) and documentation do currently agree. If you can use urn:ogc:def:crs:EPSG::4258 and pass, as above, then you can certainly use urn:ogc:def:crs:EPSG::27700.

There is a potential issue that you get a warning if you use any CRS that doesn't use an identifier specified in Annex D4. The warnings were intended to say "this might be an issue" rather than "this probably needs fixing". It's not an INSPIRE requirement to have a dataset or service that uses one of the CRSs listed in D4; there are other valid CRSs. For example, CRS:84 is allowed in some circumstances, as would be other compound/specialised CRSs that use ETRS89.

From my understanding of the INSPIRE legislation I had a feeling that a dataset/service that only supported EPSG:27700 (or rather a metadata record that only reported support for EPSG:27700) would not be a valid record, e.g. from COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010 like:

1.2. Datum for three-dimensional and two-dimensional coordinate reference systems

For the three-dimensional and two-dimensional coordinate reference systems and the horizontal component of compound coordinate reference systems used for making spatial data sets available, the datum shall be the datum of the European Terrestrial Reference System 1989 (ETRS89) in areas within its geographical scope, or the datum of the International Terrestrial Reference System (ITRS) or other geodetic coordinate reference systems compliant with ITRS in areas that are outside the geographical scope of ETRS89. Compliant with the ITRS means that the system definition is based on the definition of the ITRS and there is a well documented relationship between both systems, according to EN ISO 19111.

Though it might be an exception under 1.3.4. Other Coordinate Reference Systems?

If a new rule gets added to Metadata Item 17 to require that any listed CRS that is in Annex D4 uses the specified HTTP URI as its identifier, then it must only check the D4 list; a record that lists EPSG:27700 would not fail this validation check.

The question is whether there should be another (supplemental) check to see if at least one CRS from D4 is listed. It can only be a warning, though, because it's possible to have a CRS not listed in D4.
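The two tiers described above (a hard failure for a D4 CRS that is wrongly encoded, and only a warning when no D4 CRS is listed at all) could be sketched in Python like this. `check_record` is a hypothetical name, the table is a one-row stand-in for Annex D4, and matching on the trailing ":<number>" is deliberately crude.

```python
# Assumption: a one-row stand-in for the Annex D4 table.
D4_HTTP_URIS = {"4258": "http://www.opengis.net/def/crs/EPSG/0/4258"}


def check_record(codes):
    """Return (errors, warnings) for the list of CRS codes in one record."""
    errors = []
    lists_default = False
    for code in codes:
        for number, uri in D4_HTTP_URIS.items():
            if code == uri:
                lists_default = True
            elif code.endswith(":" + number):
                # A D4 CRS written some other way (URN, EPSG:nnnn, ...).
                lists_default = True
                errors.append(f"{code} should be written as {uri}")
    # CRSs outside D4 never produce an error, only this optional warning.
    warnings = [] if lists_default else ["no CRS from Annex D4 is listed"]
    return errors, warnings
```

So a record listing only EPSG:27700 passes with a warning, and a record listing EPSG:27700 alongside the HTTP URI for 4258 passes cleanly.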

Note

the datum shall be the datum of the European Terrestrial Reference System 1989 (ETRS89) in areas within its geographical scope

The datum is described by https://epsg.org/datum_6258/European-Terrestrial-Reference-System-1989-ensemble.html

The extent is described by https://epsg.org/extent_1298/Europe-ETRF-by-country.html

Europe - onshore and offshore: Albania; Andorra; Austria; Belgium; Bosnia and Herzegovina; Bulgaria; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Germany; Gibraltar; Greece; Hungary; Ireland; Italy; Kosovo; Latvia; Liechtenstein; Lithuania; Luxembourg; Malta; Moldova; Monaco; Montenegro; Netherlands; North Macedonia; Norway including Svalbard and Jan Mayen; Poland; Portugal; Romania; San Marino; Serbia; Slovakia; Slovenia; Spain; Sweden; Switzerland; United Kingdom (UK) including Channel Islands and Isle of Man; Vatican City State.


PeterParslow commented 3 years ago

I think that so far our approach for GEMINI is to implement the INSPIRE Metadata Technical Guidelines, not "just" the legislation. Publishers do have the option of (trying to) satisfy INSPIRE legislation/regulation in other ways, but I don't think we need to create support for those 'other ways'.

TG Requirement 2.2 tells us what to do with CRSs that are in Annex D.4. For CRSs that aren't in that list (such as EPSG:27700), only TG Requirement 2.1 would apply (code+codeSpace), which says nothing about how the "code" is encoded (URN, URI) - just that it shall use a CRS "specified in a well-known common register".

Either way, some of the requirements would be very difficult to check in Schematron!

I think if the data is only available in EPSG:27700 then the data may not be in line with its technical specification - the IR Requirement allows for various exceptions. It is still possible to create a metadata record that conforms to the Metadata TG and describes a "possibly not conforming" dataset.

nmtoken commented 3 years ago

I did muse on the possibility of using the conformance statement to Commission Regulation (EU) No 1089/2010 as a trigger for/condition within a test, but it has limited use even then.

I did also consider the possibility of checking the extent of the dataset to see if it lies outside the extent of the datum of the European Terrestrial Reference System 1989 (ETRS89), and using that as a condition within a test, but I think it could only be used as a warning at best.
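For what it's worth, the extent idea amounts to a bounding-box containment test, roughly as in the Python sketch below. Everything here is hypothetical: the names are made up, the caller must supply the datum extent (no real EPSG figures are built in), and boxes crossing the antimeridian are ignored.

```python
from typing import NamedTuple, Optional


class BBox(NamedTuple):
    """Geographic bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def lies_within(dataset: BBox, extent: BBox) -> bool:
    """True if the dataset box sits entirely inside the extent box."""
    return (
        extent.west <= dataset.west
        and dataset.east <= extent.east
        and extent.south <= dataset.south
        and dataset.north <= extent.north
    )


def etrs89_warning(dataset: BBox, extent: BBox, lists_etrs89_crs: bool) -> Optional[str]:
    """Warn only when the data is inside the extent and no ETRS89 CRS is listed."""
    if lists_etrs89_crs or not lies_within(dataset, extent):
        return None
    return "dataset lies within the ETRS89 extent but no ETRS89-based CRS is listed"
```

Even in this form it could only drive a warning, since the metadata bounding box is an approximation of where the data actually sits.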

nmtoken commented 3 years ago

I think we agree that we should add a ruleset to the Metadata Item 17 - Spatial Reference System rules that test for TG Requirement 2.2.

In such a ruleset the XML example provided with one CRS using a URN identifier (urn:ogc:def:crs:EPSG::4258) would fail validation, because EPSG:4258 is in the list of default CRS, and as such requires the specified HTTP-URI identifier (http://www.opengis.net/def/crs/EPSG/0/4258).

A metadata record using only one CRS where that CRS is EPSG:27700, would pass this new rule (and existing rules).

Do we need to have a ruleset to warn (so supplemental) that such a record is, or is likely to be, invalid[1], because the regulation stipulates that a dataset with UK coverage (because the UK landmass falls within the scope of the datum of the European Terrestrial Reference System 1989 (ETRS89)) should use a CRS using ETRS89?

Such a rule could be generic (CRS is not in D4) or specific to EPSG:27700

Should we have a ruleset to warn (so supplemental) that none of the listed CRSs in the metadata record is in D4? In such a ruleset a record that listed EPSG:27700 and http://www.opengis.net/def/crs/EPSG/0/4258 would pass.

Or is the feeling that rulesets to warn on potential CRS issues are confusing and should be removed?

[1] Because AFAICT the whole of EPSG:27700 lies within the extent of the cited datum (so 1.3.4.2 doesn't apply), and no spatial data theme specifies it (so 1.3.4.1 also doesn't apply). Or is it that the UK is not regarded as continental Europe (even though the continental margin is in the Atlantic), and the British government has defined EPSG:27700 as a suitable CRS, so 1.3.4.2 does apply?

PeterParslow commented 3 years ago

I agree with your suggested addition to MI 17.

I'm not sure what the difference is between your two suggested supplemental rules, except in the wording of the message, which I think should combine them -

"may be invalid because none of the listed CRS is in Annex D4 of the Metadata Regulation. Data geographically in Europe should be available in ETRS89; outside this it should be in WGS84".

but actually, people might then expect us to have looked at the bounding box to decide for them!

Below that of course is the subtle clash that we've said "GEMINI records should satisfy INSPIRE", but not that the datasets have to - and if the dataset isn't available in ETRS89, it wouldn't really be correct in describing it as if it were.

[1] there used to be a "UK Location Programme" document on CRSs, among which it clarified that the UK is in Europe for this purpose. I guess that clarification should be added to https://www.gov.uk/government/publications/open-standards-for-government/exchange-of-location-point

archaeogeek commented 3 years ago

Closed in favour of https://github.com/agiorguk/gemini-schematron/issues/7.