ISO-TC211 / StandardsTracker

This GitHub repository lets you - our users - log and track issues that you find with our standards and other documents. Tag the issue with the standard or standards affected; we will assign it to the relevant group(s) within TC 211.

19103 Guidance on code lists needed #389

Open heidivanparys opened 3 years ago

heidivanparys commented 3 years ago

ISO 19103 should be updated regarding the section on code lists. Since the last edition, good practices have emerged, and the topic of code lists still causes considerable discussion and confusion in projects, so it would be good to describe it in more detail.

Some references on INSPIRE practices:

[1] DRAFTING TEAM ‘DATA SPECIFICATIONS’. D2.5: Generic Conceptual Model, Version 3.4. 8 April 2014. Available from: http://inspire.ec.europa.eu/documents/inspire-generic-conceptual-model

[2] JRC REGISTRY TEAM. Best Practices for registers and registries & Technical Guidelines for the INSPIRE register federation. Guidance document. INSPIRE Maintenance and Implementation Group (MIG), 31 May 2017. [Viewed 29 November 2017]. Available from: https://inspire.ec.europa.eu/id/document/tg/registers-and-register-federation

An example of a (now resolved) discussion: https://github.com/opengeospatial/CityGML-3.0CM/issues/10.

PeterParslow commented 3 years ago

My thoughts:

ISO 19103 shows how to model a code list in UML, as a class stereotyped "CodeList". This description is adopted/assumed in ISO 19109, ISO 19136, and ISO 19139-1. Interestingly, none of the standards actually declare the use of the CodeList stereotype as a requirement…

ISO 19136 & ISO 19139-1 (for ‘feature instances’ and other data, respectively) provide almost identical encoding rules¹ for code lists (i.e. UML classes stereotyped CodeList as per ISO 19103). Both put most of their effort into specifying how attributes whose types are code lists should be modelled: the expectation is that any data which uses values from the code list can reference them via an HTTP URI (using xs:anyURI as the type of the XML elements that represent such attributes).
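To make the "reference via an HTTP URI" idea concrete, here is a minimal Python sketch of a data element whose content is the URI of a code list value, plus a check that the content really is an absolute HTTP(S) URI. The element names (`Parcel`, `landUse`) and the `example.org` URI are hypothetical, and the sketch does not reproduce the actual ISO 19136 or ISO 19139-1 schemas; it only illustrates the pattern described above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical URI of one value in a hypothetical register.
RESIDENTIAL = "https://example.org/codelist/LandUseValue/residential"


def encode_code_list_property(parent, name, value_uri):
    """Add a child element whose text content is the URI of a code list value."""
    child = ET.SubElement(parent, name)
    child.text = value_uri
    return child


def is_http_uri(text):
    """Return True if text is an absolute http(s) URI (a loose check, not full xs:anyURI validation)."""
    parts = urlparse(text.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Encode: the data carries only the reference, not the definition of the value.
feature = ET.Element("Parcel")
encode_code_list_property(feature, "landUse", RESIDENTIAL)
xml_text = ET.tostring(feature, encoding="unicode")

# Decode: read the reference back and check that it looks resolvable.
land_use = ET.fromstring(xml_text).find("landUse").text
```

The definition, label and status of the value stay in the register; the data only needs a reference that can be resolved there.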

Regarding modelling the code list itself, ISO 19139 puts it like this: “CodeLists and their associated definitions are controlled in registers”, and says little more.

The fact that current TC211 code lists are published as GML Dictionaries is just true rather than necessary. It was required by one statement in ISO 19136-1 that applied to CodeList classes in Application Schemas, where the class is tagged asDictionary – but that is deprecated by ISO 19136-2, which brought it in line with what had always been in ISO 19139. Note, the GML 3.3 / 19136-2 default is that code lists are external dictionaries & the example dictionaries are in SKOS (the requirement is to use ‘any suitable syntax or encoding’).

So, I’m not sure that ISO 19103 needs any change here. Perhaps a mention that the code list itself could be published in a variety of ways, and even that RDF/SKOS is currently seen as good practice for publishing such lists (as mentioned in ISO 19136-2)? INSPIRE publishes its code lists (and individual values) in five formats, with RDF (using some SKOS) as one of them.
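As an illustration of what publishing a code list in RDF/SKOS could look like, the following Python sketch serialises a small code list to Turtle using plain string formatting. The scheme URI, the code list title and the two values are invented for the example, and the choice of SKOS properties (`skos:notation`, `skos:prefLabel`, `skos:inScheme`) is one plausible mapping, not one prescribed by ISO 19103 or ISO 19136-2.

```python
def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(scheme_uri, title, values):
    """Serialise a code list (mapping of notation -> English label) as SKOS in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme ;",
        f'    skos:prefLabel "{escape(title)}"@en .',
    ]
    for notation, label in values.items():
        lines += [
            "",
            f"<{scheme_uri}/{notation}> a skos:Concept ;",
            f"    skos:inScheme <{scheme_uri}> ;",
            f'    skos:notation "{escape(notation)}" ;',
            f'    skos:prefLabel "{escape(label)}"@en .',
        ]
    return "\n".join(lines) + "\n"


# Hypothetical code list with two values.
turtle = to_skos_turtle(
    "https://example.org/codelist/LandUseValue",
    "Land use value",
    {"res": "Residential", "ind": "Industrial"},
)
print(turtle)
```

Each value gets its own HTTP URI under the scheme URI, so data can reference an individual value while the list as a whole remains one publishable resource.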

¹ ISO 19136 / GML allows for some code list values to be directly in the data, rather than in a register (the ‘other:….’ pattern); this is still allowed in -2 / GML3.3.

jetgeo commented 3 years ago

Perhaps a mention that the code list itself could be published in a variety of ways, and even that RDF/SKOS is currently seen as good practice for publishing such lists (as mentioned in ISO 19136-2)? INSPIRE publishes its code lists (and individual values) in five formats, with RDF (using some SKOS) as one of them.

I believe this is close to what we need in ISO 19103. Perhaps with some examples.

ogcscotts commented 3 years ago

Another example of a codelist tied to a Standard that is hosted on a register is the Cadastre and Land Administration Thesaurus (CaLAThe): http://defs.opengis.net/vocprez/object?uri=https%3A//www.opengis.net/def/CaLATheCodeList

heidivanparys commented 2 years ago

From https://webgate.ec.europa.eu/fpfis/wikis/pages/viewpage.action?pageId=803309194&preview=/803309194/840434584/Interoperability%20regulation_proposed%20changes.pdf (shown during the 66th MIG-T meeting):

[...] Since enumerations are essentially equivalent to non-extensible code lists, it is proposed to remove the notion of enumeration from the text altogether, i.e. in the definitions in Article 2 and elsewhere in the Implementing Regulation [...]

PeterParslow commented 2 years ago

Does anyone know if INSPIRE intend to follow up any such change to their Regulation with an equivalent change to their Technical Guidance, specifically all the thematic Data Specifications, their data models, and the automatically derived GML XSDs? Or are they considering changing the regulation in order to make it easier to change the code lists (as implied in the 'benefits' in the document), and then only actually change enumerations to code lists in the Technical Guidance as & when needed?

Each time they do that, they would render all existing compliant data non-compliant.

Or is it more a matter that they want to free other encodings up from it, but intend to leave the GML alone?

Doing this in 19103 would have a knock-on effect:

Initially, it would make a lot of our (& OGC's, & INSPIRE's) UML non-compliant, with each change then needing to be implemented in any XML schema.

After doing all that, then these two encoding rules would become redundant:

heidivanparys commented 2 years ago

There is more information on the change proposal, the reasons behind it, and the expected impact in the document "Possible revision of Implementing Rules on data interoperability Status of change proposals" (from the 8th MIG meeting).

It says, among other things:

[...]

The TGs will need to be updated accordingly.

For the change from enumeration to code lists, some of the schemas will need to be updated.

[...]

The schema updates could probably be done in a backwards-compatible way, allowing (for a time) both code list references and enumeration values.

The main reason I mentioned it here is that when I read ISO 19103, "extensibility" is the delimiting characteristic that distinguishes an enumeration from a code list.

But I am wondering whether the delimiting characteristic is actually whether the life course of the enumerated type is tied to the life course of the data model or not:

E.g. INSPIRE defined code lists have a property "extensibility" with possible values as described on https://inspire.ec.europa.eu/registry/extensibility/.

It would be interesting to have a discussion about that.

heidivanparys commented 2 years ago

I will use the opportunity created by the move to GitHub to cross-link some issues.

An interesting topic regarding code lists is the encoding of code lists and data that refers to code lists. HTTP URIs are ok in GML and in data "on the web", but are not really common practice in relational databases, such as GeoPackage.

ISO 19103 mentions mnemonic names, but in reality many code lists use "notations" ≈ "code values". And many code list values also have a human-friendly label. Do we then still need the mnemonic name? And how to model all this information, or is that out of scope?
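One way to picture the relational case is a lookup table that stores each code list value's notation, human-friendly label and HTTP URI, with the data table referencing the notation. The Python sketch below uses the standard `sqlite3` module and an in-memory database; it is not a valid GeoPackage (none of the required GeoPackage metadata tables are created), and all table names, column names and URIs are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a code list lookup table and a data table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE codelist_landuse (
            code  TEXT PRIMARY KEY,      -- notation / code value
            label TEXT NOT NULL,         -- human-friendly label
            uri   TEXT NOT NULL UNIQUE   -- HTTP URI of the value in a register
        );
        CREATE TABLE parcel (
            id       INTEGER PRIMARY KEY,
            land_use TEXT NOT NULL REFERENCES codelist_landuse(code)
        );
        INSERT INTO codelist_landuse VALUES
            ('res', 'Residential', 'https://example.org/codelist/LandUseValue/res'),
            ('ind', 'Industrial',  'https://example.org/codelist/LandUseValue/ind');
        INSERT INTO parcel VALUES (1, 'res');
        """
    )
    return conn


def resolve(conn, parcel_id):
    """Return (code, label, uri) for the land use value of one parcel."""
    return conn.execute(
        """
        SELECT c.code, c.label, c.uri
        FROM parcel AS p
        JOIN codelist_landuse AS c ON c.code = p.land_use
        WHERE p.id = ?
        """,
        (parcel_id,),
    ).fetchone()


conn = build_demo_db()
print(resolve(conn, 1))
```

In this layout the data rows carry only the short notation, while the URI needed for "on the web" encodings such as GML can be recovered by a join. Whether the lookup table ships inside the file or is only referenced is a design choice the sketch leaves open.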

In the context of INSPIRE, there is a discussion in https://github.com/INSPIRE-MIF/gp-geopackage-encodings/issues/17 regarding how to encode data using code list values in GeoPackage. But the topic is also relevant outside INSPIRE; the INSPIRE ad-hoc group on that is considering developing an extension to the GeoPackage specification.

Any feedback on the thoughts there would be welcomed. A draft encoding rule has been developed for the European Noise Directive. That document is not (yet) online, but some information is in this presentation, and sample GeoPackage files developed according to that draft are in this repository.