Interoperable-data / ERA_vocabulary

ERA vocabulary is an ontology defined by the European Union Agency for Railways (ERA) to describe the concepts and relationships related to the European railway infrastructure and the vehicles authorized to operate over it.
https://data-interop.era.europa.eu/era-vocabulary/
MIT License

Add units of measure where relevant #1

Closed. MathiasVDA closed this issue 1 month ago.

MathiasVDA commented 1 year ago

While looking at a data instance, for example the Abrunhosa tunnel, we found that the attributes Length and Cross section do not have a unit of measure defined.

Not specifying the unit of measure will lead to data quality issues: one can enter a length in meters, centimeters, and so on. The same goes for applications that use the data: how should they interpret the double or integer value?

Another approach could be the way the Gist ontology handles units of measure (see the YouTube video "Quantities, Magnitudes, and Units & Measures in Gist"). This is an interesting approach, but it adds an extra step in the data. Instead of:

<http://data.europa.eu/949/funtionalInfrastructure/tunnels/Abrunhosa_-7.637340.5732_-7.634740.5713> era:length  "305.0"^^xsd:double .

You will have:

<http://data.europa.eu/949/funtionalInfrastructure/tunnels/Abrunhosa_-7.637340.5732_-7.634740.5713> gist:hasMagnitude <.../_extent> .
<.../_extent> a gist:Extent .
<.../_extent> gist:numericValue "305.0"^^xsd:double .
<.../_extent> gist:hasUnitOfMeasure gist:_meter .

ocorcho commented 1 year ago

You are right, and we have had long conversations about it, given that including units of measure adds three more triples, as you clearly show. Another option would be to use UCUM, which is more compact but not standard either.

For the time being, we will state the preferred unit of measurement in the documentation of each property, so that it can be taken into account when generating data.

sixdiamants commented 1 year ago

There's a lively discussion on the subject going on within UIC. There, http://qudt.org/vocab has gained some traction.

I would much appreciate consensus on the subject.

tanepierre commented 1 year ago

Please also have a look at hal-01885337:

Maxime Lefrançois, Antoine Zimmermann. The Unified Code for Units of Measure in RDF: cdt:ucum and other UCUM Datatypes. The Semantic Web: ESWC 2018 Satellite Events. ESWC 2018, Jun 2018, Heraklion, Greece. pp.196-201, doi:10.1007/978-3-319-98192-5_37. hal-01885337

Essentially, it tries to use UCUM to get a concise representation of units.

ocorcho commented 1 year ago

I agree on the usefulness and compactness of UCUM, which would be in general my preferred choice. However, there are also concerns in the community around the fact that this is not a standard-based representation, and hence it will not allow for some comparisons to be made at, for instance, the SPARQL level. We will explore the possibility of having the two representations (UCUM and non-UCUM based).
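
The comparison concern can be illustrated outside SPARQL. The following is a minimal Python sketch, not part of ERA vocabulary or of any UCUM library: the `parse_ucum_length` helper and its three-entry conversion table are hypothetical, and the literal format ("value, space, unit code") is assumed for illustration. It shows that two UCUM-style literals denoting the same length differ as strings and only compare equal once they are parsed and normalized, which a query engine without support for such a datatype would not do.

```python
from decimal import Decimal

# Hypothetical conversion table: metres per unit, for three length codes only.
METRES_PER_UNIT = {
    "m": Decimal("1"),
    "cm": Decimal("0.01"),
    "km": Decimal("1000"),
}


def parse_ucum_length(literal: str) -> Decimal:
    """Parse a literal such as '305.0 m' and return the length in metres."""
    value, unit = literal.split()
    if unit not in METRES_PER_UNIT:
        raise ValueError(f"unsupported unit code: {unit!r}")
    return Decimal(value) * METRES_PER_UNIT[unit]


a = "305.0 m"
b = "30500 cm"

# Plain string comparison sees two different literals.
same_as_strings = a == b

# After parsing and normalizing to metres, they denote the same length.
same_as_lengths = parse_ucum_length(a) == parse_ucum_length(b)
```

With a value-plus-unit-IRI representation, the numeric part is at least a plain number that any engine can compare, although values in different units still need a conversion step.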

Airy59 commented 1 year ago

@ocorcho what would be a (or the) standard-based representation of units? For the units themselves there is little doubt about the validity and usefulness of the SI system, but its representation as an ontology is another matter that involves many more decision points. This is well described in this presentation (from the BIPM website, dated 2019): https://www.bipm.org/en/search?p_p_id=search_portlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=%2Fdownload%2Fpublication&p_p_cacheability=cacheLevelPage&_search_portlet_dlFileId=28434467&p_p_lifecycle=1&_search_portlet_javax.portlet.action=search&_search_portlet_page=previous&_search_portlet_operation=changePage

Quote: "Work with user communities to establish unit ontologies & agreed, clear implementations"

This quote does not seem to acknowledge past efforts. So do we have an open field here, or would you consider some ontology to be more authoritative than others in 2023, and if so, why? Wider usage? Endorsement by some industry standard maker (W3C...)?

Concerning "pure" ontology engineering aspects, some ontologies define units as individuals, others as classes. Examples in the "centimeter" case:

From a software engineering perspective, I'd rather model particular units as classes, since I can pack them with methods for unit conversion and apply a "singleton" decorator to avoid generating a new instance of "centimeter" with every new value. But ontology engineering is not OOP (despite some real, and other treacherous, analogies), and I'd like to have your opinion.
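
The software engineering side of this (units as classes, one shared instance per unit) can be sketched in Python. This is only an illustration: the `singleton` decorator and the `Centimetre` and `Metre` classes are hypothetical names, and the sketch says nothing about whether an ontology should model units as classes or as individuals.

```python
def singleton(cls):
    """Class decorator: replace the class by a factory returning one shared instance."""
    instances = {}

    def get_instance():
        # Create the instance on first use, then always return the same object.
        if cls not in instances:
            instances[cls] = cls()
        return instances[cls]

    return get_instance


@singleton
class Metre:
    symbol = "m"
    units_per_metre = 1

    def to_metres(self, value: float) -> float:
        return value / self.units_per_metre


@singleton
class Centimetre:
    symbol = "cm"
    units_per_metre = 100

    def to_metres(self, value: float) -> float:
        # Conversion logic lives with the unit, as described above.
        return value / self.units_per_metre


# Every value tagged with centimetres shares the same unit object.
first = (30500.0, Centimetre())
second = (12.0, Centimetre())
shared = first[1] is second[1]
length_in_metres = first[1].to_metres(first[0])
```

Note that after decoration `Centimetre` is a factory function rather than a class, so `isinstance` checks against it no longer work. That is one of the places where the OOP analogy starts to diverge from ontology classes.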

gatemezing commented 1 year ago

> There's a lively discussion on the subject going on within UIC. There, http://qudt.org/vocab has gained some traction.
>
> I would much appreciate consensus on the subject.

Both options (QUDT and UCUM) are used in the SSN ontology. See some samples here: https://w3c.github.io/sdw/ssn/#iphone_barometer-sosa

ednaru commented 1 month ago

Unit of measure has been included in ERA ontology 3.1.0 as an annotation property:

###  http://data.europa.eu/949/unitOfMeasure
era:unitOfMeasure rdf:type owl:AnnotationProperty ;
    dct:created "2024-05-16"^^xsd:date ;
    rdfs:comment "Magnitude of a quantity, defined and adopted by convention or by law, that is used as a standard for measurement of the same kind of quantity."@en ;
    rdfs:isDefinedBy <http://data.europa.eu/949/> ;
    rdfs:label "Unit of measure"@en .

It refers to the QUDT UNIT ontology. For example:

###  http://data.europa.eu/949/crossSectionArea
era:crossSectionArea rdf:type owl:DatatypeProperty ,
        owl:FunctionalProperty ;
    rdfs:domain era:Tunnel ;
    rdfs:range xsd:integer ;
    era:XMLName "ITU_CrossSectionArea" ;
    era:rinfIndex "1.1.1.1.8.8" ;
    era:unitOfMeasure <http://qudt.org/vocab/unit/M2> ;
    dct:created "2021-08-03"^^xsd:date ;
    dct:modified "2021-08-03"^^xsd:date ;
    rdfs:comment "Smallest cross section area in square metres of the tunnel."@en ;
    rdfs:isDefinedBy <http://data.europa.eu/949/> ;
    rdfs:label "Cross section area"@en ;
    vs:term_status "stable" .
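
For a consuming application, a property-level annotation like this means the unit is looked up once per property rather than read from each value. A minimal Python sketch of that lookup follows. It assumes the annotations have already been loaded into a dictionary: the `UNIT_OF_MEASURE` table, the `interpret` helper, the sample value 52, and the `someUnannotatedProperty` name are hypothetical, and only the `crossSectionArea` entry is taken from the definition above.

```python
ERA = "http://data.europa.eu/949/"

# Property IRI -> unit IRI, mirroring era:unitOfMeasure annotations.
# Only this one entry comes from the thread; a real table would be
# extracted from the ontology file.
UNIT_OF_MEASURE = {
    ERA + "crossSectionArea": "http://qudt.org/vocab/unit/M2",
}


def interpret(property_iri: str, value):
    """Pair a raw literal value with the unit annotated on its property."""
    try:
        unit = UNIT_OF_MEASURE[property_iri]
    except KeyError:
        raise LookupError(f"no unit of measure annotated on {property_iri}") from None
    return value, unit


# 52 is an invented sample value, not data from the Abrunhosa tunnel.
area = interpret(ERA + "crossSectionArea", 52)
```

The design consequence is that every value of a given property is assumed to be in the same unit, and data in any other unit has to be converted before it is published.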

ednaru commented 1 month ago

You can see now the length with the unit of measure in the Abrunhosa tunnel.

Airy59 commented 1 month ago

Interesting approach, as annotation properties can be exploited by SPARQL and SHACL but would be ignored by reasoners. So far, I have seen no compelling reason to give the reasoner access to the units in the context of RINF. Your choice just works.

From a more general perspective, I'd rather bind the unit to the value of the property, not to the property itself (and this implies property reification): a cross-section is a quantity of type (or kind) area; allowed units may be m2, or sqft, and so on, and they qualify the value, not the property. But if the reasoner is not given both pieces of information (quantity kind and unit), which together would make a consistency check between quantity kind and unit possible, there is no point in bothering it with unit information.
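
The consistency check mentioned here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `QuantityValue` class, the `is_consistent` helper, and the `UNITS_FOR_KIND` table are hypothetical, and the unit labels are QUDT-style local names used only as plain strings (of these, only M2 appears in this thread).

```python
from dataclasses import dataclass

# Hypothetical table: which units are acceptable for which quantity kind.
UNITS_FOR_KIND = {
    "Area": {"M2", "FT2"},
    "Length": {"M", "CentiM"},
}


@dataclass(frozen=True)
class QuantityValue:
    """A value with its unit bound to the value, not to the property."""
    numeric_value: float
    unit: str


def is_consistent(quantity_kind: str, value: QuantityValue) -> bool:
    """Check that the unit of the value matches the expected quantity kind."""
    return value.unit in UNITS_FOR_KIND.get(quantity_kind, set())


# A cross-section is a quantity of kind Area (sample values are invented).
ok = is_consistent("Area", QuantityValue(52.0, "M2"))
also_ok = is_consistent("Area", QuantityValue(560.0, "FT2"))
wrong = is_consistent("Area", QuantityValue(52.0, "M"))
```

The check only works because both pieces of information are available: the quantity kind expected by the property and the unit carried by the value.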

For a "more open" world of data with multiple provenances, semantics differentiating quantity kinds, units, and the meaningful matches (= the full QUDT scope) still make sense. This is the direction taken by the S2R/Europe's Rail CDM (in particular RSM and EULYNX DP) since about 2021.

If ERA VOC confirms its use of the QUDT vocabulary for units, that would be one more reason for the CDM to favour QUDT over UCUM. So you just helped us make progress in the decision-making...

ednaru commented 1 month ago

The approach taken, with the annotation property and the reference to the Unit ontology, is based on the fact that for each parameter the unit of measure has been (pre)defined in the legal text. We understand that a more general approach is to bind the unit to the value of the property but, as you mention, that would require reification, so for now we decided on this simpler approach.

Airy59 commented 1 month ago

Understood and agreed. The issue might be closed then, having made sure that 1) the legal basis provides units and 2) ERA vocabulary version 3.1.x encodes them as annotations in all corresponding parameters. Or is there any other point of interest? No comment from Oscar, who may have had reasons for supporting UCUM earlier?

gatemezing commented 1 month ago

Dear @Airy59, nice to hear from you! As Edna said, the consensus is to go with QUDT. We expect the annotation form to be "temporary", because we really need full usage of QUDT at the level of the value of the property, as mentioned in one of your comments. For that, we need to assess the impact on the current applications and the overall KG. Yes, the current solution is the simplest approach, with minor consequences.

gatemezing commented 1 month ago

> You can see now the length with the unit of measure in the Abrunhosa tunnel.

It seems that the prefLabel for the country (at least in my browser) is not in EN but in EL. See the resolvable URI for Portugal here: http://publications.europa.eu/resource/authority/country/PRT

Airy59 commented 1 month ago

Hi @gatemezing, then the path is clear:

  1. you go ahead with your pragmatic use of QUDT units and check for consequences
  2. ERJU CDM (project MOTIONAL, WP30) goes ahead with "full" QUDT usage and we check for consequences on our side. In conjunction with SOSA/SSN, which is quite relevant to our use cases, it seems to be the proper approach.

In any case, transformations shall be "automation-ready", both directions.
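
What "automation-ready, both directions" could mean in practice is sketched below in Python. It is a hypothetical illustration, not an agreed mapping between ERA vocabulary and the CDM: the record layout, the function names, the `ex:tunnel1` subject, and the sample value are all assumptions, and the prefixed names are used as plain strings.

```python
# Unit fixed per property, as in the annotation-based approach.
UNIT_OF_PROPERTY = {"era:crossSectionArea": "unit:M2"}


def to_value_level(subject: str, prop: str, value):
    """Flat triple -> record with the unit attached to the value."""
    return {
        "subject": subject,
        "property": prop,
        "numericValue": value,
        "unit": UNIT_OF_PROPERTY[prop],
    }


def to_property_level(record: dict):
    """Record with a unit on the value -> flat triple.

    Refuses records whose unit differs from the one fixed for the
    property, since the flat form cannot express another unit.
    """
    expected = UNIT_OF_PROPERTY[record["property"]]
    if record["unit"] != expected:
        raise ValueError(
            f"{record['property']} expects {expected}, got {record['unit']}"
        )
    return (record["subject"], record["property"], record["numericValue"])


flat = ("ex:tunnel1", "era:crossSectionArea", 52)
rich = to_value_level(*flat)
round_trip = to_property_level(rich)
```

The direction from the value-level form back to the flat form is the one that can fail: a value expressed in another unit needs a conversion step before it fits the property-level convention.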

gatemezing commented 1 month ago

Great! Happy to see some progress on the work done by ERJU CDM on that matter. We'll converge at some point in time.

Interoperable-data commented 1 month ago

We consider this issue fixed with the current release.