Closed MathiasVDA closed 1 month ago
You are right on this, and we have had long conversations on this, given that the inclusion of units of measure adds three more triples, as you clearly show. Another option would be to use UCUM, which is more compact but not standard either.
For the time being, we will state the preferred unit of measurement in the documentation of each property, so that it can be taken into account when generating data.
There's a lively discussion on the subject going on within UIC. There, http://qudt.org/vocab has gained some traction.
I would much appreciate consensus on the subject.
Please also have a look at hal-01885337:
Maxime Lefrançois, Antoine Zimmermann. The Unified Code for Units of Measure in RDF: cdt:ucum and other UCUM Datatypes. The Semantic Web: ESWC 2018 Satellite Events. ESWC 2018, Jun 2018, Heraklion, Greece. pp.196-201, 10.1007/978-3-319-98192-5_37. hal-01885337
Essentially trying to use UCUM to have a concise representation of units.
I agree on the usefulness and compactness of UCUM, which would be in general my preferred choice. However, there are also concerns in the community around the fact that this is not a standard-based representation, and hence it will not allow for some comparisons to be made at, for instance, the SPARQL level. We will explore the possibility of having the two representations (UCUM and non-UCUM based).
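To make the comparison concern concrete, here is a minimal Python sketch, not an implementation of UCUM or of the cdt:ucum datatype. It assumes a lexical form of a number, a space and a unit code, and covers only four metric length codes; `METRES_PER_UNIT`, `parse_length` and `same_length` are hypothetical names. It shows that two literals denoting the same length differ as strings and only compare equal once something interprets the unit.

```python
from decimal import Decimal

# Illustrative subset only: a real UCUM parser covers far more units and prefixes.
METRES_PER_UNIT = {
    "mm": Decimal("0.001"),
    "cm": Decimal("0.01"),
    "m": Decimal("1"),
    "km": Decimal("1000"),
}


def parse_length(lexical):
    """Split '<number> <unit>' and return the length in metres."""
    number, unit = lexical.split()
    return Decimal(number) * METRES_PER_UNIT[unit]


def same_length(a, b):
    """Compare two unit-carrying literals by value, not by lexical form."""
    return parse_length(a) == parse_length(b)
```

An engine that does not know the datatype is in the position of the plain string comparison: it can test lexical identity but not order or equality of the underlying quantities.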
@ocorcho what would be a (or the) standard-based representation of units? For the units themselves there is little doubt about the validity and usefulness of the SI system, but its representation as an ontology is another business that involves many more decision points. This is well described in this presentation (from the BIPM website, dated 2019): https://www.bipm.org/en/search?p_p_id=search_portlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=%2Fdownload%2Fpublication&p_p_cacheability=cacheLevelPage&_search_portlet_dlFileId=28434467&p_p_lifecycle=1&_search_portlet_javax.portlet.action=search&_search_portlet_page=previous&_search_portlet_operation=changePage
Quote: "Work with user communities to establish unit ontologies & agreed, clear implementations"
This quote does not seem to acknowledge past efforts. So do we have an open field here, or would you consider some ontology to be more authoritative than others in 2023, and if so, why? Wider usage? Endorsement by some industry standard maker (W3C...)?
Concerning "pure" ontology engineering aspects, some ontologies define units as individuals, others as classes. Examples in the "centimeter" case:
From a software engineering aspect, I'd rather go for particular units as classes since I can pack them with methods for unit conversion, applying a "singleton" decorator to avoid generating a new instance of "centimeter" with every new value. But ontology engineering is not OOP (despite some real, and other treacherous, analogies) and I'd like to have your opinion.
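For readers less familiar with the OOP side of this argument, a minimal Python sketch of "units as classes with a singleton decorator" could look as follows. All names (`singleton`, `LengthUnit`, `Meter`, `Centimeter`, `convert`) are hypothetical and not taken from any ontology or library; it only illustrates the software pattern, not how an ontology should model units.

```python
from fractions import Fraction


def singleton(cls):
    """Class decorator: replace the class by a factory that returns one shared instance."""
    instances = {}

    def get_instance():
        if cls not in instances:
            instances[cls] = cls()
        return instances[cls]

    return get_instance


class LengthUnit:
    """Hypothetical base class: each length unit knows its size in metres."""

    metres_per_unit = Fraction(1)

    def to_metres(self, value):
        return value * self.metres_per_unit

    def from_metres(self, value):
        return value / self.metres_per_unit


@singleton
class Meter(LengthUnit):
    metres_per_unit = Fraction(1)


@singleton
class Centimeter(LengthUnit):
    metres_per_unit = Fraction(1, 100)


def convert(value, source, target):
    """Convert a value between two length units, going through metres."""
    return target.from_metres(source.to_metres(value))
```

Calling `Centimeter()` repeatedly returns the same object, so attaching a unit to every new value does not create a new instance. One side effect of this particular decorator: after decoration `Centimeter` is a function rather than a class, so `isinstance` checks against it no longer work.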
> There's a lively discussion on the subject going on within UIC. There, http://qudt.org/vocab has gained some traction.
> I would much appreciate consensus on the subject.
Both options (QUDT and UCUM) are used in the SSN ontology- See some samples here https://w3c.github.io/sdw/ssn/#iphone_barometer-sosa
Unit of measure has been included in ERA ontology 3.1.0 as an annotation property
    ### http://data.europa.eu/949/unitOfMeasure
    era:unitOfMeasure rdf:type owl:AnnotationProperty ;
        dct:created "2024-05-16"^^xsd:date ;
        rdfs:comment "Magnitude of a quantity, defined and adopted by convention or by law, that is used as a standard for measurement of the same kind of quantity."@en ;
        rdfs:isDefinedBy <http://data.europa.eu/949/> ;
        rdfs:label "Unit of measure"@en .
and refers to the qudt UNIT ontology. For example,
    ### http://data.europa.eu/949/crossSectionArea
    era:crossSectionArea rdf:type owl:DatatypeProperty ,
            owl:FunctionalProperty ;
        rdfs:domain era:Tunnel ;
        rdfs:range xsd:integer ;
        era:XMLName "ITU_CrossSectionArea" ;
        era:rinfIndex "1.1.1.1.8.8" ;
        era:unitOfMeasure <http://qudt.org/vocab/unit/M2> ;
        dct:created "2021-08-03"^^xsd:date ;
        dct:modified "2021-08-03"^^xsd:date ;
        rdfs:comment "Smallest cross section area in square metres of the tunnel."@en ;
        rdfs:isDefinedBy <http://data.europa.eu/949/> ;
        rdfs:label "Cross section area"@en ;
        vs:term_status "stable" .
You can see now the length with the unit of measure in the Abrunhosa tunnel.
Interesting approach, as annotation properties can be exploited by SPARQL and SHACL, but would be ignored by reasoners. Until now, I could see no compelling reason for giving the reasoner access to the units, in the context of RINF. Your choice just works.
From a more general perspective, I'd rather bind the unit to the value of the property, not to the property itself (which implies property reification): a cross-section is a quantity of type (or kind) area; allowed units may be m2 or sqft, and the unit makes sense of the value, not of the property. But if the reasoner is not given both pieces of information (quantity kind and unit) that would make a consistency check between them possible, there is no point in bothering it with unit information.
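A rough Python sketch of the consistency check meant here, under stated assumptions: the unit travels with the value, and a lookup table says which units fit which quantity kind. `ALLOWED_UNITS`, `QuantityValue` and `is_consistent` are hypothetical names, the unit codes are illustrative short labels rather than IRIs from any vocabulary, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass

# Hypothetical table: units acceptable for each quantity kind.
ALLOWED_UNITS = {
    "Area": {"M2", "FT2"},
    "Length": {"M", "CentiM", "FT"},
}


@dataclass(frozen=True)
class QuantityValue:
    """A number bound to its unit, instead of a bare number on the property."""

    number: float
    unit: str


def is_consistent(quantity_kind, value):
    """True if the value's unit is allowed for the given quantity kind."""
    return value.unit in ALLOWED_UNITS.get(quantity_kind, set())
```

With this, `is_consistent("Area", QuantityValue(50, "M2"))` holds while `is_consistent("Area", QuantityValue(50, "M"))` does not. The check needs both inputs; with only the unit and no quantity kind there is nothing to check against, which is the point made above.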
For a "more open" world of data with multiple provenances, semantics differentiating quantity kinds, units, and the meaningful matches (= the full QUDT scope) still make sense. This is the direction taken by the S2R/Europe's Rail CDM (in particular RSM and EULYNX DP) since about 2021.
If ERA VOC confirms its use of the QUDT vocabulary for units, that would be one more reason for the CDM to favour QUDT over UCUM. So you just helped us make progress in the decision-making...
The approach taken with the annotation property and the reference to the Unit ontology is based on the fact that, for each parameter, the unit of measure has been (pre)defined in the legal text. We understand that a more general approach is to bind the unit to the value of the property, but, as you mention, that would require reification, so for now we decided on this simpler approach.
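On the consumer side, the property-level approach amounts to a lookup keyed by property. The Python sketch below is only an illustration under that assumption: the single table entry mirrors the era:unitOfMeasure annotation on era:crossSectionArea shown earlier in this thread, while `PROPERTY_UNITS` and `with_unit` are hypothetical names and a real client would read the annotations from the vocabulary rather than hard-code them.

```python
# Unit declared once per property, mirroring the era:unitOfMeasure annotation.
PROPERTY_UNITS = {
    "http://data.europa.eu/949/crossSectionArea": "http://qudt.org/vocab/unit/M2",
}


def with_unit(property_iri, value):
    """Pair a bare value with the unit declared for its property.

    Raises KeyError when no unit is declared for the property.
    """
    return value, PROPERTY_UNITS[property_iri]
```

The data itself stays a bare number; the unit is recovered from the vocabulary. This works as long as every value of a parameter uses the legally defined unit, which is the premise stated above.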
Understood and agreed. The issue might be closed then, having made sure that 1) the legal basis provides units and 2) ERA vocabulary version 3.1.x encodes them as annotations in all corresponding parameters. Or is there any other point of interest? No comment from Oscar, who may have had reasons for supporting UCUM earlier?
Dear @Airy59, nice to hear from you! As Edna said, the consensus is to go with QUDT. We expect the annotation form to be "temporary", because we really need full usage of QUDT at the level of the value of the property, as mentioned in one of your comments. For that, we need to assess the impact on the current applications and the overall KG. Yes, the current solution is the simplest approach, with minor consequences.
> You can see now the length with the unit of measure in the Abrunhosa tunnel.
It seems that the prefLabel for the country (at least in my browser) is not EN. It is EL - See the URI resolvable here http://publications.europa.eu/resource/authority/country/PRT for Portugal.
Hi @gatemezing, then the path is clear:
In any case, transformations shall be "automation-ready", both directions.
Great! happy to see some progress on the work done by ERJU CDM on that matter. We'll converge at some point in time.
We consider this issue fixed with the current release.
While looking at a data instance, for example the Abrunhosa tunnel, we found that the attributes Length and Cross section do not have a unit of measure defined.
Not specifying the unit of measure will lead to data quality issues: one can enter a length in meters, centimeters, ... The same is true for applications that use the data: how are they supposed to interpret the double or integer value?
Another approach could be how the Gist ontology handles units of measure: Quantities, Magnitudes, and Units & Measures in Gist - YouTube. This is an interesting approach but will add an extra step in the data. Instead of:
You will have: