Quantity kinds for understanding ratios

letmaik commented 8 years ago

(related to #39) @adamml

Since percentages and ratios come up frequently, I thought I'd look at qudt and UCUM again to see how this could be done.

Percentages seem easy:

  "unit" : {
    "label": {
      "en": "Percent"
    },
    "symbol": {
      "value": "%",
      "type": "http://www.opengis.net/def/uom/UCUM/"
    }
  },

For [0,1] ratios, this would be:

  "unit" : {
    "label": {
      "en": "Ratio"
    },
    "symbol": {
      "value": "1",
      "type": "http://www.opengis.net/def/uom/UCUM/"
    }
  },

Of course, we could also define that the unity unit is the default if none is given.

Percentage could also be given as 1/100 in UCUM, which would lose some semantics. But I think this highlights that the unit semantics shouldn't be overrated here and the actual type of the number/quantity has to be described differently. In qudt there are quantity kinds, for example the percent unit is defined as:

<rdf:Description rdf:about="http://qudt.org/vocab/unit#Percent">
  <qudt:quantityKind rdf:resource="#DimensionlessRatio"/>
</rdf:Description>

A [0,1] ratio would also have the DimensionlessRatio quantity kind.
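Since "%" and "1" differ only by a factor of 100, a client could normalise both to a [0,1] ratio before comparing or plotting values. The following is a minimal sketch under that assumption; the function name `to_unit_ratio` and the small lookup table are hypothetical and only cover the two UCUM symbols discussed here.

```python
# Divisors for the two dimensionless UCUM symbols discussed above.
# This table is an illustration, not a complete UCUM implementation.
_DIVISORS = {"%": 100, "1": 1}


def to_unit_ratio(values, symbol):
    """Convert values given in "%" or "1" to plain [0,1]-style ratios."""
    try:
        divisor = _DIVISORS[symbol]
    except KeyError:
        raise ValueError(f"not a supported dimensionless symbol: {symbol!r}")
    return [v / divisor for v in values]


print(to_unit_ratio([25, 50, 100], "%"))  # [0.25, 0.5, 1.0]
print(to_unit_ratio([0.25, 0.5], "1"))    # [0.25, 0.5]
```

Note that this only rescales the numbers. It does not tell the client whether the quantity is a ratio, a count, or something else, which is the question quantity kinds are meant to answer.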

The problem with quantity kinds is that they massively overlap with observedProperty. I would say they are more or less the same. In the example above, the observed property "sea_ice_area_fraction" would inherit from DimensionlessRatio. So the obvious question would be: Do we stick to observedProperty or are quantity kinds the real thing? In both cases, the parent types would be useful to know. This also touches on the complex properties model by @adamml again:

  "observedProperty" : {
    "id": "http://vocab.nerc.ac.uk/standard_name/sea_ice_area_fraction/",
    "label" : {
      "en": "Sea ice area fraction"
    },
    "matrix": "http://.../sea_ice",
    "parents": [{
      "id": "http://.../area_fraction/"
    }, {
      "id": "http://qudt.org/vocab/quantity#DimensionlessRatio"
    }]
  }

Any thoughts on all this?

jonblower commented 8 years ago

The serialisations for percentages and ratios look OK to me. I'm a bit worried that fully describing an observed property could get very complicated, and we may quickly hit a point of diminishing returns. I guess the question is - what could an automated client do with the information about quantity kinds that it can't do with the observedProperty identifier? There has been lots of talk about decomposing the CF standard names to do something similar to the above, but it hasn't got very far, because the problem quickly gets complicated. There may be lots of widely-used standard names that aren't easily described by existing quantity kind types so it may be hard to find a general solution.

In other words, I think having a URI for the observedProperty and a properly-described unit is enough for now. If the URI points to a human-readable description (or if this description is embedded in the observedProperty object) then at least a user can decide whether it's what they want.

I think that if we were to put effort in this general direction I would prefer to focus on more fully describing statistical quantities, uncertainty information (a la UncertML) and distinguishing absolute values from differences. I think these things have more obvious use cases (e.g. visualisation, unit conversion).

letmaik commented 8 years ago

Still, would be nice if you knew if something is a ratio or not. Then creating legends/axis labels would be easier. Otherwise you wouldn't know if a 1-unit is a count, a ratio, or something else. Even worse, you may display the "1" as the unit in the legend, which would be confusing.

elmuertho commented 8 years ago

Thanks Maik for taking this up, these units are definitely important! I don't know if I understood your discussion right, but I feel that some observed properties are quite complicated and moreover units may not explain a property well. For example, "dry weight of corn per field area" in kg/m2 or t/ha is not described well by these units alone. Only physical sciences do well with basic units... Hence, in our case a well-defined unit and a URI for the property will work well at the moment. If there are many catalogues in the future, things may change....

jonblower commented 8 years ago

@elmuertho - the UCUM system defines a syntax for "derived" units like kg/m2 or t/ha, so that an intelligent client could decode them. I agree that a property URI and well-defined unit should work for you at the moment.

@neothemachine - you're right, it would be a good idea to be able to distinguish different kinds of dimensionless unit. Maybe the observedProperty could have an optional set of parent types (or skos:narrowerThan properties):

  "observedProperty" : {
    "id": "http://vocab.nerc.ac.uk/standard_name/sea_ice_area_fraction/",
    "label" : {
      "en": "Sea ice area fraction"
    },
    "narrowerThan": "http://qudt.org/vocab/quantity#DimensionlessRatio"
  }

letmaik commented 8 years ago

@jonblower I like how you use narrowerThan here. In SKOS, it's just skos:narrower which is confusing if you don't read the SKOS spec, as it could mean both directions. But narrowerThan makes it clear. I would suggest to define "narrowerThan" as being an optional array of URIs, not just a single URI. This should make it simple to process and allows for multiple parents. Even though all the parents are observed properties themselves I think it's fine to not make them embedded objects since this really is about semantic understanding and not so much about human-suitable display of metadata.

If we decide to do that, how should the unit for ratios be given in that case? Should it still be included as "1" or rather left out? I'm thinking about simple visualization clients that don't look at narrowerThan and want to create useful legends, and in that case don't want to display a 1 unit. We could say that if the unit is left out and the parameter is not categorical, then it defaults to that 1 unit.

jonblower commented 8 years ago

You could probably leave the unit in. A client could see that the unit is "1" and perhaps decide not to display it. A slightly more sophisticated client could see if the property is narrower than qudt:DimensionlessRatio to confirm.

Or we could say that units are optional in some cases. (Remember that salinity is also unit-less, strictly speaking, although most people display "psu".)

It wouldn't be hard for a client to say that, if the unit is missing or "1" just don't display it.
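As a rough illustration of that rule, here is a minimal Python sketch of a legend-label helper for a simple client. The function name `legend_label` is hypothetical, and the parameter layout is assumed to follow the JSON examples in this thread. Accepting a plain-string symbol as well as an object is a defensive assumption.

```python
def legend_label(parameter, lang="en"):
    """Build a legend label, hiding the unit if it is missing or "1"."""
    prop = parameter.get("observedProperty", {})
    # Prefer the human-readable label, fall back to the property URI.
    name = prop.get("label", {}).get(lang) or prop.get("id", "")

    symbol = (parameter.get("unit") or {}).get("symbol")
    # The symbol is an object with "value" and "type" in the examples above;
    # a plain string is also tolerated here.
    value = symbol.get("value") if isinstance(symbol, dict) else symbol

    if value is None or value == "1":
        return name
    return f"{name} ({value})"


sea_ice = {
    "observedProperty": {"label": {"en": "Sea ice area fraction"}},
    "unit": {"symbol": {"value": "1",
                        "type": "http://www.opengis.net/def/uom/UCUM/"}},
}
print(legend_label(sea_ice))  # Sea ice area fraction
```

A client using this check still cannot tell a ratio from a count; it only avoids showing a confusing "1" in the legend.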

letmaik commented 8 years ago

To solve this annoying "psu" business for good, I propose the following:

{
  "type" : "Parameter",
  "description" : {
    "en": "Sea water practical salinity measured in practical salinity units (psu)."
  },
  "unit" : {
    "label": {
      "en": "Practical Salinity Units"
    },
    "symbol": {
      "value": "{psu}",
      "type": "http://www.opengis.net/def/uom/UCUM/"
    }
  },
  "observedProperty" : {
    "id" : "http://vocab.nerc.ac.uk/standard_name/sea_water_practical_salinity/",
    "label" : {
      "en": "Sea water practical salinity"
    }
  }
}

Quoting UCUM:

Curly braces may be used to enclose annotations that are often written in place of units or behind units but that do not have a proper meaning of a unit and do not change the meaning of a unit. Annotations have no semantic value. For example one can write “%{vol}”, “kg{total}”, or “{RBC}” (for “red blood cells”) as pseudo-units. However, these annotations do not have any effect on the semantics, which is why these example expressions are equivalent to “%”, “kg”, and “1” respectively.
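To make the quoted rule concrete, here is a small Python sketch that strips curly-brace annotations from a symbol string. It is not a UCUM parser: the helper name `strip_ucum_annotations` is hypothetical, and it only handles the simple case where an annotation stands alone or directly follows a unit term.

```python
import re

# Matches one {...} annotation without nested braces.
_ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_ucum_annotations(symbol):
    """Remove {...} annotations; an annotation on its own means unity."""
    stripped = _ANNOTATION.sub("", symbol)
    return stripped if stripped else "1"


# The three examples from the UCUM quote above, plus "{psu}".
for s in ["%{vol}", "kg{total}", "{RBC}", "{psu}"]:
    print(s, "->", strip_ucum_annotations(s))
# %{vol} -> %
# kg{total} -> kg
# {RBC} -> 1
# {psu} -> 1
```

With this, a client that wants to detect a 1-unit could compare the stripped symbol against "1" while still showing the annotated form to users.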

Any idea why the CF Conventions list sea_water_practical_salinity with 1e-3 as canonical unit and not 1? If values are given with 1e-3 unit, then the above unit symbol would change to "1e-3{psu}".

So, yes, I think units could be optional and would then default to 1, and it should not be forbidden to explicitly have "1" units (as above, "{psu}" is a 1-unit). How "1" is exactly serialized in the symbol string depends on the unit scheme however, so it may not be easy to check in simple clients. But I guess this is overthinking again since the symbol string is really meant for clients that have an idea of understanding it. Maybe we should add a separate field for an optional human-readable ASCII notation of units which may be used by clients for display purposes in case they can't or don't want to parse the actual units. The idea then would be that this human string should be left out if the typed unit in the unit scheme is already human readable, e.g. K for kelvin in UCUM.

adamml commented 8 years ago

I've been following along - but not necessarily keeping up!

Some thoughts:

You might find Jeremy Tandy's Gist interesting (here).

Quantity Kind: In Simon Cox's Observable Property ontology, a super-class of QUDT Quantity Kind is defined - Property Kind. It's a useful bucket class for everything.

skos:narrowerThan: I think there was a narrowerThan in an old version of SKOS, but if so it isn't there any more. But do you really want to say that something you've measured is a narrower concept than its unit of measure? Maybe it would be better to say something like

quantity:Dimensionless qudt:referenceThing cf:sea_ice_area_fraction

or:

quantity:DimensionlessRatio qudt:referenceThing cf:sea_ice_area_fraction

Or possibly even:

cf:sea_ice_area_fraction qudt:referenceUnit unit:Unity

Salinity: Salinity is a pain. There's Practical Salinity, Absolute Salinity, UNESCO 1983 algorithms, TEOS-10 algorithms. There's Practical Salinity Units, there's dimensionless, there's parts per thousand. The procedure/analysis/calculation used to generate the salinity tells you the units. It's a can of worms.

letmaik commented 8 years ago

@adamml "But do you really want to say that something you've measured is a narrower concept than its unit of measure?" -> No, I think you misread that. The narrower relationship is only between observed properties / quantity/quality/property kinds.

Thanks for pointing to @6a6d74's gist. I find the schema image very useful. A notable difference to our model is that the unit of measurement is part of the observedProperty/propertyKind (see ScaledQuantityKind). In CovJSON this is part of the Parameter itself, along with encoding details for mapping categories to integers.

Looking at the last example of the gist, it looks like the equivalent of a categorical Parameter:

<http://codes.wmo.int/bufr4/b/22/061>
  a skos:Concept, op:QualityKind ;
  rdfs:label "State of the sea"@en ;
  <http://codes.wmo.int/def/bufr4/dataWidth_Bits> 4 ;
  <http://codes.wmo.int/def/bufr4/fxy> "022061" ;
  <http://codes.wmo.int/def/bufr4/referenceValue> 0 ;
  <http://codes.wmo.int/def/bufr4/scale> 0 ;
  dct:references  <http://codes.wmo.int/bufr4/codeflag/0-22-061> ;
  skos:notation   "061" ;
  op:applicableVocabulary <http://codes.wmo.int/bufr4/codeflag/0-22-061> .

The categories are defined in applicableVocabulary and this includes a skos:notation property for each category which is an integer, and possibly that should be the integers stored in data files? Not completely sure, maybe it's something else. Certainly the dataWidth_Bits, scale etc. terms above are encoding related.

So in summary, there is just a single concept there (PropertyKind) which is linked to generalized variants (e.g. without encoding details, without units of measurement, just canonical units) whereas we currently have two main concepts, a Parameter and inside that the observed property. When looking at PropertyKind's featureOfInterest, then this would probably be the coverage domain, which wouldn't fit here anyway in our structure.

I wonder if this design is actually good or not, since it seems to mix the abstract "what is observed" concept with encoding details. For example, this is a category: http://codes.wmo.int/bufr4/codeflag/0-22-061/_0 However this is tightly coupled to the encoding and there is no separate abstract resource for the category as such which then could be reused in different encodings.

adamml commented 8 years ago

@neothemachine I'm still a bit confused by @jonblower's

"Sea ice area fraction" "narrowerThan" "http://qudt.org/vocab/quantity#DimensionlessRatio"

(see here) but it's entirely possible I'm being thick.

The use case of @6a6d74's gist is to provide Linked Data / Semantic Web support for describing BUFR encodings to the WMO. That use case forces the coupling of the "what is observed" and the "encoding details." I don't know if there's a pattern for reusing the abstract resource category outside of BUFR, or if that is a use case they have considered.

jonblower commented 8 years ago

It's intended to say that "sea ice area fraction" is a ratio quantity. "Ratio" is a kind of quantity here, not a unit of measure. Ratios are dimensionless, but so are lots of other things (counts, arbitrary scales), so a client can't look at the units to find out whether the quantity is a ratio. Knowing that it's a ratio enables a client to deduce that values must be between 0 and 1, for instance, as well as other kinds of properties that ratios have, but counts and arbitrary scales do not.
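A client-side check along those lines might look like the sketch below. It is only an illustration: "narrowerThan" is the field proposed in this thread, the helper names are hypothetical, and the 0-to-1 range check assumes the values are expressed with unit "1" rather than "%".

```python
DIMENSIONLESS_RATIO = "http://qudt.org/vocab/quantity#DimensionlessRatio"


def parent_uris(observed_property):
    """Return the narrowerThan parents as a list (accepts a string or a list)."""
    parents = observed_property.get("narrowerThan", [])
    return [parents] if isinstance(parents, str) else list(parents)


def is_ratio(observed_property):
    """True if the property is declared narrower than qudt DimensionlessRatio."""
    return DIMENSIONLESS_RATIO in parent_uris(observed_property)


def values_outside_ratio_range(values, observed_property):
    """For ratio properties, list the values outside [0, 1]; skip missing values."""
    if not is_ratio(observed_property):
        return []
    return [v for v in values if v is not None and not 0.0 <= v <= 1.0]


sea_ice = {
    "id": "http://vocab.nerc.ac.uk/standard_name/sea_ice_area_fraction/",
    "narrowerThan": DIMENSIONLESS_RATIO,
}
print(is_ratio(sea_ice))                                            # True
print(values_outside_ratio_range([0.2, None, 1.3, -0.1], sea_ice))  # [1.3, -0.1]
```

A property without "narrowerThan" is simply treated as "unknown kind", so a count or an arbitrary scale with unit "1" would not be range-checked.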

letmaik commented 8 years ago

So are we including narrowerThan in the spec or not?

  "observedProperty" : {
    "id": "http://vocab.nerc.ac.uk/standard_name/sea_ice_area_fraction/",
    "label" : {
      "en": "Sea ice area fraction"
    },
    "narrowerThan": ["http://qudt.org/vocab/quantity#DimensionlessRatio"]
  }

By the way, qudt has "generalization" instead of narrowerThan:

  <rdf:Description rdf:about="#ReynoldsNumber">
    <skos:exactMatch rdf:resource="http://dbpedia.org/resource/Reynolds_number"/>
    <qudt:generalization rdf:resource="#DimensionlessRatio"/>
    <qudt:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The Reynolds number (Re) is a dimensionless number defined as the ratio of inertial forces to viscous forces and, consequently, it quantifies the relative importance of these two types of forces for given flow conditions.</qudt:description>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Reynolds Number</rdfs:label>
    <rdf:type rdf:resource="http://qudt.org/schema/qudt#FluidMechanicsQuantityKind"/>
  </rdf:Description>

qudt:generalization is defined as:

This property relates a quantity kind to its generalization. A quantity kind, PARENT, is a generalization of the quantity kind CHILD only if: 1. PARENT and CHILD have the same dimensions in every system of quantities; 2. Every unit that is a measure of quantities of kind CHILD is also a valid measure of quantities of kind PARENT.
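The two conditions in that definition can be read as a simple check. The sketch below is a simplification under stated assumptions: it models a single system of quantities (the definition says "every system"), the `QuantityKind` class is hypothetical, and the unit sets and dimension tuples are made up for illustration rather than taken from QUDT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityKind:
    name: str
    dimensions: tuple  # exponents of base quantities, here (length, mass, time)
    units: frozenset   # unit symbols that can measure this kind (illustrative)


def is_generalization(parent, child):
    """Check the two quoted conditions for one system of quantities."""
    same_dimensions = parent.dimensions == child.dimensions  # condition 1
    units_carry_over = child.units <= parent.units           # condition 2
    return same_dimensions and units_carry_over


ratio = QuantityKind("DimensionlessRatio", (0, 0, 0), frozenset({"1", "%"}))
reynolds = QuantityKind("ReynoldsNumber", (0, 0, 0), frozenset({"1"}))
length = QuantityKind("Length", (1, 0, 0), frozenset({"m"}))

print(is_generalization(ratio, reynolds))  # True
print(is_generalization(ratio, length))    # False
```

Both arguments have to be quantity kinds for this check to make sense, which is the restriction that a looser parent relation would not impose.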

And it's always a single parent, not multiple.

jonblower commented 8 years ago

Including narrowerThan makes sense to me, but it would be good to get some independent validation from @adamml. Adam, what do you think?

I'm getting the impression that narrowerThan (which is essentially an alias for skos:narrower) is more suitable than qudt:generalization, as it doesn't force the two things both to be quantity kinds, and it allows multiple parents - right?

adamml commented 8 years ago

It all depends how strict you're being - i.e. are you likely to ever run a reasoner over any of this?

If you're being strict then, yes stick with narrowerThan (although if you're being really strict and you're aliasing narrowerThan to skos:narrower then the two things have to be skos:Concepts ;) ). There's nothing that I know of in SKOS to say you can only have one parent.

If you're willing to be a little bit more relaxed, qudt:generalization is more expressive, but if you reason over it then, as @jonblower says, you need to make sure you have only one parent and everything is a QuantityKind.

letmaik commented 2 years ago

I would love to get this sorted properly but I have a feeling this will have to be figured out by someone else and then CovJSON can adopt it, if it's a simple and clear solution. At the moment there are too many complex/confusing candidates and I think it's too early to pick a winner. The current pragmatic approach of using "%" and "1" as units from UCUM seems to work reasonably well for the time being.

covjson / specification

Quantity kinds for understanding ratios #61