OpenEnergyPlatform / ontology

Repository for the Open Energy Ontology (OEO)
Creative Commons Zero v1.0 Universal

Data items indicating missing data #1765

Closed: l-emele closed this issue 5 months ago

l-emele commented 7 months ago

Description of the issue

In some data sets, special data items are used to indicate that data is not available. Some of these data items even convey information about why the data is not available.

Examples are:

Ideas of solution

Add classes:

We could potentially add individuals for the specific notation keys.

Workflow checklist

I am aware that

stap-m commented 7 months ago

A quick search across OLS.

Can we reuse something here? I also checked IAO but didn't find anything. Maybe I used the wrong search vocabulary...

han-f commented 7 months ago

I am afraid that different sources may use different vocabularies for the same issue they want to depict, so populating with individuals would probably depend on the source of notation keys.

I do like @l-emele's suggested solution. Could a label be *missing data* or *absent data*?

l-emele commented 6 months ago
> * [Not a number](http://dicom.nema.org/resources/ontology/DCM/114000): _Measurement not available: Not a number (per IEEE 754)._

This explicitly refers to IEEE 754, the standard for representing floating-point numbers. In IEEE 754, Not a Number (NaN) has a special meaning: it is the result of mathematically undefined calculations such as 0/0 or ∞ − ∞.[^1] This differs from what the notation keys depict; it is a sibling rather than a parent concept.

[^1]: NaN is also not the only "notation key" in IEEE 754; INF and -INF express positive and negative infinity.
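To illustrate why NaN is a sibling rather than a parent concept: NaN is produced by a computation, whereas a notation key is assigned by whoever publishes the data. The sketch below assumes Python, whose `float` type uses IEEE 754 double precision on common platforms. Note that Python raises `ZeroDivisionError` for `0/0` instead of returning NaN, so the example uses ∞ − ∞.

```python
import math

# Python floats follow IEEE 754 double precision on common platforms.
pos_inf = math.inf              # +INF
neg_inf = -math.inf             # -INF
undefined = pos_inf - pos_inf   # ∞ − ∞ is mathematically undefined, so the result is NaN

print(math.isnan(undefined))    # True
print(undefined == undefined)   # False: NaN compares unequal to everything, even itself
print(math.isinf(neg_inf))      # True
```

The NaN here says nothing about *why* a value is missing from a data set; it only records that an arithmetic operation had no defined result.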

> * [Not available](http://purl.obolibrary.org/obo/NCIT_C126101): _The desired information is not available._
> * And there is a set of subclasses of "[missing value reason](http://purl.obolibrary.org/obo/NCIT_C48655)" in NCIT: _A specific reason explaining why a meaningful value is not available. A meaningful value answers the question posed by a Data Element Concept. In contrast, a Missing Value Reason answers the implicit question "Why is there no 'meaningful' value?", when there is none._

I like NCIT's general concept *Missing Value Reason*; however, NCIT has a whole list of 35 different reasons (not counting the sub-subclasses). Some of the intended notation keys can be mapped to NCIT; e.g., the intended *classified* (C) seems to match *matched data* from NCIT. However, not all of them can be mapped; e.g., I did not find a class corresponding to the intended *not occurring* (NO).
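A crosswalk from notation keys to NCIT would therefore be partial. The sketch below is hypothetical: the dictionary and the `ncit_label` helper are illustrative names, and the only entries are the two cases discussed in the comment above, with `None` marking a key for which no NCIT class was found.

```python
from typing import Optional

# Hypothetical crosswalk from notation keys to NCIT "missing value reason" labels.
# Only the two cases discussed in this thread are included; None = no NCIT match found.
NOTATION_KEY_TO_NCIT: dict[str, Optional[str]] = {
    "C": "matched data",  # classified: suggested match from the comment above
    "NO": None,           # not occurring: no corresponding NCIT class found
}


def ncit_label(key: str) -> Optional[str]:
    """Return the NCIT label mapped to a notation key, or None if it is unmapped."""
    if key not in NOTATION_KEY_TO_NCIT:
        raise KeyError(f"unknown notation key: {key}")
    return NOTATION_KEY_TO_NCIT[key]


# Keys that NCIT cannot cover are the argument for defining an OEO class of our own.
unmapped = [key for key, label in NOTATION_KEY_TO_NCIT.items() if label is None]
print(unmapped)  # ['NO']
```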

> Can we reuse something here? I also checked IAO but didn't find anything. Maybe I used the wrong search vocabulary...

I like both the label and the definition of *missing value reason*. However, NCIT is not BFO-aligned, so we cannot use it directly.

What about defining our own *missing value reason* as: "A missing value reason is a data item that is used to indicate that data is not available and to provide **a specific reason explaining why a meaningful value is not available**." (Highlighted part copied from the NCIT definition.)

We could reuse the remaining part of the NCIT definition as an elucidation: "A missing value reason answers the implicit question 'Why is there no meaningful value?' when there is none."
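As an illustration of the proposed definition, a data item can carry either a value or a reason why there is none. This is a minimal sketch, assuming a table cell holds either a number or one of the notation keys mentioned in this thread (C, NO); `MissingValueReason`, `REASONS` and `parse_cell` are hypothetical names, not OEO classes or an existing API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MissingValueReason:
    """Hypothetical stand-in for the proposed class: a data item indicating that
    no value is available and recording why."""
    key: str     # the notation key as it appears in the data, e.g. "C" or "NO"
    reason: str  # answers "Why is there no meaningful value?"


# Hypothetical lookup for the two notation keys mentioned in this thread.
REASONS = {
    "C": MissingValueReason("C", "classified"),
    "NO": MissingValueReason("NO", "not occurring"),
}

Cell = Union[float, MissingValueReason]


def parse_cell(raw: str) -> Cell:
    """Return a float for numeric cells or a MissingValueReason for a known notation key."""
    text = raw.strip()
    if text in REASONS:
        return REASONS[text]
    return float(text)  # raises ValueError for anything that is neither
```

For example, `parse_cell("12.5")` returns `12.5` and `parse_cell("NO").reason` returns `"not occurring"`. A cell containing `NaN` still parses to an IEEE 754 NaN rather than a `MissingValueReason`, which keeps the two sibling concepts apart.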

> I am afraid that different sources may use different vocabularies for the same issue they want to depict, so populating with individuals would probably depend on the source of notation keys.

Yes, this is why I proposed the additional subclass *notation key* and potentially other sibling classes for missing value reasons in other contexts, e.g. Eurostat data or floating-point data in IEEE 754 format.

> I do like @l-emele's suggested solution. Could a label be *missing data* or *absent data*?

We could use these proposals as alternative terms.

l-emele commented 5 months ago

I think we have agreed to add the following two classes:

I will add those two to oeo-model; further subclasses of *missing value reason* can be added at a later stage.