BONSAMURAIS / BONSAI-ontology-RDF-framework

Recommendations and discussions on ontology and RDF framework development
BSD 3-Clause "New" or "Revised" License

Missing concept modeling: Uncertainty #2

Open kuzeko opened 5 years ago

kuzeko commented 5 years ago

We need to represent uncertainties in the measures/values.

We could adopt vocabularies/models from the SIO ontology:

https://github.com/MaastrichtU-IDS/semanticscience

https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-5-14

We need to investigate:

  1. whether there is something there we can use
  2. how it can be used

kuzeko commented 5 years ago

The PROV-O ontology probably also has some concepts we can use.

massimopizzol commented 5 years ago

Each measure/value will have an uncertainty, so we need to decide what information about uncertainty we want to add. For example, we might add the type of distribution and its essential descriptive statistics (location and scale), e.g. "normal distribution with mean X and standard deviation Y".
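
To make the "distribution type plus location and scale" idea concrete, here is a minimal sketch in Python. The namespace `http://example.org/bonsai-uncertainty#` and the terms `ex:hasUncertainty`, `ex:distributionType`, `ex:location` and `ex:scale` are placeholders invented for illustration; they are not taken from SIO, PROV-O or any agreed BONSAI vocabulary.

```python
from dataclasses import dataclass

# Hypothetical namespace and term names, for illustration only.
EX = "http://example.org/bonsai-uncertainty#"


@dataclass(frozen=True)
class Uncertainty:
    distribution: str  # e.g. "normal"
    location: float    # e.g. the mean
    scale: float       # e.g. the standard deviation


def to_turtle(subject: str, u: Uncertainty) -> str:
    """Serialize the uncertainty of one measured value as Turtle text."""
    return (
        f"@prefix ex: <{EX}> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
        f"<{subject}> ex:hasUncertainty [\n"
        f'    ex:distributionType "{u.distribution}" ;\n'
        f'    ex:location "{u.location}"^^xsd:float ;\n'
        f'    ex:scale "{u.scale}"^^xsd:float\n'
        "] .\n"
    )


# "normal distribution with mean 12.5 and standard deviation 0.8"
print(to_turtle("http://example.org/flow/1", Uncertainty("normal", 12.5, 0.8)))
```

The blank node keeps the uncertainty description attached to the value it qualifies; whether a blank node or a named resource is the better choice is still open.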

boweidema commented 5 years ago

This could be adopted from the OLCA schema, except for using xsd:float instead of xsd:double to avoid storing values with unnecessary precision (while recommending that calculations be performed with "double" for precision, see Ernerfeldt 2017). For the formula language, the ecoSpold2 format uses a subset of the OpenFormula standard. Other RDF-related formula standards are described on the Wikipedia page for MathML. OLCA limits the UncertaintyType to normal, log-normal, triangular and uniform, whereas ecoSpold2 additionally has the less-used beta, gamma, binomial, and "undefined", which allows storing practically any kind of uncertainty information. ecoSpold2 also has numerical fields for pedigree data quality indicators.
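
As a rough illustration of the float-versus-double point, the sketch below rounds a double-precision number to 32-bit precision using only the Python standard library. The helper name `as_xsd_float` is made up for this example, and treating xsd:float as an IEEE 754 single-precision value is the assumption being illustrated.

```python
import struct


def as_xsd_float(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


mean = 0.123456789012345      # result of a calculation done in double precision
stored = as_xsd_float(mean)   # what would survive storage as a 32-bit float

# The stored value keeps roughly seven significant digits.
print(mean, stored, abs(mean - stored))
```

The calculation itself stays in double precision; only the value written to the store is rounded.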

massimopizzol commented 5 years ago

I think using the OLCA schema is limiting, but it is perhaps good enough as a pragmatic choice to start with. It shows more or less what type of information should be included in the ontology: type of distribution, value of the statistic, and calculation method of the statistic.

mmr2187 commented 5 years ago

For now I'm on the side of the OLCA schema, as it is simple enough for our current objectives. I'm not sure the other distributions will appear in the eventual dataset, and I think it's easy enough to incorporate them later as necessary.
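
A small sketch of how starting with the OLCA set and widening it later might look. The distribution names follow the lists given earlier in this thread; the identifiers `OLCA_TYPES`, `ECOSPOLD2_EXTRA` and `is_supported` are hypothetical and not part of either schema.

```python
# Distribution names as listed in this thread; identifiers are illustrative.
OLCA_TYPES = frozenset({"normal", "log-normal", "triangular", "uniform"})
ECOSPOLD2_EXTRA = frozenset({"beta", "gamma", "binomial", "undefined"})


def is_supported(distribution: str, allowed: frozenset = OLCA_TYPES) -> bool:
    """Check a distribution name against the currently accepted set."""
    return distribution in allowed


print(is_supported("gamma"))                                # False
print(is_supported("gamma", OLCA_TYPES | ECOSPOLD2_EXTRA))  # True
```

Widening the accepted set is a one-line change, which is the sense in which the extra distributions are easy to add when needed.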

agneta20 commented 5 years ago

We need to develop an example using ProbOnto.