kuzeko opened this issue 5 years ago
The PROV-O ontology may also have some concepts we can reuse.
Each measure/value will have an uncertainty, so we need to decide what information about that uncertainty to record. For example, we might record the type of distribution and its essential descriptive statistics (location and scale), e.g. "normal distribution with mean X and standard deviation Y".
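To make the proposal concrete, here is a minimal sketch of such a record: a distribution type plus a location and a scale. The class and field names are hypothetical, and it assumes a two-parameter location/scale family (e.g. normal); distributions such as triangular would need extra fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uncertainty:
    """Hypothetical uncertainty record: distribution type, location, and scale."""
    distribution: str  # e.g. "normal"
    location: float    # e.g. the mean of a normal distribution
    scale: float       # e.g. the standard deviation of a normal distribution

    def __post_init__(self) -> None:
        # A scale parameter (such as a standard deviation) cannot be negative.
        if self.scale < 0:
            raise ValueError("scale must be non-negative")

    def describe(self) -> str:
        return (f"{self.distribution} distribution with "
                f"location {self.location} and scale {self.scale}")


u = Uncertainty("normal", location=2.5, scale=0.3)
print(u.describe())  # normal distribution with location 2.5 and scale 0.3
```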
This could be adopted from the OLCA schema, except for using xsd:float instead of xsd:double to avoid storing values with unnecessary precision (while recommending that calculations be performed in double precision; see Ernerfeldt 2017). For the formula language, the ecoSpold2 format uses a subset of the OpenFormula standard. Other RDF-related formula standards are described on the Wikipedia page for MathML. OLCA limits the UncertaintyType to normal, log-normal, triangular, and uniform, whereas ecoSpold2 additionally has the less frequently used beta, gamma, binomial, and "undefined", which allows storing practically any kind of uncertainty information. ecoSpold2 also has numerical fields for pedigree data quality indicators.
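The "store as xsd:float, compute as double" split can be illustrated with the standard library alone. This sketch (the function name is hypothetical) rounds a Python float, which is an IEEE 754 double, to the nearest IEEE 754 single-precision value, i.e. the value space of xsd:float, then does the arithmetic in double precision.

```python
import struct


def to_xsd_float(x: float) -> float:
    """Round a double to the nearest single-precision value (the xsd:float
    value space) and return it widened back to a Python float (double)."""
    # struct's "f" format packs IEEE 754 binary32; values beyond its range
    # raise OverflowError rather than silently becoming infinity.
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Store with single precision...
stored = [to_xsd_float(v) for v in (0.1, 0.2, 0.3)]
# ...but compute with double precision (Python floats are doubles).
total = sum(stored)
print(stored[0])  # 0.10000000149011612
```

Single precision keeps roughly 7 significant decimal digits (relative rounding error at most 2^-24), which is the trade-off against storing the full 64-bit double.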
I think the OLCA schema is limiting, but it is perhaps good enough as a pragmatic starting point. It shows more or less what information the ontology should include: the type of distribution, the value of the statistic, and the calculation method of the statistic.
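As a rough illustration of those three pieces of information in RDF, this sketch emits Turtle by hand (stdlib only). The `ex:` namespace and the predicates `ex:distributionType`, `ex:calculationMethod`, and the statistic names are all hypothetical placeholders, not OLCA or any published ontology; statistic names are assumed to be valid Turtle local names.

```python
import math


def uncertainty_to_turtle(subject: str, distribution: str,
                          statistics: dict[str, float], method: str) -> str:
    """Serialize distribution type, statistic values (as xsd:float), and the
    calculation method as Turtle, using hypothetical ex: predicates."""
    if not statistics:
        raise ValueError("at least one statistic is required")
    # Escape backslashes and quotes so the method is a valid Turtle string.
    escaped = method.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix ex: <http://example.org/uncertainty#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"{subject} ex:distributionType ex:{distribution} ;",
        f'    ex:calculationMethod "{escaped}" ;',
    ]
    stats = []
    for name, value in statistics.items():
        # Python's "inf"/"nan" are not valid xsd:float lexical forms.
        if not math.isfinite(value):
            raise ValueError(f"non-finite value for {name}")
        stats.append(f'    ex:{name} "{value}"^^xsd:float')
    lines.append(" ;\n".join(stats) + " .")
    return "\n".join(lines)


print(uncertainty_to_turtle("ex:measure1", "NormalDistribution",
                            {"mean": 2.5, "standardDeviation": 0.3},
                            "sample statistics"))
```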
For now I favor the OLCA schema, as it is simple enough for our current objectives. I'm not sure the other distributions will appear in the eventual dataset, and I think it would be easy enough to incorporate them as needed.
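One way to keep that extension cheap is a closed set of supported types that fails loudly on anything else. In this sketch the enum and function names are hypothetical; the members mirror the four OLCA types listed earlier, and supporting one of the ecoSpold2 extras (beta, gamma, binomial, "undefined") would mean adding one member.

```python
from enum import Enum


class SupportedDistribution(Enum):
    """Hypothetical closed set, starting from the four OLCA types."""
    NORMAL = "normal"
    LOG_NORMAL = "log-normal"
    TRIANGULAR = "triangular"
    UNIFORM = "uniform"
    # BETA = "beta"  # e.g. add ecoSpold2 types here if the data needs them


def parse_distribution(label: str) -> SupportedDistribution:
    """Map a dataset label to a supported type, raising on unknown labels."""
    try:
        return SupportedDistribution(label.strip().lower())
    except ValueError:
        supported = ", ".join(d.value for d in SupportedDistribution)
        raise ValueError(
            f"unsupported distribution {label!r}; expected one of: {supported}"
        ) from None
```

Failing on unknown labels, rather than silently mapping them to something, makes it obvious when the dataset actually needs one of the extra distributions.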
We need to represent uncertainties in the measures/values.
We could adopt vocabularies/models from the Semanticscience Integrated Ontology (SIO):
https://github.com/MaastrichtU-IDS/semanticscience
https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-5-14
We need to investigate: