Thought 1: Use Ontologies and a Triple Store DB
Example: Semantic concept schema of the linear mixed model of experimental observations (https://doi.org/10.1038/s41597-020-0409-7)
"In this paper, we propose a semantic model for the statistical analysis of datasets by linear mixed models. We tie together disparate statistical concepts in an interdisciplinary context through the application of ontologies, in particular the Statistics Ontology (StatO), to produce FaIR data summaries. "
Here the STATO ontology (STATistical Methods Ontology, http://stato-ontology.org/) is extended.
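As a rough illustration (not taken from the paper), the metadata of a calibration run could be stored as RDF triples and pushed into a triple store. The namespace and property names below are invented for this sketch and are not actual STATO terms:

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

# Hypothetical vocabulary for our calibration metadata (not STATO itself)
CAL = Namespace("https://example.org/calibration#")

g = Graph()
g.bind("cal", CAL)

run = URIRef("https://example.org/runs/run_017")
g.add((run, RDF.type, CAL.CalibrationRun))
g.add((run, CAL.modelName, Literal("linear_elastic_beam")))
g.add((run, CAL.usedPrior, Literal("lognormal(E, mu=30 GPa, sd=5 GPa)")))
g.add((run, CAL.usedSensor, Literal("strain_gauge_03")))

# Turtle serialization that could be loaded into any triple store
print(g.serialize(format="turtle"))
```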
Example: Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering (DOI: 10.1109/WORKS49585.2019.00006)
"If data are not tracked properly during the lifecycle, it becomes unfeasible to recreate a ML model from scratch or to explain to stackholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle, while keeping the provenance capture overhead low. "
Here the data are represented using W3C PROV / PROV-ML.
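A minimal sketch of how one calibration run could be described with W3C PROV, using the Python `prov` package; all identifiers are made up for illustration:

```python
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "https://example.org/calibration/")

# Entities: experimental data and prior go in, posterior samples come out
data = doc.entity("ex:experiment_042_measurements")
prior = doc.entity("ex:prior_youngs_modulus")
posterior = doc.entity("ex:posterior_samples_run_017")

# The calibration itself is an activity carried out by some agent
calibration = doc.activity("ex:calibration_run_017")
analyst = doc.agent("ex:analyst")

doc.used(calibration, data)
doc.used(calibration, prior)
doc.wasGeneratedBy(posterior, calibration)
doc.wasAssociatedWith(calibration, analyst)

# Human-readable PROV-N; doc.serialize() gives PROV-JSON instead
print(doc.get_provn())
```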
Thought 2: Using a Pipeline Framework
I would argue that we might need both approaches. The first one allows us to store general metadata so that we can query for the specific calibration process we performed (assuming we have done many different calibrations, e.g. using different priors, different sensors, different queries/experiments); we need a unique description to distinguish all of those, both to put them back into the database and to be able to query them afterwards. However, I think it is simply not possible to include everything; in particular, storing just the query is not sufficient, since the database itself might change and would then return different results. Therefore we would also have to store the complete process in a workflow system that allows us to document every single input/output of the whole workflow (and even within the modules of the workflow).
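To make the second point a bit more concrete, here is a minimal, framework-independent sketch of what capturing the inputs/outputs of a single workflow step could look like. Every step records content hashes of its input and output files plus its parameters, so a run can still be identified even if the underlying database changes later (file names and parameters below are invented):

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash of a file so that changed inputs can be detected later."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_step(name, inputs, outputs, params):
    """Build a provenance record for one workflow step (illustrative only)."""
    return {
        "step": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "parameters": params,
        "inputs": {str(p): file_hash(p) for p in inputs},
        "outputs": {str(p): file_hash(p) for p in outputs},
    }


# Example usage after a calibration step has written its results:
record = record_step(
    "bayesian_calibration",
    inputs=[Path("measurements.csv"), Path("prior.json")],
    outputs=[Path("posterior_samples.csv")],
    params={"sampler": "MCMC", "n_samples": 10000},
)
Path("provenance.json").write_text(json.dumps(record, indent=2))
```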
Maybe we should first ask ourselves what information we would like to query afterwards. I created an entry in the wiki as a basis for our discussion today: https://github.com/BAMresearch/ModelCalibration/wiki
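For instance, if every run is stored as triples as in the sketch under Thought 1, the question "which calibrations used this prior and this sensor?" could be answered with a SPARQL query (the vocabulary and the file name are again invented):

```python
from rdflib import Graph

g = Graph()
g.parse("calibration_runs.ttl", format="turtle")  # hypothetical dump of all stored runs

query = """
PREFIX cal: <https://example.org/calibration#>
SELECT ?run ?model WHERE {
    ?run a cal:CalibrationRun ;
         cal:usedPrior  "lognormal(E, mu=30 GPa, sd=5 GPa)" ;
         cal:usedSensor "strain_gauge_03" ;
         cal:modelName  ?model .
}
"""
for row in g.query(query):
    print(row.run, row.model)
```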
Outdated due to the change in focus towards lebedigital.
The issue is how to document/store information about the parametrization process of the computer model. There are different types of information: