BAMresearch / LebeDigital

The LeBeDigital Concrete Production and Testing Ontology - CPTO Repository
https://bamresearch.github.io/LebeDigital/newest
MIT License

Updating Knowledge graph to add the calibrated parameters #65

Open atulag0711 opened 2 years ago

atulag0711 commented 2 years ago

Hello, after the calibration is performed, the calibrated Young's modulus (E) needs to be stored in the knowledge graph. Since the project aims for data provenance, I think the samples (in CSV format, with just the path stored), the mean values, the standard deviation, and the path to the calibration script (in case someone asks how it was calibrated) need to be stored. Please see the attached mind map I made (attachment: Updating_KG_afterCalibration). This will involve the following updates to the current setup:

  1. @JulietteWinkler updates the Ontology.
  2. @PoNeYvIf Script to update the KG with the data obtained from calibration.

I have already discussed this with @PoNeYvIf; he said it is clear to him.

joergfunger commented 2 years ago

This looks good, though I would not store the CSV samples but rather the ArviZ file (which is a more structured representation available for all toolboxes, including pyro, stan, probeye, ...). In addition, the information that is stored by the export_to_rdf of probeye is certainly also contained in the calibration script (which I would store as well), but it is machine-readable information that can be queried (the script can't be). Related to this, there is a second question: what are the mean and standard deviation that we store? In our Bayesian setting this makes sense, but using different priors would result in different posteriors. Would we then choose reasonable priors and assume those to be always the same (thus hardcoded in the files), or do we provide an option to have multiple results (either again using Bayesian inference with different priors or likelihoods, e.g. with and without correlation, or even a deterministic least-squares fit) stored in parallel in the KG as a result of the Young's modulus test?
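A minimal sketch of the ArviZ-based storage suggested here, assuming the posterior samples come out of the calibration as a plain array; the sample values and file path are illustrative.

```python
# Sketch: store posterior samples of E as an ArviZ InferenceData file
# (NetCDF) instead of a raw CSV. The samples here are synthetic stand-ins
# for whatever the calibration (probeye/pyro/stan) actually produced.
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
samples = rng.normal(loc=30.2e9, scale=1.1e9, size=(2, 500))  # 2 chains x 500 draws

idata = az.from_dict(posterior={"E": samples})
idata.to_netcdf("calibration_run_001.nc")

# Any toolbox can read the structured file back, with chain/draw
# dimensions and variable names preserved:
idata2 = az.from_netcdf("calibration_run_001.nc")
print(float(idata2.posterior["E"].mean()))
```

Unlike a bare CSV, the NetCDF file carries the variable names and chain/draw structure with it, which is the "more structured representation" point above.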

atulag0711 commented 2 years ago
  1. A .csv file, to reduce further abstractions. Many people would not care about ArviZ and would rather have the raw set of data, which they can then process with the tool of their choice.
  2. Yes, the .ttl file from export_to_rdf can also be stored. So the path to the script and the path to the .ttl file will be stored there.
  3. With a reasonable prior and a converged inference process, I see no need to store multiple results; that would complicate a rather simple problem.
joergfunger commented 2 years ago
  1. I am not sure what you store now, CSV or ArviZ? If you use plain CSV, make sure you can describe exactly what each entry in the file means; alternatively, JSON might be an option.
  2. It seems odd to store a link to a .ttl file in the KG instead of directly attaching the relevant information to the KG.
  3. The problem is simple, but we will face the same issue for the real demonstrator. If we run the calibration or the optimization multiple times (e.g. with different data sets), we get a different result each time. If we store that in a triple store (not in our simple KG that we always rebuild, but a permanent one), how do we distinguish the results? You might consider the problem trivial for the minimum working example, but it is the basis for moving to a real triple store, which I don't see with the current approach.