COSIMA / cosima-cookbook

Framework for indexing and querying ocean-sea ice model output.
https://cosima-recipes.readthedocs.io/en/latest/
Apache License 2.0

More database structure musings #134

Open aidanheerdegen opened 5 years ago

aidanheerdegen commented 5 years ago

Currently it is possible to decouple the experiment name from its location, which I think is potentially a good thing. name could be added to the metadata file, so an experiment name need not be tied to a particular choice of directory name. This would allow, say, spinup to be the name of the most recent spinup.

This is a good and bad thing. Convenient, but difficult for reproducibility. How would users know they are using the same dataset as they had in the past when specifying just a name?

Which brings me back to thinking about versioning, and uniquely identifying datasets.

Possible solution (or beginning of one): generate a uuid for each dataset and save it in the metadata file (create if it doesn't already exist).
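A minimal sketch of the "create if it doesn't already exist" step. It assumes a JSON metadata file purely for illustration (the real metadata format may differ), and `ensure_uuid` and the `uuid` key are hypothetical names:

```python
import json
import uuid
from pathlib import Path


def ensure_uuid(metadata_path):
    """Return the dataset's uuid, creating and saving one if it is missing."""
    path = Path(metadata_path)
    # Start from the existing metadata if the file exists, else from empty.
    metadata = json.loads(path.read_text()) if path.exists() else {}
    if "uuid" not in metadata:
        # Only written once: later calls return the stored value unchanged.
        metadata["uuid"] = str(uuid.uuid4())
        path.write_text(json.dumps(metadata, indent=2))
    return metadata["uuid"]
```

Calling `ensure_uuid` again on the same file returns the same identifier, which is what would let a user confirm they are looking at the same dataset as before.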

This would solve another issue I have been thinking about: uniqueness. It should be possible to have the same experiment name for different models. Not only possible, but potentially desirable. That way every model can have a spinup experiment, say. This is possible if the uuid is the only column on which we force uniqueness.

It does make data discovery trickier. If you ask for all the experiments it may return 7 all named spinup. So I would advocate for adding a model column to the experiments table, and a corresponding field in the metadata file, unless there is a better way to extract model name from the outputs.
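To make the idea concrete, here is a rough sketch using the standard-library `sqlite3` module rather than the cookbook's actual schema. The table layout, the `model` column, and the model names `model_a` and `model_b` are assumptions for illustration only:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        id    INTEGER PRIMARY KEY,
        uuid  TEXT NOT NULL UNIQUE,  -- the only column forced to be unique
        name  TEXT NOT NULL,         -- deliberately not unique
        model TEXT                   -- hypothetical column, as proposed above
    )
    """
)

# Two different models can each have an experiment called "spinup".
for model in ("model_a", "model_b"):
    conn.execute(
        "INSERT INTO experiments (uuid, name, model) VALUES (?, ?, ?)",
        (str(uuid.uuid4()), "spinup", model),
    )

# The model column is what tells the identically named experiments apart.
rows = conn.execute(
    "SELECT name, model FROM experiments WHERE name = ? ORDER BY model",
    ("spinup",),
).fetchall()
```

Without the `model` column the query would return two indistinguishable `spinup` rows, which is the discovery problem described above.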

We can also easily handle versioning in this way. Multiple experiments can have the same name even for the same model, but unique ids. For this reason I would also advocate adding a version column to the experiments table. The version could be specified in the metadata, or we could require uniqueness for experiment+model+version and auto-increment the version if there is a clash.
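One way the auto-increment-on-clash behaviour might look, again sketched with `sqlite3` and not taken from the cookbook code. `make_db`, `add_experiment`, and the column names are hypothetical:

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory table with uniqueness on name+model+version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE experiments (
            uuid    TEXT NOT NULL UNIQUE,
            name    TEXT NOT NULL,
            model   TEXT NOT NULL,
            version INTEGER NOT NULL,
            UNIQUE (name, model, version)
        )
        """
    )
    return conn


def add_experiment(conn, name, model, version=1):
    """Insert an experiment, bumping the version until it no longer clashes."""
    taken = {
        row[0]
        for row in conn.execute(
            "SELECT version FROM experiments WHERE name = ? AND model = ?",
            (name, model),
        )
    }
    # Start from the requested version and move to the next free one.
    while version in taken:
        version += 1
    conn.execute(
        "INSERT INTO experiments (uuid, name, model, version) VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), name, model, version),
    )
    return version
```

In this sketch a version given in the metadata is honoured when it is free and silently bumped when it is not; raising an error on a clash would be an equally reasonable design.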

I know I said don't change this anymore, and maybe this could be done after the winter school, but I would strongly advocate for this, or something like this before publishing to a wider audience.

aidanheerdegen commented 4 years ago

Related https://github.com/COSIMA/cosima-cookbook/issues/168

aidanheerdegen commented 4 years ago

I'm not sure how useful a lot of these ideas are any longer, though I still think adding a model field to the DB was a good idea. The model is now inferred in the explorer, but it should probably be a field in the DB. See https://github.com/COSIMA/cosima-cookbook/issues/182

angus-g commented 4 years ago

We could probably add model as a heuristic (hybrid?) property on the NCFile or NCVar models -- I'm not such a fan of storing it in the database itself, but we can easily derive it from data available on those models (you pull it out of the file's path, right?)
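A rough sketch of the derive-it-from-the-path idea, written as a plain Python property so it runs without a database. `NCFileSketch` is a stand-in for the real NCFile model, and both the `KNOWN_MODELS` list and the rule "first path component that matches a known name" are assumptions, not the heuristic the explorer actually uses:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical set of model names to look for in a file's path.
KNOWN_MODELS = ("ocean", "ice", "atmosphere")


@dataclass
class NCFileSketch:
    """Stand-in for the NCFile model; only the path is kept here."""

    ncfile: str  # path of the file relative to the experiment root

    @property
    def model(self):
        """Guess the model from the path, or None if nothing matches."""
        for part in PurePosixPath(self.ncfile).parts:
            if part in KNOWN_MODELS:
                return part
        return None
```

With SQLAlchemy, the same logic could presumably sit behind a `hybrid_property` on the mapped class, which would keep the value derived and avoid storing it in the database.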