COSIMA / master_index

Scripts to generate COSIMA master index

Better association of experiments with data #18

Open aidanheerdegen opened 3 years ago

aidanheerdegen commented 3 years ago

Currently there is no good way to find the experimental configuration and run directory for a dataset that is present in the cookbook.

This is important for a number of reasons: investigating model configuration, documentation, and also in case you want to spin a new experiment off from the old one.

My favoured solution: add a url field to the metadata that points to the git(hub) repository for the experiment control repo. Then strongly encourage (force) everyone who has data in the main DB to push their config to GitHub and add the URL to their metadata.

aekiss commented 3 years ago

Sounds like a good idea. One complication is that there are multiple commits in each run... but having any of them in the metadata would be a lot better than none.

FYI, there's a bit in sync_data.sh that clones the run's git history to the sync location, in an attempt to address this issue: https://github.com/COSIMA/1deg_jra55_iaf/blob/master/sync_data.sh#L148-L152. Not everyone uses this script, though.

aidanheerdegen commented 2 years ago

I like the idea of using the presence of a metadata.yaml file to signify the root directory of an experiment. This makes it simpler to build scripts to walk directory trees looking for experiments, and means if you want your data indexed you need to have a metadata.yaml file.
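A rough sketch of what such a directory walk could look like, in Python. The function name `find_experiment_roots` is hypothetical, and the sketch assumes experiments are never nested inside one another, so it stops descending once it finds a metadata.yaml file. This is only an illustration, not the indexing code this repository actually uses.

```python
import os
from pathlib import Path

# File whose presence marks the root directory of an experiment.
METADATA_FILENAME = "metadata.yaml"


def find_experiment_roots(top):
    """Return a sorted list of directories under `top` that contain metadata.yaml.

    Hypothetical helper. Assumes experiments are not nested: once a
    metadata.yaml file is found, subdirectories of that directory are skipped.
    """
    roots = []
    for dirpath, dirnames, filenames in os.walk(top):
        if METADATA_FILENAME in filenames:
            roots.append(Path(dirpath))
            # Emptying dirnames in place tells os.walk not to descend further.
            dirnames.clear()
    return sorted(roots)
```

Directories without a metadata.yaml file are simply never reported, which matches the "no metadata, no indexing" rule suggested above.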

Once you make the metadata.yaml file compulsory, you can start checking for fields like url. You could even check that the link is valid.