ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Possibility to provide a '*' to the definition of a dataset in a recipe #589

Closed jservonnat closed 1 year ago

jservonnat commented 4 years ago

Hello everyone, this issue follows our discussion with @valeriupredoi during the IS-ENES3 GA. With CliMAF we found it very useful to be able to specify a wildcard '*' in our dataset definitions, for instance model='*' or realization='*', to work on all the models or realizations available. In the same way, we implemented the possibility to specify period='last_XXY', 'first_XXY' or '*', with XX being a number of years, to retrieve the last XX years, the first XX years, or the full period available. Do you guys think you could consider adding this functionality? Cheers, J.
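To make the period request concrete, here is a minimal sketch of how such a spec could be resolved once the first and last available years are known. The function name `resolve_period` and its arguments are hypothetical and do not come from CliMAF or ESMValCore; clipping an over-long request to the available range is also just one possible choice.

```python
import re


def resolve_period(spec, first_year, last_year):
    """Turn a period spec into an inclusive (start_year, end_year) tuple.

    Hypothetical helper: `first_year` and `last_year` are assumed to be
    the first and last years actually available for the dataset.
    """
    if spec == "*":
        # Wildcard: use the full available period.
        return first_year, last_year

    match = re.fullmatch(r"(first|last)_(\d+)Y", spec)
    if match is None:
        raise ValueError(f"unrecognised period spec: {spec!r}")

    n_years = int(match.group(2))
    if n_years < 1:
        raise ValueError("number of years must be at least 1")

    # Assumption: asking for more years than exist returns what is there.
    n_years = min(n_years, last_year - first_year + 1)

    if match.group(1) == "first":
        return first_year, first_year + n_years - 1
    return last_year - n_years + 1, last_year


print(resolve_period("last_30Y", 1850, 2014))
print(resolve_period("first_10Y", 1850, 2014))
print(resolve_period("*", 1850, 2014))
```

With data covering 1850 to 2014, 'last_30Y' resolves to 1985-2014 and 'first_10Y' to 1850-1859.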

valeriupredoi commented 4 years ago

nice @jservonnat :beer: I shall have a look and start implementing this idea: my take would be:

What do you guys reckon @mattiarighi @bouweandela @jvegasbsc :beer:

valeriupredoi commented 4 years ago

PS - @jservonnat I moved your issue to ESMValCore since this deals with the data finder and logistics within the Core

valeriupredoi commented 4 years ago

any suggestions/approvals/nay's @bouweandela @jvegasbsc @mattiarighi ? I am planning on starting work on this :beer:

jvegreg commented 4 years ago
  • wildcard for datasets; we can define an except option;

This can be useful, but I fear that the except list will grow quite a bit.
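As a rough illustration of the wildcard-plus-except idea, the sketch below uses Python's standard `fnmatch` module. The function `expand_wildcard` and the `exclude` argument are hypothetical names, not existing ESMValCore options. Letting the exclusions be glob patterns too is one way the except list might be kept short.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, available, exclude=()):
    """Return the sorted names in `available` that match `pattern`.

    Names matching any entry of `exclude` are dropped; the entries of
    `exclude` may themselves be glob patterns (an assumption of this sketch).
    """
    return sorted(
        name
        for name in available
        if fnmatchcase(name, pattern)
        and not any(fnmatchcase(name, skipped) for skipped in exclude)
    )


# Illustrative list of datasets found on disk.
available = [
    "ACCESS-CM2",
    "CanESM5",
    "MPI-ESM1-2-HR",
    "MPI-ESM1-2-LR",
    "UKESM1-0-LL",
]

# All datasets except the two MPI-ESM1-2 resolutions.
print(expand_wildcard("*", available, exclude=["MPI-ESM1-2-*"]))
```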

  • wildcard for experiments used with the option exp: ie user asks for an experiment but if data is unavailable for that experiment the code can choose from others (similar to the current exp: [list] but less restrictive);

This is more tricky: usually you don't have interchangeable experiments, you have ensemble members for that. Unless you are thinking of things like treating CMIP6's historical and HighResMIP's highresSST-present as equivalent, but in this case using two different lines to define CMIP6 and HighResMIP datasets is

  • wildcard for ensemble (CMIP6);

This is becoming mandatory: there are datasets in CMIP6 with lots of members and it's becoming a problem to keep track of all of them.

  • wildcard for years - all available years - but this one is tricky since we'll have to harmonize time boundaries for stuff like eg multimodel so we don't have to analyze a whole lot of time and discard it at zonal/meridional/multimodel stats

This is mandatory to ease definitions for DCPP and similar decadal and seasonal experiments. It will be a pain to specify all DCPP start dates if we have to provide the exact data range for each one.
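On the concern about harmonizing time boundaries when all available years are requested, one simple option is to intersect the available periods before any multimodel step. The sketch below assumes each dataset's availability is already known as an inclusive (start_year, end_year) pair; `common_period` and the dataset names are hypothetical.

```python
def common_period(periods):
    """Intersect inclusive (start_year, end_year) ranges.

    Returns the overlapping range, or None if the ranges do not overlap.
    """
    periods = list(periods)
    if not periods:
        raise ValueError("at least one period is required")
    start = max(period[0] for period in periods)
    end = min(period[1] for period in periods)
    return (start, end) if start <= end else None


# Illustrative availability per dataset.
available = {
    "model_a": (1850, 2014),
    "model_b": (1950, 2014),
    "model_c": (1979, 2100),
}

print(common_period(available.values()))
```

Here the three datasets share 1979-2014, so only that window would need to be loaded for multimodel statistics.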

bouweandela commented 4 years ago

I agree that it would be a very nice feature to be able to use glob patterns in the variable/dataset definitions in the recipe. It will be some work to implement this though.

We will also need to think a bit about how we want to make recipes in the ESMValTool repository reproducible if we use this feature. At the moment @mattiarighi tests whether a recipe works with the variables and datasets that are part of it, but if this starts to depend on the data available, it becomes a bit harder to test that stuff actually works, so maybe we would not want to allow this for those recipes.
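One possible way to keep such recipes testable would be to record the explicit dataset list that a wildcard expanded to at run time, so the run can be repeated with exactly the same inputs. The sketch below is only an illustration of that idea: `pin_datasets` and the plain-dict facet layout are hypothetical, not the ESMValCore data model.

```python
import json
from fnmatch import fnmatchcase


def pin_datasets(template, available):
    """Expand a wildcard template into explicit dataset entries.

    `template` and every entry of `available` are plain dicts of facets.
    An entry is kept when each facet in the template matches it, with
    template values treated as glob patterns.
    """
    pinned = [
        entry
        for entry in available
        if all(
            fnmatchcase(str(entry.get(key, "")), str(pattern))
            for key, pattern in template.items()
        )
    ]
    # Sort deterministically so the pinned list is stable between runs.
    return sorted(pinned, key=lambda entry: json.dumps(entry, sort_keys=True))


template = {"dataset": "*", "exp": "historical", "ensemble": "r1i1p1f1"}
available = [
    {"dataset": "CanESM5", "exp": "historical", "ensemble": "r1i1p1f1"},
    {"dataset": "CanESM5", "exp": "ssp585", "ensemble": "r1i1p1f1"},
    {"dataset": "ACCESS-CM2", "exp": "historical", "ensemble": "r1i1p1f1"},
]

# The pinned list could be stored next to the run output for later reuse.
print(json.dumps(pin_datasets(template, available), indent=2))
```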