ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Where does preprocessing of CMIP5 model input data that is used in a diagnostic fit in? #1033

Status: closed (closed by bascrezee 4 years ago)

bascrezee commented 5 years ago

A diagnostic makes use of some land surface data (from CLM) that is used as input for several CMIP models. This land surface data needs some preprocessing and is later read by the diagnostic. In a way, it could be treated as OBS data, but this might be confusing since it clearly belongs to a different category. It does not fit into the category of a derived variable either. Any thoughts on where this might fit in?

bascrezee commented 5 years ago

@mattiarighi Any ideas? Or can you tag someone who can help? The raw data is available after registration. However, as mentioned, it does not fit into the OBS category: it is CMIP input data.

mattiarighi commented 5 years ago

The solution could be to use the auxiliary_data_dir in config-user.yml, which is designed for such auxiliary data. @ledm used that in one of his recipes, if I remember correctly.

valeriupredoi commented 5 years ago

If you need that data only during the diagnostic phase, then it can be an input to the diagnostic settings. See for example https://github.com/ESMValGroup/ESMValTool/blob/version2_development/esmvaltool/recipes/recipe_autoassess_landsurface_snow.yml, where climfiles_root: points to the root directory from which various specialty masks are extracted for use during the diagnostic. If preprocessing is needed, you can probably do it inside the diagnostic before using the data in the diagnostic proper. The auxiliary_data_dir is also a solution, but I don't think the data there can go through the full set of preprocessor steps - @ledm correct me if I iz wrong, man :beer:
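A minimal sketch of the "pass it as a diagnostic setting" approach, assuming the diagnostic script receives its recipe settings as a plain dictionary (called cfg here) and that the auxiliary files are NetCDF files somewhere below climfiles_root. The helper name find_aux_files and the *.nc pattern are hypothetical, not part of ESMValTool.

```python
from pathlib import Path


def find_aux_files(cfg, pattern="*.nc"):
    """Return the auxiliary files found below the climfiles_root setting.

    Hypothetical helper: `cfg` is assumed to be the settings dictionary the
    diagnostic receives from the recipe, containing a "climfiles_root" entry.
    """
    root = Path(cfg["climfiles_root"]).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"climfiles_root does not exist: {root}")
    # Search recursively and sort so files are processed in a reproducible order.
    return sorted(root.rglob(pattern))
```

Because these files never pass through the recipe's preprocessor, any regridding or masking they need would have to be done inside the diagnostic on the returned paths, before the actual analysis.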

ledm commented 5 years ago

When I used the auxiliary_data_dir field, it was to pass data files to Cartopy, so that Cartopy could access the land/sea interface datasets and add the continents to our figures. We didn't want this auxiliary data to be loaded by ESMValTool.

It sounds like @bascrezee wants to add initial-conditions datasets to the recipe and treat them in a similar way to CMIP data (i.e., preprocess them and so on). The solution seems to be to add another category of data, something like OBS data, but called external_model_data, initial_conditions or similar. Does that sound right to you?

As a workaround for now, you can probably just treat it like OBS data as far as ESMValTool is concerned.

Note that if you want ESMValTool to process this data, it will have to be CF compliant.
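As a rough illustration of what "CF compliant" involves, here is a sketch that checks a file's global attributes and one variable's attributes (given as plain dictionaries) for a few basics. The function missing_cf_attributes is hypothetical; it is far less thorough than a real CF checker or ESMValTool's own checks against the CMOR tables.

```python
def missing_cf_attributes(global_attrs, var_attrs):
    """Return a list of basic CF problems found in the given attribute dicts.

    Hypothetical, minimal check: it only looks for a CF "Conventions" global
    attribute and for "units" and "standard_name" on the data variable.
    """
    problems = []
    # CF files declare the convention version in a global attribute, e.g. "CF-1.7".
    conventions = str(global_attrs.get("Conventions", ""))
    if "CF-" not in conventions:
        problems.append("global attribute 'Conventions' does not mention CF")
    # Dimensional variables need units; standard_name is what tools typically
    # use to identify the physical quantity.
    for attr in ("units", "standard_name"):
        if not var_attrs.get(attr):
            problems.append(f"variable attribute '{attr}' is missing")
    return problems
```

For example, missing_cf_attributes({"Conventions": "CF-1.7"}, {"units": "K", "standard_name": "air_temperature"}) returns an empty list.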

bascrezee commented 5 years ago

> It sounds like @bascrezee wants to add initial-conditions datasets to the recipe and treat them in a similar way to CMIP data (i.e., preprocess them and so on). The solution seems to be to add another category of data, something like OBS data, but called external_model_data, initial_conditions or similar. Does that sound right to you?

Yes, exactly, that is the case.

> As a workaround for now, you can probably just treat it like OBS data as far as ESMValTool is concerned.

Yes, I will go for that.

mattiarighi commented 5 years ago

I would avoid defining another category of data if we can use auxiliary_data for the same purpose, just to keep it simple.

@bouweandela what do you think?

bascrezee commented 5 years ago

> I would avoid defining another category of data if we can use auxiliary_data for the same purpose, just to keep it simple.

Fine with me, as long as I can pass all the data in auxiliary_data through the preprocessor as defined in the recipe. I can, right?

mattiarighi commented 5 years ago

It should be possible; I remember we discussed it as a possible application of auxiliary_data_dir, but there is no example yet.

bouweandela commented 5 years ago

The auxiliary data dir is meant for extra files needed by the recipe that do not need processing. For example, we have a recipe that uses shapefiles to cut out a particular area. The shapefiles can be stored in the auxiliary data dir, and the path in the recipe is relative to that directory; see recipe_shapeselect.yml.

If the data needs preprocessing just like model or observational datasets, it would make more sense to me to add it either as OBS data (even if that's not strictly correct, but it's only one case) or define an additional category (if there are many such cases).
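To make the shapefile example concrete, here is a minimal sketch of resolving a recipe path against auxiliary_data_dir. It assumes relative paths are joined to auxiliary_data_dir and absolute paths are used as-is; the function name resolve_aux_path and the example paths are hypothetical, not taken from recipe_shapeselect.yml.

```python
from pathlib import Path


def resolve_aux_path(auxiliary_data_dir, recipe_path):
    """Resolve a file path given in a recipe against auxiliary_data_dir.

    Hypothetical helper: relative paths are joined to auxiliary_data_dir,
    absolute paths are returned unchanged (assumed behaviour).
    """
    path = Path(recipe_path).expanduser()
    if path.is_absolute():
        return path
    return Path(auxiliary_data_dir).expanduser() / path


# Example with made-up locations:
# resolve_aux_path("/data/aux", "shapefiles/basins.shp")
#   -> Path("/data/aux/shapefiles/basins.shp")
```

Keeping recipe paths relative means the recipe itself stays portable; only auxiliary_data_dir in config-user.yml has to change between machines.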

mattiarighi commented 4 years ago

Reopen if necessary.