Closed mattiarighi closed 5 years ago
Hey, if you're interested in reading the observations into Iris cubes now, rather than reformatting on disk, you could use CIS. It was designed to do exactly this and has a simple plugin system for fixing things like metadata and coordinate systems in these datasets, see e.g. http://cis.readthedocs.io/en/stable/plugin_development.html. It can already read AERONET, MODIS and the aircraft datasets (HIPPO, SALTRACE, CONCERT etc.) in your list out of the box.
I'd be very happy to chat about how this could fit in to ESMValTool.
I've started a branch for this issue: version2_reformat_obs.
For now I've just created a directory /esmvaltool/cmor/cmorize_obs, where I moved one of the existing reformat scripts (cmorize_obs_AURA-TES.ncl) and some related functions. I've also created a dedicated recipe recipe_cmorize_obs.yml which simply calls this script as it happens with the diagnostic.
This approach has some problems, since for example interface information (such as the input path to the obs data to be cmorized specified in the config file) is not passed to the script.
I think the suggestion by @bouweandela to create a specific executable for such task would be much better, also in view of the future implementation of the CIS plugin for cmorizing observations (see #513).
One option could be:
cmorize_obs [dataset]
Another option is to use the same executable, but with subcommands (argparse supports them):
esmvaltool cmorize_obs ${DATASET}
This way we can extend esmvaltool functionality while maintaining a single entry point:
esmvaltool run recipe.yml
esmvaltool available_recipes
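A minimal sketch of the subcommand idea, using only the standard-library argparse module. The subcommand names are taken from the examples above; the argument names (recipe, datasets) and the build_parser helper are assumptions made for illustration, not the actual esmvaltool interface.

```python
import argparse


def build_parser():
    """Build one `esmvaltool` entry point that dispatches on a subcommand."""
    parser = argparse.ArgumentParser(prog="esmvaltool")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # fail with a usage message if no subcommand is given

    # esmvaltool run recipe.yml
    run = subparsers.add_parser("run", help="run a recipe")
    run.add_argument("recipe")

    # esmvaltool cmorize_obs DATASET [DATASET ...]
    cmorize = subparsers.add_parser("cmorize_obs", help="cmorize observations")
    cmorize.add_argument("datasets", nargs="+")

    # esmvaltool available_recipes
    subparsers.add_parser("available_recipes", help="list available recipes")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["cmorize_obs", "AURA-TES"])
print(args.command, args.datasets)
```

New functionality then only needs another add_parser call, and users keep a single command to remember.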
That's also fine. I think we should avoid having an extra recipe for cmorizing obs as in v1, it's a bit overkill.
I've created a simple tool prepare_observations in branch version2_prepare_observations which uses the cis package to read a list of files and store the variables found in those files in an output directory, one file per variable. It can be installed by running the usual pip install -e . command. Make sure to first update your conda environment, because the cis package installed with pip is outdated and apparently broken. This tool allows us to use cis for reformatting observational data. The reformat scripts could then be implemented as cis plugins.
@mattiarighi Can you check how useful this is and if it works for you/matches your expectations? I did not have any observational data to test it with, so there are probably things about it that do not work (I tested with model data). Maybe you can test it with one of the observational datasets that is already supported by cis? And next try to port one of the esmvaltool 1 reformat scripts to a cis plugin, as described here, and see if this is a good experience? The prepare_observations tool is very simple now. I expect we may need some extra features, but it would be nice if we could keep it to just a thin wrapper around cis, if we find that it suits our needs.
Thanks @bouweandela - this is looking really good.
I've now updated CIS on PyPi to the latest version (1.6.0). I tend not to update it as regularly, as I recommend people install using conda, but I should get into the habit of keeping it up to date!
Thanks @duncanwp for updating CIS on PyPi. I tried installing it, with 'pip install' into my ESMVal environment. I get the following error message:
Collecting iris>=1.8.0 (from cis==1.6.0)
Could not find a version that satisfies the requirement iris>=1.8.0 (from cis==1.6.0) (from versions: 1.0.4)
No matching distribution found for iris>=1.8.0 (from cis==1.6.0)
I have already installed Iris into my existing environment:
conda list
# ...
# iris 2.1.0 py27_3 conda-forge
# ...
So I don't really get what this error message means. Thanks for any help !
Unfortunately the 'iris' package on PyPi is not the correct package (it should be scitools-iris). I'll need to update the CIS setup.py and re-release, but this may not happen until the next major release (in the next couple of months).
In the meantime, if you already have iris installed, you can install CIS with pip's 'no-dependencies' flag (--no-deps).
Apologies for the inconvenience!
I keep getting requests from users on how to reformat observational data in version 2.
I think we should add the possibility to run the v1 reformat scripts in v2 (as suggested above).
The method based on CIS implemented by @bouweandela (see above) actually works, but we do not have the resources now to translate our set of reformat scripts into CIS plug-ins (as also discussed above).
Moving to CIS on the long term would be great and we should definitely do that, but at this stage we urgently need to give the users the possibility to generate CMORized observations with the existing scripts and to contribute new ones. The easiest and fastest solution would be to use the same framework of v1, which is very flexible and has multi-language support (as for the diagnostics).
I can take care of porting all existing reformat scripts to v2, also fixing metadata issues which Iris2 is raising. I just need someone to help with the framework: the only thing these scripts need is to have access to interface information, such as input/output paths and logging functions.
hey guys @mattiarighi @bouweandela @jvegasbsc - here's my take on this (started work on this branch, no PR yet, changes are too messy for a PR just yet: https://github.com/ESMValGroup/ESMValTool/tree/version2_reformat_obs_workflow) - the workflow for obs reformatting:
- in _recipe.py, when time comes to build the input_files dict, add an option from config-user called apply_reformat - if True, get the input files dictionary through a reformat.py script that will, in turn, run the reformatting if the dataset is in the reformat library; then return it in the same shape, only this time around files will have changed location and are reformatted;
- reformat.py runs the reformat_scripts (I guess we want only the obs ones right?) and dumps the output in a safe location that the code can check: if the files already exist, the reformatting+saving is skipped;
Now, does anybody remember how the run_executable() function was operating in v1 so we can run all those ncl, csh etc codes? I did port that crazy function last year in the first ever v2 but beats me if I remember.
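For illustration only, here is one way scripts in several languages could be launched from Python. This is not the v1 run_executable() function: the INTERPRETERS table, build_command and run_script are hypothetical names, and the sketch assumes the interpreters (ncl, csh, Rscript) are on the PATH. Interface information is passed through environment variables here, which is just one possible choice.

```python
import os
import subprocess
import sys
from pathlib import Path

# Hypothetical mapping from script extension to the interpreter command.
INTERPRETERS = {
    ".ncl": ["ncl"],
    ".csh": ["csh"],
    ".R": ["Rscript"],
    ".py": [sys.executable],
}


def build_command(script):
    """Return the command line for `script`, chosen by its file extension."""
    suffix = Path(script).suffix
    if suffix not in INTERPRETERS:
        raise ValueError("no interpreter known for {}".format(script))
    return INTERPRETERS[suffix] + [str(script)]


def run_script(script, extra_env=None, cwd=None):
    """Run `script`; raise CalledProcessError if it exits with a non-zero code.

    `extra_env` is merged into the current environment, so settings such as
    input and output paths can reach scripts written in any language.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    return subprocess.run(build_command(script), env=env, cwd=cwd, check=True)
```

A call such as run_script("cmorize_obs_AURA-TES.ncl", extra_env={"IN_DIR": "/path/to/raw"}) would then start NCL with the extra variable set; the variable name IN_DIR is made up for this example.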
How's this sound?
btw this is a massive hack and we will have to pythonize the reformat scripts sometime in the near future :grin:
It sounds quite complicated; do we need to go through the esmvaltool workflow for that? As I said above, the only things the reformat_obs scripts need are the config-user information about input and output paths and (maybe) the logging functions. They run as stand-alone scripts, mostly independent of the rest of the tool.
Pythonizing all scripts is planned in the long term in the CIS framework, what we need now is a quick solution to allow porting the almost 90 existing scripts from v1. I also would like to keep the multi-language support, since we can expect users writing their diags in NCL/R also wanting to reformat their obs using the same language.
ok - new tentative workflow - talked to @mattiarighi - see https://github.com/ESMValGroup/ESMValTool/pull/666 damn, hope the PR number doesn't do any harm :laughing:
Thanks to the Amazing @valeriupredoi :tm:, we now have a cmorizer for the observations in version 2 :applause:
It can be executed by:
cmorize_obs -c config-user.yml -o [DATASET1,DATASET2,...]
For each of the given datasets, it will look for the data to be cmorized in [RAWOBS]/Tier[1-2]/DATASET, where [RAWOBS] is a path given in config-user.yml, and apply the corresponding cmorizer script in esmvaltool/utils/cmorizers/obs/ (NCL or Python). The cmorized output will be saved in [output_dir]/cmorize_obs_YYYYMMDD_HHMMSS/Tier[1-2]/DATASET.
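The directory layout described above can be sketched with pathlib. The function names find_raw_dir and make_output_dir are hypothetical, and the timestamp format (date followed by hours, minutes and seconds) is an assumption; this is not the code of the actual cmorize_obs tool.

```python
import datetime
from pathlib import Path


def find_raw_dir(rawobs, dataset, tiers=(1, 2)):
    """Return the first existing [RAWOBS]/TierN/DATASET directory, or None."""
    for tier in tiers:
        candidate = Path(rawobs) / "Tier{}".format(tier) / dataset
        if candidate.is_dir():
            return candidate
    return None


def make_output_dir(output_dir, raw_dir, now=None):
    """Build [output_dir]/cmorize_obs_<timestamp>/TierN/DATASET for a raw dir."""
    now = now or datetime.datetime.now()
    # Assumed timestamp layout: YYYYMMDD_HHMMSS.
    run_name = "cmorize_obs_" + now.strftime("%Y%m%d_%H%M%S")
    raw_dir = Path(raw_dir)
    # Reuse the tier and dataset names of the raw data in the output tree.
    return Path(output_dir) / run_name / raw_dir.parent.name / raw_dir.name
```

Keeping the tier and dataset names identical on both sides means the cmorized tree mirrors the raw one, so a dataset can be found in either place with the same relative path.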
At present, two cmorizers are included, AURA-TES and ESACCI-LANDCOVER; more will follow (priority will be given to the datasets used by the recipes already ported to v2).
For Python cmorizer scripts one needs to build a cmorization(in_dir, out_dir) function that takes input and output dirs as args; we can extend that when we start working on Python cmorizer scripts; cheers @mattiarighi for making me a Trade Mark :grin:
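As a rough sketch of that convention: a Python cmorizer script exposes cmorization(in_dir, out_dir), and a small loader imports the script by path and calls it. Only the function name and its two arguments come from the comment above; the placeholder body (which just copies NetCDF files) and the run_python_cmorizer helper are assumptions for illustration.

```python
import importlib.util
import shutil
from pathlib import Path


def cmorization(in_dir, out_dir):
    """Placeholder cmorizer: copy every NetCDF file from in_dir to out_dir.

    A real script would fix variable names, units and coordinates here
    instead of copying the files unchanged.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(in_dir).glob("*.nc")):
        target = out_dir / path.name
        shutil.copy(str(path), str(target))
        written.append(target)
    return written


def run_python_cmorizer(script, in_dir, out_dir):
    """Load `script` as a module and call its cmorization(in_dir, out_dir)."""
    spec = importlib.util.spec_from_file_location("cmorizer", str(script))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.cmorization(in_dir, out_dir)
```

With a fixed function name, the calling tool does not need to know anything about a dataset beyond the location of its script.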
Proposed workflow for porting the cmorization script from v1:
These are the observations used in the recipes currently available in version2_development.
Once the cmorizers for these datasets are available, this issue can be closed:
Tier 2
Tier 3
lwpStderr
@mattiarighi that list looks so pretty :grin: It would be worth running your new cmorization checking recipe with iris 2 via #832 given you found that BNU-ESM inconsistency that is in fact a problem in iris
Good point!
To reformat observational data in the CMOR format used by the tool, a number of dataset-specific reformat scripts are available (reformat_scripts/obs/). In v1.0, they were executed by running the namelist_reformat_obs namelist. This has to be ported to the new framework.