ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Reading datasets with multiple institutes #582

Closed schlunma closed 6 years ago

schlunma commented 6 years ago

While testing #580 I encountered another bug:

I have the following dataset in my recipe:

datasets:
  ...
  - {dataset: HadGEM2-ES, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 1986, end_year: 2004}
  ...

For this model, multiple institutes are available in config-developer.yml:

institute:
  ...
  'HadGEM2-ES': ['INPE', 'MOHC']
  ...

The problem: on the cluster, data is available for the members r1i1p1 to r4i1p1 in the MOHC directory and only for r5i1p1 in the INPE directory. Even though the correct data is available, the tool fails with the following error because there is no r1i1p1 directory in INPE:

2018-08-21 14:23:01,303 UTC [46582] ERROR   Program terminated abnormally, see stack trace below for more information
Traceback (most recent call last):
  ...
  File "~/ESMValTool/esmvaltool/_recipe.py", line 668, in <listcomp>
    for filename in _get_input_files(variable, config_user)
  File "~/ESMValTool/esmvaltool/_recipe.py", line 525, in _get_input_files
    drs=config_user['drs'])
  File "~/ESMValTool/esmvaltool/_data_finder.py", line 389, in get_input_filelist
    raise IOError('Path {} does not exist'.format(part1))
OSError: Path /data/cmip5/output1/INPE/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/ does not exist

If I use the r5i1p1 ensemble, the same problem occurs in the MOHC directory.

In the case of multiple institutes, the tool should check all institutes, and only fail when no data is available in any of them.
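
To illustrate the behaviour requested above, here is a minimal sketch in Python. It is not the actual ESMValTool code: the function names `find_ensemble_dirs` and `build_demo_tree`, their parameters, and the fixed `mon/atmos/Amon` sub-path are assumptions made for this example, modelled on the path in the traceback. The idea is to build one candidate path per institute, keep those that exist, and raise only if none do.

```python
import os

# Hypothetical sketch; names and directory layout are assumptions for
# illustration and are not taken from the ESMValTool code base.
DEFAULT_SUBDIRS = ("mon", "atmos", "Amon")


def find_ensemble_dirs(rootpath, institutes, dataset, exp, ensemble,
                       subdirs=DEFAULT_SUBDIRS):
    """Return all existing ensemble directories, checking every institute."""
    candidates = [
        os.path.join(rootpath, institute, dataset, exp, *subdirs, ensemble)
        for institute in institutes
    ]
    found = [path for path in candidates if os.path.isdir(path)]
    if not found:
        # Fail only after every institute has been checked.
        raise IOError(
            "None of these paths exist: {}".format(", ".join(candidates)))
    return found


def build_demo_tree(rootpath):
    """Create an empty directory tree mirroring the layout in this issue."""
    layout = {
        "MOHC": ["r1i1p1", "r2i1p1", "r3i1p1", "r4i1p1"],
        "INPE": ["r5i1p1"],
    }
    for institute, ensembles in layout.items():
        for ensemble in ensembles:
            os.makedirs(os.path.join(
                rootpath, institute, "HadGEM2-ES", "historical",
                *DEFAULT_SUBDIRS, ensemble))
```

With a tree built by `build_demo_tree`, calling `find_ensemble_dirs(root, ['INPE', 'MOHC'], 'HadGEM2-ES', 'historical', 'r1i1p1')` returns only the MOHC path, and an ensemble that exists under neither institute raises `IOError`.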

valeriupredoi commented 6 years ago

@schlunma are you sure that the data is correct (i.e. what the diagnostic needs) given that the ensembles differ?

schlunma commented 6 years ago

Yes, I am sure. I need the member r1i1p1 of historical tas of HadGEM2-ES. The data structure looks like this:

/data/cmip5/output1/MOHC/HadGEM2-ES/historical/mon/atmos/Amon
.
├── r1i1p1
├── r2i1p1
├── r3i1p1
└── r4i1p1

and

/data/cmip5/output1/INPE/HadGEM2-ES/historical/mon/atmos/Amon
.
└── r5i1p1

In this case, I would expect the tool to find my data in the MOHC directory. However, it does not: it also searches the second directory (INPE), where no r1i1p1 data is available, and fails there.

valeriupredoi commented 6 years ago

I am still puzzled by this issue. If r1i1p1 data is needed and INPE doesn't have it, the tool should automatically grab it from MOHC without complaint; if MOHC doesn't have it either, it will fail unambiguously, since the data is unavailable in both places. The lookup is very specific: once you tell it you need ensemble r1i1p1, it looks only for that. I have just run with HadGEM2-ES and there is no problem for me. I made sure the code won't fail if it doesn't find the data in one place but then finds it in the next institute. Which variable are you using? I need to replicate your issue first.

valeriupredoi commented 6 years ago

oh wait...I think I know where the issue is...

valeriupredoi commented 6 years ago

I think this should be solved by this commit: https://github.com/ESMValGroup/ESMValTool/pull/580/commits/308872c41f07bf86d6b13442bd3ce0ee16607bd0. Unfortunately I cannot test it in a real case, since BADC has no INPE data at all.

schlunma commented 6 years ago

Fixed by #580.