ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Duplicate keys in config-developer.yml #250

Closed bouweandela closed 6 years ago

bouweandela commented 6 years ago

Output of yamllint esmvaltool/config-developer.yml:

89:5      error    duplication of key "CFSv2-2011" in mapping  (key-duplicates)
97:5      error    duplication of key "HadGEM2-ES" in mapping  (key-duplicates)

bouweandela commented 6 years ago

The keys above are entered twice, with different values, in the model-to-institute mapping. @mattiarighi, can you have a look? It might be a good idea to order these lists alphabetically, to make this kind of mistake easier to spot.

mattiarighi commented 6 years ago

I'm aware of this problem. It is not an error in the yml file, but a consequence of the way the CMIP5 data are actually structured for some specific DRS (like DKRZ).

For example, the HadGEM2-ES model is listed twice under two different institutes (MOHC and INPE) and the variables are split between the two corresponding subdirectories.

One option could be to extend the file finder to accept lists as value for the model key (something like HadGEM2-ES: [MOHC, INPE]).
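A minimal sketch of how such list-valued entries could be handled, assuming a hypothetical helper `institutes_for` and a hand-written mapping (neither is taken from the ESMValTool code). Values may be a single institute or a list, and the lookup always returns a list:

```python
# Hypothetical model-to-institute mapping, written by hand for illustration.
INSTITUTES = {
    "HadGEM2-ES": ["MOHC", "INPE"],
    "CFSv2-2011": ["COLA-CFS", "NOAA-NCEP"],
    "MPI-ESM-LR": "MPI-M",
}


def institutes_for(model, mapping=INSTITUTES):
    """Return the institutes for a model as a list, whatever the stored type."""
    value = mapping[model]
    if isinstance(value, str):
        return [value]  # single institute: wrap it so callers can always iterate
    return list(value)


print(institutes_for("HadGEM2-ES"))  # ['MOHC', 'INPE']
print(institutes_for("MPI-ESM-LR"))  # ['MPI-M']
```

Normalizing to a list in one place would let the file finder loop over institutes without special-casing the single-institute models.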

bouweandela commented 6 years ago

It is an error in the YAML file: the entry that is read second overwrites the first one, so the first is never used. I do not think the current implementation works. The proposed solution looks fine to me.
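To illustrate the overwrite, here is a small sketch in plain Python rather than with an actual YAML parser. Building a dict from repeated key/value pairs keeps only the last value, which is the same last-one-wins behaviour described above. The `find_duplicates` helper and the order of the pairs are assumptions for illustration; the institute names come from this thread.

```python
from collections import Counter

# Key/value pairs in the (assumed) order a loader would encounter them.
pairs = [
    ("HadGEM2-ES", "MOHC"),
    ("CFSv2-2011", "COLA-CFS"),
    ("HadGEM2-ES", "INPE"),
    ("CFSv2-2011", "NOAA-NCEP"),
]

# Later pairs silently replace earlier ones with the same key.
mapping = dict(pairs)


def find_duplicates(pairs):
    """Return the keys that occur more than once, in first-seen order."""
    counts = Counter(key for key, _ in pairs)
    return [key for key in counts if counts[key] > 1]


print(mapping["HadGEM2-ES"])   # INPE
print(find_duplicates(pairs))  # ['HadGEM2-ES', 'CFSv2-2011']
```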

valeriupredoi commented 6 years ago

yo @bouweandela @mattiarighi, looks like I fixed this here: https://github.com/ESMValGroup/ESMValTool/commit/35f4562e8f20b84cbb6c844f670d50df03cbd65d Can you guys have a looksee and let me know, please? I don't have access to DKRZ, so I had to build a toy model of the paths to replicate and fix the issue.

mattiarighi commented 6 years ago

I've tried CFSv2-2011, one of the models listed under two institutes in config-developer.yml, i.e. ['COLA-CFS', 'NOAA-NCEP'].

This is the namelist entry:

- {model: CFSv2-2011,   project: CMIP5,  mip: Amon,  exp: decadal1990,  ensemble: r1i1p1,  start_year: 1991,  end_year: 1992}

I would expect _data_finder.py to search in these two directories:

COLA-CFS/CFSv2-2011/decadal1990/mon/atmos/Amon/r1i1p1/
NOAA-NCEP/CFSv2-2011/decadal1990/mon/atmos/Amon/r1i1p1/

and return the second, since the first does not exist. But I get an error message that it cannot find the first one.
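The expected behaviour can be sketched as follows. This is not the actual `_data_finder.py` logic; `existing_dirs` is a hypothetical helper, and the directory tree is a toy one built in a temporary folder so that only the NOAA-NCEP branch exists:

```python
import os
import tempfile


def existing_dirs(rootpath, institutes, relpath):
    """Return the candidate directories that exist, in the order of `institutes`."""
    candidates = [os.path.join(rootpath, inst, relpath) for inst in institutes]
    return [path for path in candidates if os.path.isdir(path)]


# Toy directory tree: only the NOAA-NCEP branch is created.
relpath = os.path.join("CFSv2-2011", "decadal1990", "mon", "atmos", "Amon", "r1i1p1")
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "NOAA-NCEP", relpath))
    found = existing_dirs(root, ["COLA-CFS", "NOAA-NCEP"], relpath)
    # Keep only the institute part of each hit for a compact result.
    found_institutes = [os.path.relpath(path, root).split(os.sep)[0] for path in found]

print(found_institutes)  # ['NOAA-NCEP']
```

Filtering with `os.path.isdir` before listing any directory contents means a missing institute branch is skipped instead of raising an error.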

valeriupredoi commented 6 years ago

Oh shoot, I think I forgot to add handling if a file is not found, could you send me the error pls?

mattiarighi commented 6 years ago

Traceback (most recent call last):
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_main.py", line 186, in run
    main(args)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_main.py", line 125, in main
    process_namelist(namelist_file=namelist_file, config_user=cfg)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_main.py", line 161, in process_namelist
    namelist = read_namelist_file(namelist_file, config_user)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 36, in read_namelist_file
    raw_namelist, config_user, initialize_tasks, namelist_file=filename)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 627, in __init__
    self.tasks = self.initialize_tasks() if initialize_tasks else None
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 809, in initialize_tasks
    write_ncl_interface=self._support_ncl)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 603, in _get_preprocessor_task
    variables, profile, config_user, ancestors=derive_tasks)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 521, in _get_single_preprocessor_task
    filename for variable in variables
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 522, in <listcomp>
    for filename in _get_input_files(variable, config_user)
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_namelist.py", line 402, in _get_input_files
    drs=config_user['drs'])
  File "miniconda3/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0.0-py3.6.egg/esmvaltool/_data_finder.py", line 204, in get_input_filelist
    list_versions = os.listdir(part1)
FileNotFoundError: [Errno 2] No such file or directory: 'COLA-CFS/CFSv2-2011/decadal1990/mon/atmos/Amon/r1i1p1/'

valeriupredoi commented 6 years ago

right, cheers, crap. I was under the assumption that the root dir right before [latestversion] would exist for both institutions, and that only the variables would be distributed across institutions; I'll put in a check/release for the root dir1 then.

mattiarighi commented 6 years ago

In the (possible?) case that the exp/ensemble/variable combination exists in both institutes, latestversion should be the criterion for selection. But I think that never happens.
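A rough sketch of that selection rule, under the assumption that version subdirectories have names that sort lexically by date (such as `v20110601`). The helper `latest_version_dir`, the institute names and the version names are all made up for illustration:

```python
import os
import tempfile


def latest_version_dir(dirs):
    """Among the existing dirs, return the newest version subdirectory, or None."""
    best = None
    for path in dirs:
        if not os.path.isdir(path):
            continue  # skip institutes that do not hold this combination
        for version in os.listdir(path):
            full = os.path.join(path, version)
            if os.path.isdir(full) and (best is None or version > best[0]):
                best = (version, full)
    return best[1] if best else None


# Toy tree: two institutes hold the same combination, a third one is missing.
with tempfile.TemporaryDirectory() as root:
    inst_a = os.path.join(root, "INST-A", "r1i1p1")
    inst_b = os.path.join(root, "INST-B", "r1i1p1")
    missing = os.path.join(root, "INST-C", "r1i1p1")
    os.makedirs(os.path.join(inst_a, "v20110601"))
    os.makedirs(os.path.join(inst_b, "v20120215"))
    chosen = os.path.relpath(latest_version_dir([inst_a, inst_b, missing]), root)
```

Here `chosen` points at the `v20120215` directory under `INST-B`, because that version name compares greater than `v20110601`.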

valeriupredoi commented 6 years ago

Cool, cheers for the clarification, it's easy-peasy: a check on the existence of dir1 and then a check on the members of the filelist. Off to work now and will put these in.

valeriupredoi commented 6 years ago

hey @mattiarighi, this should fix your woes: https://github.com/ESMValGroup/ESMValTool/commit/044d7c4673be4c515fb67a42474dea4c905aa181 Give it a shot when you've got time and let me know, please. If all goes well, this branch is ready for merging in the backend, since I've just merged backend into it yesterday before I started changing _data_finder.

valeriupredoi commented 6 years ago

@mattiarighi did you have a chance to test this bit? :grin:

mattiarighi commented 6 years ago

Testing now.

mattiarighi commented 6 years ago

It works as expected. Thanks!

Go ahead with the PR and let @bouweandela check it.

valeriupredoi commented 6 years ago

terrific! cheers @mattiarighi