ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

CMORize tool fails for RAWOBS if directory structure does not include Tier2/Tier3 #3640

Open k-a-webb opened 4 months ago

k-a-webb commented 4 months ago

When attempting to CMORize RAWOBS data located in a directory that does not follow ESMValTool's convention (i.e., there are no Tier directories), the cmorizer.py script fails at line 266:

tier = self._get_dataset_tier(dataset)
if tier is None:
    logger.error("Data for %s not found. Perhaps you are not"
                 " storing it in a RAWOBS/TierX/%s"
                 " (X=2 or 3) directory structure?", dataset, dataset)
    return False

Ideally, the CMORizer tool would work with the same configuration files and directory structure that are used to run ESMValTool.
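One possible shape for such a fallback is sketched below. This is an illustration only: `find_raw_dataset_dir` is a hypothetical helper, not part of ESMValTool, and it assumes that the tool would first look for the usual `Tier2`/`Tier3` directories and, if none exist, search the RAWOBS root for a directory named after the dataset.

```python
from pathlib import Path
from typing import Optional


def find_raw_dataset_dir(rawobs_root: str, dataset: str) -> Optional[Path]:
    """Return the directory holding the raw data for `dataset`, or None.

    Hypothetical helper for illustration; not ESMValTool API.
    """
    root = Path(rawobs_root)

    # Preferred layout: RAWOBS/TierX/<dataset> with X = 2 or 3.
    for tier in (2, 3):
        candidate = root / f"Tier{tier}" / dataset
        if candidate.is_dir():
            return candidate

    # Fallback (assumption): any directory named <dataset> below the root,
    # e.g. RAWOBS/clim/WOA for a '{type}/{dataset}/...' layout.
    matches = sorted(p for p in root.rglob(dataset) if p.is_dir())
    return matches[0] if matches else None
```

With a lookup of this kind, a call such as `find_raw_dataset_dir(rootpath, "WOA")` would resolve to `RAWOBS/clim/WOA` in the layout described in this issue, while still preferring a Tier directory when one exists.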


Example configuration files:

config-user.yml:

    log_level: debug
    exit_on_warning: false
    output_file_type: svg
    output_dir: ./output
    auxiliary_data_dir: ./auxiliary_data
    compress_netcdf: false
    save_intermediary_cubes: false
    remove_preproc_dir: false
    max_parallel_tasks: null
    config_developer_file: ./config-developer.yml
    profile_diagnostic: false

    rootpath:
      RAWOBS: /space/hall6/sitestore/eccc/crd/ccrn/users/scrd113/RAWOBS
      OBS6: /space/hall6/sitestore/eccc/crd/ccrn/users/scrd113/CMOROBS

    drs:
      RAWOBS: default
      OBS6: default

config-developer.yml:

    OBS6:
      cmor_strict: false
      input_dir:
        evt_default: 'Tier{tier}/{dataset}'
        default: '{type}/{dataset}/{latestversion}/{frequency}/{short_name}'
      input_file:
        evt_default: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc'
        default: 'OBS6_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc'
      output_file: 'OBS6_{dataset}_{type}_{version}_{mip}_{short_name}'
      cmor_type: 'CMIP6'

    RAWOBS:
      cmor_strict: false
      input_dir:
        default: '{type}/{dataset}/{latestversion}/{frequency}/{short_name}'
        evt_default: 'Tier{tier}/{dataset}'
      input_file:
        default: '*nc'
      output_file: 'OBS_{dataset}_{type}_{version}_{mip}_{name}' # "name" is specified in esmvaltool/cmorizers/data/cmor_config/WOA.yml
      cmor_type: 'CMIP5'

I am attempting to CMORize WOA data located in the directory:

/space/hall6/sitestore/eccc/crd/ccrn/users/scrd113/RAWOBS/clim/WOA

with files such as:

/space/hall6/sitestore/eccc/crd/ccrn/users/scrd113/RAWOBS/clim/WOA/v2018/mon/temperature/woa18_decav81B0_t00_01.nc

Running the command:

esmvaltool data format --config_file config-user.yml WOA

fails with the error message:

2024-05-31 23:00:14,355 UTC [86961] INFO    Writing program log files to:
/fs/homeu2/eccc/crd/ccrn_shr/rkw001/A4D_standard_diagnostics__cmor/work/cmor_woa/output/data_formatting_20240531_230012/run/main_log.txt
/fs/homeu2/eccc/crd/ccrn_shr/rkw001/A4D_standard_diagnostics__cmor/work/cmor_woa/output/data_formatting_20240531_230012/run/main_log_debug.txt
2024-05-31 23:00:14,356 UTC [86961] INFO    Starting the CMORization Tool at time: 2024-05-31 23:00:14 UTC
2024-05-31 23:00:14,356 UTC [86961] INFO    ----------------------------------------------------------------------
2024-05-31 23:00:14,356 UTC [86961] INFO    input_dir  = /home/rkw001/download_data/RAWOBS
2024-05-31 23:00:14,356 UTC [86961] INFO    output_dir = /fs/homeu2/eccc/crd/ccrn_shr/rkw001/A4D_standard_diagnostics__cmor/work/cmor_woa/output/data_formatting_20240531_230012
2024-05-31 23:00:14,356 UTC [86961] INFO    ----------------------------------------------------------------------
2024-05-31 23:00:14,356 UTC [86961] INFO    Running the CMORization scripts.
2024-05-31 23:00:14,356 UTC [86961] INFO    Processing datasets ['WOA']
2024-05-31 23:00:14,356 UTC [86961] INFO    Input data from: /home/rkw001/download_data/RAWOBS/Tier2/WOA
2024-05-31 23:00:14,356 UTC [86961] INFO    Output will be written to: /fs/homeu2/eccc/crd/ccrn_shr/rkw001/A4D_standard_diagnostics__cmor/work/cmor_woa/output/data_formatting_20240531_230012/Tier2/WOA
2024-05-31 23:00:14,357 UTC [86961] INFO    Reformat script: /fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/formatters/datasets/woa
2024-05-31 23:00:14,358 UTC [86961] INFO    CMORizing dataset WOA using Python script /fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/formatters/datasets/woa.py
2024-05-31 23:00:14,361 UTC [86961] INFO    CMORizing var thetao from input set temperature
2024-05-31 23:00:14,379 UTC [86961] ERROR   Program terminated abnormally, see stack trace below for more information:
Traceback (most recent call last):
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvalcore/esmvalcore_for_A4D_standard_diagnostics/esmvalcore/_main.py", line 499, in run
    fire.Fire(ESMValTool())
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/cmorizer.py", line 489, in format
    self.formatter.format(start, end, install)
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/cmorizer.py", line 194, in format
    if not self.format_dataset(dataset, start, end, install):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/cmorizer.py", line 284, in format_dataset
    success = self._run_pyt_script(in_data_dir, out_data_dir, dataset,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/cmorizer.py", line 386, in _run_pyt_script
    module.cmorization(in_dir, out_dir, cmor_cfg, self.config, start, end)
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/formatters/datasets/woa.py", line 148, in cmorization
    extract_variable(in_files, out_dir, glob_attrs, raw_info, cmor_table)
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvaltool/esmvaltool_for_A4D_standard_diagnostics/esmvaltool/cmorizers/data/formatters/datasets/woa.py", line 100, in extract_variable
    cubes = iris.load(in_files, rawvar)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/iris/__init__.py", line 326, in load
    return _load_collection(uris, constraints, callback).merged().cubes()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/iris/__init__.py", line 294, in _load_collection
    result = _CubeFilterCollection.from_cubes(cubes, constraints)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/iris/cube.py", line 97, in from_cubes
    for cube in cubes:
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/iris/__init__.py", line 275, in _generate_cubes
    for cube in iris.io.load_files(part_names, callback, constraints):
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/iris/io/__init__.py", line 206, in load_files
    all_file_paths = expand_filespecs(filenames)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_v1p3/lib/python3.11/site-packages/iris/io/__init__.py", line 184, in expand_filespecs
    raise IOError(msg)
OSError: One or more of the files specified did not exist:
    * "/home/rkw001/download_data/RAWOBS/Tier2/WOA/temperature/woa18_decav81B0_t00_01.nc" didn't match any files

Note that the CMORization scripts provided with ESMValTool were not modified.
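To make the mismatch concrete, the sketch below expands the two `input_dir` templates from the config-developer.yml above using plain string formatting. It is a simplification under stated assumptions: `expand_input_dir` is a hypothetical helper, the facet values are read off the example file path in this issue, and `{latestversion}` is treated as an ordinary placeholder rather than being resolved the way ESMValCore resolves it.

```python
from pathlib import PurePosixPath

# Templates copied from the RAWOBS section of config-developer.yml above.
DEFAULT_TEMPLATE = "{type}/{dataset}/{latestversion}/{frequency}/{short_name}"
TIER_TEMPLATE = "Tier{tier}/{dataset}"

# RAWOBS rootpath from config-user.yml above.
RAWOBS_ROOT = "/space/hall6/sitestore/eccc/crd/ccrn/users/scrd113/RAWOBS"


def expand_input_dir(rootpath: str, template: str, **facets: str) -> PurePosixPath:
    """Fill a directory template with facet values (hypothetical helper)."""
    return PurePosixPath(rootpath) / template.format(**facets)


# Facet values inferred from the example file path (assumption).
drs_dir = expand_input_dir(
    RAWOBS_ROOT,
    DEFAULT_TEMPLATE,
    type="clim",
    dataset="WOA",
    latestversion="v2018",
    frequency="mon",
    short_name="temperature",
)

# The Tier-based layout that appears in the log and error message.
tier_dir = expand_input_dir(RAWOBS_ROOT, TIER_TEMPLATE, tier="2", dataset="WOA")

print("directory where the data are stored:", drs_dir)
print("directory in the Tier-based layout: ", tier_dir)
```

The first path corresponds to where the files are actually stored, whereas the error message refers to a `Tier2/WOA/temperature` directory, which suggests that the `drs` and `input_dir` settings are not consulted when the input path is built.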

k-a-webb commented 4 months ago

@malininae