NOAA-GFDL / MDTF-diagnostics

Analysis framework and collection of process-oriented diagnostics for weather and climate simulations
https://mdtf-diagnostics.readthedocs.io/en/main/

Error in reading in data from a data catalog #714

Open nishsilva opened 1 day ago

nishsilva commented 1 day ago

Bug Severity

Describe the bug

I am trying to read locally available model data in my POD. For this, I created a data catalog using the MDTF catalog builder tool. When I try to run the POD, it results in an error. I have included the terminal output in the log information section.

Steps To Reproduce

  1. Created a data catalog for a CM4 model data file (variable zos): zos_Omon_GFDL-CM4_historical_r1i1p1f1_gn_199001-200912.nc
  2. Created a trial POD named sl_regional: https://github.com/nishsilva/MDTF-diagnostics/tree/nish-pod/diagnostics/sl_regional
  3. Used the runtime configuration file https://github.com/nishsilva/MDTF-diagnostics/blob/nish-pod/diagnostics/sl_regional/runtime_config_NE_CM4_gfdl.yml

Environment

Describe the system environment:

Log information and/or terminal output

(miniconda3) netige@crhtc50:/glade/work/netige/mdtf_Nov24/mdtf/MDTF-diagnostics> ./mdtf -f ./diagnostics/sl_regional/runtime_config_NE_CM4_gfdl.yml 
Preprocessing data for sl_regional
Querying /glade/work/netige/mdtf_Nov24/data_catalogs/CM4_zos.json for variable zos for case CM4_zos.
WARNING: /glade/u/home/netige/miniconda3/envs/_MDTF_base/lib/python3.12/site-packages/intake_esm/_search.py:50: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  mask = df[column].str.contains(value, regex=True, case=True, flags=0)

CRITICAL: **********************************************************************
Uncaught exception:
Traceback (most recent call last):
  File "/glade/work/netige/mdtf_Nov24/mdtf/MDTF-diagnostics/mdtf_framework.py", line 243, in <module>
    exit_code = main(prog_name='MDTF-diagnostics')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/netige/miniconda3/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/netige/miniconda3/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/glade/u/home/netige/miniconda3/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/netige/miniconda3/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/netige/miniconda3/envs/_MDTF_base/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/netige/mdtf_Nov24/mdtf/MDTF-diagnostics/mdtf_framework.py", line 199, in main
    cat_subset = data_pp.process(cases, ctx.config, model_paths.MODEL_WORK_DIR)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/netige/mdtf_Nov24/mdtf/MDTF-diagnostics/src/preprocessor.py", line 1405, in process
    cat_subset = self.query_catalog(case_list, config.DATA_CATALOG)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/netige/mdtf_Nov24/mdtf/MDTF-diagnostics/src/preprocessor.py", line 1036, in query_catalog
    raise util.DataRequestError(
src.util.exceptions.DataRequestError: check_group_daterange returned empty data frame for zos case CM4_zos in /glade/work/netige/mdtf_Nov24/data_catalogs/CM4_zos.json, indicating issues with data continuity

jtmims commented 18 hours ago

Hi @nishsilva! Thank you for opening this issue.

I see that the date range for the case is currently set to 1995-1996. The MDTF cannot yet extract a sub-range from a larger xarray file (this is functionality that we hope to add soon). You can run with the provided file by changing the case entry in the runtime config file as follows:

"CM4_zos" :
    model: "CM4"
    convention: "CMIP"
    startdate: "19900116120000"
    enddate: "20091216120000"
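The suggested startdate and enddate appear to be the mid-month time stamps of the first (January 1990) and last (December 2009) monthly records in the file, i.e. the file's full coverage. As a minimal sketch, assuming monthly data stamped at the middle of each month and a file name ending in a CMIP-style YYYYMM-YYYYMM range, these strings can be derived from the file name. `cmip_monthly_date_range` and `month_midpoint` are hypothetical helpers for illustration, not part of the MDTF.

```python
import calendar
import re
from datetime import datetime, timedelta


def month_midpoint(year: int, month: int) -> datetime:
    """Return the midpoint of a month in the proleptic Gregorian calendar.

    Assumption: CM4 output uses a noleap calendar, which differs from this
    only for February in leap years (not an issue for January or December).
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return datetime(year, month, 1) + timedelta(days=days_in_month / 2)


def cmip_monthly_date_range(filename: str) -> tuple[str, str]:
    """Hypothetical helper: derive startdate/enddate strings (YYYYMMDDHHMMSS)
    from the YYYYMM-YYYYMM suffix of a CMIP-style monthly file name,
    assuming each record is stamped at the middle of its month."""
    match = re.search(r"_(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$", filename)
    if match is None:
        raise ValueError(f"no YYYYMM-YYYYMM range found in {filename!r}")
    y0, m0, y1, m1 = (int(g) for g in match.groups())
    fmt = "%Y%m%d%H%M%S"
    return (month_midpoint(y0, m0).strftime(fmt),
            month_midpoint(y1, m1).strftime(fmt))


start, end = cmip_monthly_date_range(
    "zos_Omon_GFDL-CM4_historical_r1i1p1f1_gn_199001-200912.nc"
)
print(f'startdate: "{start}"')  # startdate: "19900116120000"
print(f'enddate: "{end}"')      # enddate: "20091216120000"
```

If a file uses a different time-stamp convention (e.g. start-of-month stamps), the case dates should instead be read from the file's actual time axis.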

As for the other issue, it appears that the ocean grid's lat and lon definitions don't work well with the MDTF preprocessor yet. For now, I recommend working with a local dataset that is already in the convention your POD expects after preprocessing (CMIP, CESM, or GFDL). If you do that, you can set run_pp: False. The MDTF will then skip any translation, but it will still pass along the catalog file, so you can start developing the POD in the MDTF environment. I will post the lat/lon fix to this thread once it is ready!
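With run_pp: False the POD works directly from the files listed in the catalog. The sketch below shows one way to pull the paths for a variable out of an intake-esm style catalog using only the standard library. It assumes the JSON follows the ESM collection spec (a "catalog_file" entry naming the CSV table and an "assets"/"column_name" entry naming the path column) and that the variable column is called "variable_id", as in CMIP-convention catalogs. `paths_for_variable` is a hypothetical helper, and how the POD receives the catalog path is not covered here, so it is simply passed in as an argument.

```python
import csv
import json
import os
import tempfile


def paths_for_variable(catalog_json: str, variable: str,
                       var_column: str = "variable_id") -> list[str]:
    """Hypothetical helper: return file paths from an intake-esm style
    catalog (JSON + CSV) whose `var_column` equals `variable`."""
    with open(catalog_json) as f:
        spec = json.load(f)
    csv_path = spec["catalog_file"]
    if not os.path.isabs(csv_path):
        # Assumption: a relative CSV path is resolved against the JSON's directory.
        csv_path = os.path.join(os.path.dirname(catalog_json), csv_path)
    path_column = spec["assets"]["column_name"]  # column holding file paths
    with open(csv_path, newline="") as f:
        return [row[path_column] for row in csv.DictReader(f)
                if row.get(var_column) == variable]


# Demo with a throwaway two-row catalog; the /data/... paths are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = os.path.join(tmp, "demo_catalog.csv")
    with open(csv_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["variable_id", "path"])
        writer.writerow(["zos", "/data/zos_Omon_GFDL-CM4_historical_r1i1p1f1_gn_199001-200912.nc"])
        writer.writerow(["tos", "/data/tos_example.nc"])
    json_file = os.path.join(tmp, "demo_catalog.json")
    with open(json_file, "w") as f:
        json.dump({"catalog_file": "demo_catalog.csv",
                   "assets": {"column_name": "path"}}, f)
    zos_paths = paths_for_variable(json_file, "zos")

print(zos_paths)
```

This sketch does not handle compressed (.csv.gz) or remote catalog files; intake-esm itself (`intake_esm.esm_datastore`) is the more complete option when it is available.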

nishsilva commented 15 hours ago

Hi @jtmims, thank you for this.

I did not know that the MDTF could not extract a sub-range from a larger xarray file. Once I changed the config file to the full time period, as you suggested, and switched run_pp to False, the POD read in the data from the .nc file. This was very helpful!!!

Thanks so much!

jtmims commented 15 hours ago

No problem @nishsilva! I'm glad you are making progress. I will keep this thread open for now in case any further issues arise. I will also update you when the lat and lon problem is solved so we can try the preprocessor again!

nishsilva commented 15 hours ago

Sounds great @jtmims !!!