NOAA-GFDL / MDTF-diagnostics

Analysis framework and collection of process-oriented diagnostics for weather and climate simulations
https://mdtf-diagnostics.readthedocs.io/en/main/

Unable to run multicase example #708

Closed csyhuang closed 1 week ago

csyhuang commented 3 weeks ago

I am trying to run the multicase example following 2.3-2.6 in the documentation page, but run into the error below.

I had to make several adjustments to get the code to run (up to the point of the error). I think it would be good to include them in the documentation so that new users can follow along more easily, so I have listed them in the section Steps To Reproduce.

Please let me know if I missed anything. Thanks!


Bug Severity

Describe the bug

After making the path adjustments, I ran the multicase example with the command

./mdtf -f diagnostics/example_multicase/multirun_config_template.jsonc 

And got the error AttributeError: 'DataArray' object has no attribute 'variables'. Did you mean: 'variable'?. The full output is as follows:

Preprocessing data for example_multicase
Querying /home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/diagnostics/example_multicase/esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.json for variable tas for case CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231.
WARNING: /home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/intake_esm/_search.py:50: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  mask = df[column].str.contains(value, regex=True, case=True, flags=0)

Variable <tas> data ends at hour 0
Querying /home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/diagnostics/example_multicase/esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.json for variable tas for case CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231.
WARNING: /home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/intake_esm/_search.py:50: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  mask = df[column].str.contains(value, regex=True, case=True, flags=0)

Variable <tas> data ends at hour 0
Units for 'time' on var 'tas' found in dataset; setting to 'days since 1980-01-01'.
No calendar for 'time' found in dataset; setting to 'sentinel.NotSet'.
Converted units on <#None:example_multicase.tas>.
WARNING: /home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/user_scripts/example_pp_script.py:98: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'time' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
  xr_dupe.coords['time'] = ind

Units for 'time' on var 'tas' found in dataset; setting to 'days since 1985-01-01'.
No calendar for 'time' found in dataset; setting to 'sentinel.NotSet'.
Converted units on <#None:example_multicase.tas>.
WARNING: /home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/user_scripts/example_pp_script.py:98: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'time' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
  xr_dupe.coords['time'] = ind

CRITICAL: **********************************************************************
Uncaught exception:
Traceback (most recent call last):
  File "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/src/preprocessor.py", line 1350, in write_ds
    ds = self.clean_output_attrs(var, ds)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/src/preprocessor.py", line 1281, in clean_output_attrs
    for vv in ds.variables.values():
              ^^^^^^^^^^^^
  File "/home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/xarray/core/common.py", line 280, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'variables'. Did you mean: 'variable'?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/mdtf_framework.py", line 243, in <module>
    exit_code = main(prog_name='MDTF-diagnostics')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clare/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/mdtf_framework.py", line 201, in main
    data_pp.write_ds(cases, cat_subset, pod_runtime_reqs)
  File "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/src/preprocessor.py", line 1353, in write_ds
    raise util.chain_exc(exc, (f"cleaning attributes to "
  File "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/src/util/exceptions.py", line 58, in chain_exc
    raise new_exc_class(new_msg) from exc
src.util.exceptions.DataPreprocessEvent: Caught exception while cleaning attributes to write data for <#None:example_multicase.tas>: AttributeError("'DataArray' object has no attribute 'variables'").

When I check the dataset using ncdump -h ~/Dropbox/GitHub/mdtf/inputdata/mdtf_test_data/CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231/day/CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231.tas.day.nc, I can see the time dimension being 1825, not 0.
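
For readers hitting the same traceback: `clean_output_attrs` iterates over `ds.variables`, which exists on an xarray `Dataset` but not on a `DataArray`. The sketch below is not the framework's code and not the fix that was eventually applied in this thread; `as_dataset` and `demo` are hypothetical names used only to illustrate the type mismatch and one possible guard.

```python
def as_dataset(obj, fallback_name="data"):
    """Return a Dataset-like object, promoting a DataArray-like one if needed.

    Duck-typed on purpose: anything exposing `.variables` is passed through;
    anything else is assumed to offer xarray's `DataArray.to_dataset(name=...)`.
    """
    if hasattr(obj, "variables"):
        return obj
    # An unnamed DataArray cannot be promoted without a name, so supply one.
    name = getattr(obj, "name", None) or fallback_name
    return obj.to_dataset(name=name)


def demo():
    """Trigger the AttributeError from the traceback, then apply the guard."""
    try:
        import numpy as np
        import xarray as xr
    except ImportError:
        print("xarray/numpy not installed; skipping demo")
        return

    da = xr.DataArray(np.zeros((2, 3)), dims=("time", "lat"), name="tas")
    try:
        da.variables  # the same attribute access that fails in the traceback
    except AttributeError as exc:
        print("DataArray:", exc)

    ds = as_dataset(da)
    print("variables after promotion:", sorted(ds.variables))


if __name__ == "__main__":
    demo()
```

Whether a guard like this belongs in the preprocessor or in the user script that returns the object is a design question for the maintainers; the sketch only shows where the two types differ.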

Steps To Reproduce

In addition to following the instructions on the documentation Section 2.3-2.6, I also made the following changes to get the code running:

  1. I created the synthetic data as instructed in Section 2.3 and they are stored in /home/clare/Dropbox/GitHub/mdtf/inputdata/mdtf_test_data/.

  2. In user_scripts/example_pp_script.py, I changed the config_file to:

config_file = "/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/templates/runtime_config.jsonc"

  3. I changed the paths in diagnostics/example_multicase/esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.csv and diagnostics/example_multicase/esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.csv to point to the dataset on my machine.

  4. In diagnostics/example_multicase/esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.json, I changed "catalog_file" to point to the correct .csv path.

  5. I installed the most up-to-date environment, src/conda/env_python3_base.yml. Running mdtf afterwards gave me an error like ModuleNotFoundError: No module named 'cfunits', which I fixed by installing the missing packages:

python3 -m pip install cfunits
conda install conda-forge::udunits

These are the steps I've executed before running ./mdtf -f diagnostics/example_multicase/multirun_config_template.jsonc.
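
The path edits in points 3 and 4 can also be scripted. The following is a rough sketch using only the Python standard library. It assumes the catalog CSV keeps file locations in a column named "path" (an assumption; check the header of your CSV) and that the catalog JSON is plain JSON with a top-level "catalog_file" key, as mentioned in point 4. The function name `repoint_catalog` is hypothetical.

```python
import csv
import json
from pathlib import Path


def repoint_catalog(json_path, csv_path, old_prefix, new_prefix, path_column="path"):
    """Swap a path prefix in the catalog CSV and point the JSON at that CSV.

    Returns the number of CSV rows whose path was rewritten.
    """
    json_path, csv_path = Path(json_path), Path(csv_path)

    # 1. Rewrite the prefix of every entry in the (assumed) path column.
    with csv_path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    if path_column not in fieldnames:
        raise KeyError(f"column {path_column!r} not found in {csv_path}")

    changed = 0
    for row in rows:
        value = row[path_column] or ""
        if value.startswith(old_prefix):
            row[path_column] = new_prefix + value[len(old_prefix):]
            changed += 1

    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # 2. Point "catalog_file" in the JSON at the CSV's absolute location.
    catalog = json.loads(json_path.read_text())
    catalog["catalog_file"] = str(csv_path.resolve())
    json_path.write_text(json.dumps(catalog, indent=2) + "\n")
    return changed
```

The function overwrites both files in place, so it is worth running it on copies first and checking the result with git diff.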

Environment

I am on the main branch with all the existing commits pulled.

Executing cat /etc/os-release gives

NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

     active environment : _MDTF_python3_base
    active env location : /home/clare/miniconda3/envs/_MDTF_python3_base
            shell level : 2
       user config file : /home/clare/.condarc
 populated config files : 
          conda version : 4.10.3
    conda-build version : not installed
         python version : 3.7.6.final.0
       virtual packages : __cuda=12.0=0
                          __linux=4.15.0=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/clare/miniconda3  (writable)
      conda av data dir : /home/clare/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/clare/miniconda3/pkgs
                          /home/clare/.conda/pkgs
       envs directories : /home/clare/miniconda3/envs
                          /home/clare/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.28.1 CPython/3.7.6 Linux/4.15.0-214-generic ubuntu/18.04.6 glibc/2.27
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False

csyhuang commented 2 weeks ago

In addition, when I tried setting up MDTF environments using

./src/conda/conda_env_setup.sh --all --conda_root /home/clare/miniconda3

I encountered the following error:

info     libmamba ****************** Backtrace Start ******************
debug    libmamba Loading configuration
trace    libmamba Compute configurable 'create_base'
trace    libmamba Compute configurable 'no_env'
trace    libmamba Compute configurable 'no_rc'
trace    libmamba Compute configurable 'rc_files'
trace    libmamba Compute configurable 'root_prefix'
trace    libmamba Get RC files configuration from locations up to HomeDir
trace    libmamba Configuration not found at '/home/clare/.mambarc'
trace    libmamba Configuration not found at '/home/clare/.mamba/mambarc.d'
trace    libmamba Configuration not found at '/home/clare/.mamba/mambarc'
trace    libmamba Configuration not found at '/home/clare/.mamba/.mambarc'
trace    libmamba Configuration not found at '/home/clare/.config/mamba/mambarc.d'
trace    libmamba Configuration not found at '/home/clare/.config/mamba/mambarc'
trace    libmamba Configuration not found at '/home/clare/.config/mamba/.mambarc'
trace    libmamba Configuration not found at '/home/clare/.condarc'
trace    libmamba Configuration not found at '/home/clare/.conda/condarc.d'
trace    libmamba Configuration not found at '/home/clare/.conda/condarc'
trace    libmamba Configuration not found at '/home/clare/.conda/.condarc'
trace    libmamba Configuration not found at '/home/clare/.config/conda/condarc.d'
trace    libmamba Configuration not found at '/home/clare/.config/conda/condarc'
trace    libmamba Configuration not found at '/home/clare/.config/conda/.condarc'
trace    libmamba Configuration not found at '/home/clare/miniconda3/envs/_MDTF_install_temp/.mambarc'
trace    libmamba Configuration not found at '/home/clare/miniconda3/envs/_MDTF_install_temp/condarc.d'
trace    libmamba Configuration not found at '/home/clare/miniconda3/envs/_MDTF_install_temp/condarc'
trace    libmamba Configuration not found at '/home/clare/miniconda3/envs/_MDTF_install_temp/.condarc'
trace    libmamba Configuration not found at '/var/lib/conda/.mambarc'
trace    libmamba Configuration not found at '/var/lib/conda/condarc.d/'
trace    libmamba Configuration not found at '/var/lib/conda/condarc'
trace    libmamba Configuration not found at '/var/lib/conda/.condarc'
trace    libmamba Configuration not found at '/etc/conda/.mambarc'
trace    libmamba Configuration not found at '/etc/conda/condarc.d/'
trace    libmamba Configuration not found at '/etc/conda/condarc'
trace    libmamba Configuration not found at '/etc/conda/.condarc'
trace    libmamba Update configurable 'no_env'
trace    libmamba Compute configurable 'envs_dirs'
trace    libmamba Compute configurable 'file_specs'
error    libmamba YAML spec file '=/home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/src/conda/env_base_micromamba.yml' not found
critical libmamba File not found. Aborting.
info     libmamba ****************** Backtrace End ********************

I checked that the file /home/clare/Dropbox/GitHub/mdtf/MDTF-diagnostics/src/conda/env_base_micromamba.yml exists, though.

wrongkindofdoctor commented 2 weeks ago

@csyhuang For the miniconda issue, you need to specify --env_dir in the call (e.g., --env_dir /home/clare/miniconda3/envs). If the install still fails, try installing one environment at a time with the -e parameter instead of --all (e.g., -e python3_base). I'll try out the example_multicase on my machine and see if I can replicate the issue.
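
As a small illustration of the two invocations suggested above, here is a sketch that assembles the command line from its parts. It uses only the script path and the flags that already appear in this thread (--all, -e, --conda_root, --env_dir); the helper name `conda_setup_command` is hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def conda_setup_command(conda_root, env_dir, env_name=None):
    """Build the conda_env_setup.sh argument list.

    With env_name set, a single environment is requested via -e;
    otherwise --all is used.
    """
    cmd = ["./src/conda/conda_env_setup.sh"]
    cmd += ["-e", env_name] if env_name else ["--all"]
    cmd += ["--conda_root", conda_root, "--env_dir", env_dir]
    return cmd


if __name__ == "__main__":
    root = "/home/clare/miniconda3"
    # Install one environment at a time, as suggested if --all fails.
    print(shlex.join(conda_setup_command(root, root + "/envs", "python3_base")))
```

To actually run it, the returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root.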

wrongkindofdoctor commented 2 weeks ago

@csyhuang In multirun_config_template.jsonc, remove "example_pp_script.py" from the user_pp_scripts list so that the line looks like "user_pp_scripts" : [],. The preprocessor automatically runs the files in that list, and this script won't work with the multicase example. I'll fix the template file in the example POD directory.
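
For anyone who prefers to script that edit, here is a minimal sketch. Because the template is JSONC (JSON with comments), it edits the text with a regular expression instead of parsing it, so the comments survive. It assumes the list contains no nested brackets; the helper name `clear_user_pp_scripts` and the sample text are hypothetical, and only the key name comes from this thread.

```python
import re

# Matches the key, the colon, and a flat [...] list (which may span lines).
_PP_LIST = re.compile(r'("user_pp_scripts"\s*:\s*)\[[^\]]*\]')


def clear_user_pp_scripts(text):
    """Return (new_text, n_replacements) with the user_pp_scripts list emptied."""
    return _PP_LIST.subn(r"\1[]", text)


if __name__ == "__main__":
    # Hypothetical excerpt of a runtime config, not the real template.
    sample = """{
  // scripts the preprocessor runs on each variable
  "user_pp_scripts" : ["example_pp_script.py"],
}"""
    new_text, count = clear_user_pp_scripts(sample)
    print(count)
    print(new_text)
```

A replacement count of 0 means the key was not found in the expected form, in which case editing the file by hand is the safer route.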

csyhuang commented 1 week ago

@wrongkindofdoctor Thanks Jess. After merging your new commits, I'm able to run the multicase example and view the output graph.

It would be helpful to indicate in the documentation which files contain paths that the user has to change (as mentioned in points 3 and 4). It would also be good to update the environment file to include the missing Python package (point 5).


Regarding the installation error:

For the miniconda issue, you need to specify --env_dir in the call (e.g., --env_dir /home/clare/miniconda3/envs).

I thought it was not necessary, because the documentation says:

If the --env_dir flag is omitted, the environment files will be installed in your system’s conda’s default location (usually /envs).

And also, my error is not about the installation path, but that the environment file was not located properly. I installed the environment files one by one to get around this.


The issue is resolved to the point that I can proceed. Thanks, you can close this ticket.