ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Merge configuration object from multiple files (instead of one single file) #2448

Closed schlunma closed 4 days ago

schlunma commented 4 months ago

Description

This PR is the first step towards our new configuration system (see https://github.com/ESMValGroup/ESMValCore/issues/2371). It allows reading the configuration from multiple files (and directories), similar to Dask's configuration. The different files are merged using dask.config.collect().

The following directories are considered (descending priority):

  1. The directory specified via the --config_dir command line argument
  2. User configuration directory (by default ~/.config/esmvaltool, but this can be changed with the ESMVALTOOL_CONFIG_DIR environment variable)
  3. Default configuration in repository (esmvalcore/config/configurations/defaults)

Please note that these directories may only contain those files that we used to call "user configuration files". If an old "developer configuration file" (or any other YAML file with a different format) is present in this directory, this will lead to errors about invalid configuration options.

Detailed instructions on how to update your existing setup can be found in the deprecation warnings that you get from running the tool, or in the section on deprecations below.

Related to #2371 Closes #805 Closes #1433

Link to documentation:

Deprecations (since v2.12.0, will be removed in v2.14.0)

Usage of single configuration file ~/.esmvaltool/config-user.yml

As mentioned above, ESMValTool will now read configuration directories (instead of one single file). By default, the directory ~/.config/esmvaltool will be considered (can be changed with command line argument --config_dir; see below). To switch to the new format, run

mkdir -p ~/.config/esmvaltool && mv ~/.esmvaltool/config-user.yml ~/.config/esmvaltool

Then you can run ESMValTool with

esmvaltool run <other optional command line options> <recipe>

Configuration option config_file / command line argument --config_file:

As mentioned above, ESMValTool will now read configuration directories (instead of one single file). Instead of specifying one file, specify a configuration directory with --config_dir. To switch to the new format, ensure that the directory that contains your old config file only contains this one YAML file (no other YAML files like old "developer" configuration files) and run ESMValTool with

esmvaltool run --config_dir=<path/to/your/old/config_dir> <other optional command line options> <recipe>

Config.load_from_file():

Please Config.load_from_dirs() instead, e.g.

from pathlib import Path
config_file = Path('path/to/my/config_file.yml')
CFG.load_from_dirs([config_file.parent])

Session.config_dir:

There is no replacement for this attribute.

Specify extra_facets_dir as tuple:

Please use a list instead. For example,

CFG['extra_facets_dir'] = ('path/1', 'path/2')  # deprecated, do not use anymore
CFG['extra_facets_dir'] = ['path/1', 'path/2']  # preferred syntax

Backwards-incompatible change

In some cases, simply copying the old configuration file (e.g., from ~/.esmvaltool/config-user-yml) to the new location might not be enough. In some cases, it might be necessary to adapt drs and rootpath settings. For example, on Levante, it might be necessary to change

drs:  # deprecated, do not use anymore
  CMIP3: DKRZ
  CMIP5: DKRZ
  CMIP6: DKRZ
  CORDEX: BADC

to

drs:  # preferred syntax
  CMIP3: DKRZ
  CMIP5: DKRZ
  CMIP6: DKRZ
  CORDEX: BADC
  obs4MIPS: default
Due to the new way settings are merged across configuration files, nested dictionaries are properly updated now. Example: The files ```yaml drs: # default config CMIP5: ESGF obs4MIPS: ESGF ``` and ```yaml drs: # user config CMIP5: DKRZ CMIP6: DKRZ ``` were previously merged to ```yaml drs: # final config CMIP5: DKRZ CMIP6: DKRZ ``` In the new syntax, they are merged to ```yaml drs: # final config CMIP5: DKRZ CMIP6: DKRZ obs4MIPS: ESGF ``` Thus, in the example above, it is necessary to explicitly specify `obs4MIPS: default`.

Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the ๐Ÿ›  Technical or ๐Ÿงช Scientific review.


To help with the number pull requests:

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.90%. Comparing base (4b0dd41) to head (82a1099). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2448 +/- ## ========================================== + Coverage 94.83% 94.90% +0.06% ========================================== Files 251 251 Lines 14191 14295 +104 ========================================== + Hits 13458 13566 +108 + Misses 733 729 -4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bouweandela commented 3 months ago

Would it be possible to do this in a backward compatible way? The deprecations are fine, but it would be nice if existing configurations kept working (at least for a while). For example, if ESMVALTOOL_CONFIG_DIR is not set and ~/.config/esmvaltool does not exist, we could just continue to read the values from ~/.esmvaltool/config-user.yml or the command line --config-file argument, right? Then people would only need to switch when they actually need the new functionality.

schlunma commented 3 months ago

Would it be possible to do this in a backward compatible way? The deprecations are fine, but it would be nice if existing configurations kept working (at least for a while). For example, if ESMVALTOOL_CONFIG_DIR is not set and ~/.config/esmvaltool does not exist, we could just continue to read the values from ~/.esmvaltool/config-user.yml or the command line --config-file argument, right? Then people would only need to switch when they actually need the new functionality.

Yes, 100% agree. That's why I already implemented that in exactly this way. This will work for two more minor versions, and only then it will raise an error.

EDIT: My solution is even stricter: if --config_file is given or ~/.esmvaltool/config-user.yml exists, the tool will always read those, completely ignoring the new settings.

schlunma commented 3 months ago

@bouweandela I think I addressed all your comments, could you have another look? Thanks!

bouweandela commented 2 months ago

@ESMValGroup/userengagementteam Could you please try this out and provide us with feedback on the user-friendliness?

k-a-webb commented 1 week ago

Continuing from the thread in https://github.com/ESMValGroup/ESMValCore/issues/2369#issuecomment-2374176485

I've only started to try and use PR #2448, but quickly encountered an issue related to user-defined projects. I have a custom configuration directory (config-user.yml, config-developer.yml):

esmvaltool_config:
  config-user.yml # slightly revised config-user.yml from ESMVal* v2.11.0
  config-developer.yml # contains custom projects, same as was used successfully in  ESMVal* v2.11.0

My issue is that the custom project raise an issue in esmvalcore.config._validated_config related to esmvalcore.config/_config_validators.validate_drs being set by the default (i.e., not user specified) esmvalcore/config/configurations/defaults/config-user.yml. Since this file includes only:

drs: {'CMIP3': 'ESGF', 'CMIP5': 'ESGF', 'CMIP6': 'ESGF', 'CORDEX': 'ESGF', 'obs4MIPs': 'ESGF'}

Running ESMValTool with the custom config files: esmvaltool run recipes/test__Seasonal_Near-surface_air_temperature.yml --config_dir /path/to/esmvaltool_config/ fail with the error message:

ERROR:esmvalcore._main:Program terminated abnormally, see stack trace below for more information:
Traceback (most recent call last):
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvalcore/ESMValCore_pr2448/esmvalcore/_main.py", line 405, in run
    CFG.load_from_dirs(config_dirs)
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvalcore/ESMValCore_pr2448/esmvalcore/config/_config_object.py", line 366, in load_from_dirs
    self.update(new_config_dict)
  File "<frozen _collections_abc>", line 982, in update
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvalcore/ESMValCore_pr2448/esmvalcore/config/_validated_config.py", line 65, in __setitem__
    raise InvalidConfigParameter(
esmvalcore.exceptions.InvalidConfigParameter: `CanESM5_dev` is not a valid config parameter.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvalcore/ESMValCore_pr2448/esmvalcore/_main.py", line 591, in run
    fire.Fire(ESMValTool())
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_pr2448/lib/python3.12/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_pr2448/lib/python3.12/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_pr2448/lib/python3.12/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/fs/site5/eccc/crd/ccrn/users/rkw001/code/esmvalcore/ESMValCore_pr2448/esmvalcore/_main.py", line 412, in run
    raise InvalidConfigParameter(
esmvalcore.exceptions.InvalidConfigParameter: Failed to parse configuration directory /home/rkw001/A4D_standard_diagnostics__v1.5_pr2448/work/v1.5pr_pr2448/esmvaltool_config (command line argument): `CanESM5_dev` is not a valid config parameter.

For completion: Working from esmvalcore branch nested_config commit fe299dc07fc4cf730036fe5538c33c1f080d7a7e

bouweandela commented 1 week ago

Thanks for testing this! That is very useful feedback.

schlunma commented 1 week ago

Thanks for the feedback @k-a-webb, that's great!

I think the issue here is that you put config-user.yml and config-developer.yml into the same directory. With the new way of reading configuration files, ESMValCore will read both of them and interpret both as "user config files". Since we still distinguish between "user" and "developer" configuration files, and these are very different, this fails. In the future, we'd like to get rid of this distinction, but for now, these need to be saved in separate directories.

I will add a note to the documentation to clarify this ๐Ÿ‘

schlunma commented 1 week ago

I will add a note to the documentation to clarify this ๐Ÿ‘

Done in https://github.com/ESMValGroup/ESMValCore/pull/2448/commits/3325b594d78a5b385f1c747323301c84a11ea884, please let me now if this is clear! Thanks!!

k-a-webb commented 1 week ago

I think the issue here is that you put config-user.yml and config-developer.yml into the same directory. With the new way of reading configuration files, ESMValCore will read both of them and interpret both as "user config files". Since we still distinguish between "user" and "developer" configuration files, and these are very different, this fails.

Thanks! I was able to successfully run the recipe once I moved the config-developer.yml to a different directory than set by --config_dir (and updating the path in config-user.yml).

I only noticed after-the-fact that this was stated at the top of this PR thread,

Please note that these directories may only contain those files that we used to call "user configuration files". If an old "developer configuration file" (or any other YAML file with a different format) is present in this directory, this will lead to errors about invalid configuration options.

I would suggest adding a warning to this section of the documentation, Developer configuration file, which states that the config-developer.yml file cannot be in the directory specified by --config_dir. I did not find anything in the documentation about the distinction between user/developer config files that you mentioned -- but perhaps I missed it.

I'm looking forward to the continued development of configurable custom configuration files! Thanks for all your hard work :)

schlunma commented 1 week ago

I added a warning in https://github.com/ESMValGroup/ESMValCore/pull/2448/commits/82a1099d45d6fee82928d0331d03e3d617581c23, see https://esmvaltool--2448.org.readthedocs.build/projects/ESMValCore/en/2448/quickstart/configure.html#developer-configuration-file.

schlunma commented 1 week ago

Thanks for your helpful feedback @k-a-webb ๐Ÿ‘ I think I addresses all your comments, please let me know if this is not the case!

If there are no further comments on this, I will merge this on Monday evening ๐Ÿš€