ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Longer run time of recipes with 3D regridding in ESMValTool v2.10.0 compared to v2.5.0 #3590

Closed k-a-webb closed 1 month ago

k-a-webb commented 1 month ago

For a single CanESM5 dataset of 30 years, it takes ~4 min wall time to run a simple recipe in ESMValTool v2.5.0, but >1 h in v2.10.0 and v2.11 (main branch). There is a significant increase in run time in the regridding step.

The test recipe involves the following preprocessors (and no diagnostic scripts):

preprocessors:
  time_ocean_zonal_mean:
    custom_order: true
    climate_statistics:
      operator: mean
      period: full
    extract_levels:
      levels: [ 0,  10, 20, 50, 100, 200, 300, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750]
      scheme: linear_extrapolate
      coordinate: depth
    regrid:
      target_grid: 1x1
      scheme: nearest
    zonal_statistics:
      operator: mean

In this case, the regridding is done via esmpy_regrid (an alias of _regrid_esmpy.regrid), because the data passes the _attempt_irregular_regridding check.

It takes longer both to build the regridder (regridder = build_regridder(src_rep, dst_rep, method)) and to regrid the data (res = map_slices(src, regridder, src_rep, dst_rep)).
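For anyone wanting to check which of the two steps dominates in their own environment, a minimal timing sketch follows. The functions build_step and apply_step are hypothetical placeholders that stand in for the build_regridder and map_slices calls; they are not ESMValCore code, and the timed helper is an assumption made for illustration.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed wall time in seconds


@contextmanager
def timed(label):
    """Record the wall time spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def build_step():
    # Hypothetical placeholder for building the regridder.
    return sum(range(10_000))


def apply_step(regridder):
    # Hypothetical placeholder for applying the regridder to the data.
    return regridder + 1


with timed("build"):
    regridder = build_step()
with timed("apply"):
    result = apply_step(regridder)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

Wrapping each call separately keeps the two measurements independent, so a slowdown in one step is not hidden by the other.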

The main_log_debug.txt files for runs of the same recipe in different environments, each with a different installation of ESMValTool, are attached: main_log_debug-esmvaltool.txt, main_log_debug--ESMValToolv2.5.0.txt, main_log_debug-EVTmaindev.txt (same as main_log_debug-EVTmain.txt, as expected)

(Note: ESMValTool install main branch fails with missing author message -- despite inclusion of authors.)


Installation details:

ESMValTool v2.10.0 was installed via

mamba create --name esmvaltool -c conda-forge esmvaltool
conda activate esmvaltool
ESMValCore: 2.10.0
ESMValTool: 2.10.0

ESMValTool main branch installed via

mamba create --name a4d_env_EVTmain

conda activate a4d_env_EVTmain

cd ~/code/esmvaltool/
git clone https://github.com/ESMValGroup/ESMValTool.git -b main  ESMValTool_main
cd ESMValTool_main
mamba env update --file environment.yml -n a4d_env_EVTmain
> esmvaltool version
Running esmvaltool executable from ESMValCore. No other command line utilities are available until ESMValTool is installed.
ESMValCore: 2.10.0

as well as the development version,


mamba create --name a4d_env_EVTmaindev

conda activate a4d_env_EVTmaindev

cd ~/code/esmvaltool/
cd ESMValTool_main
mamba env update --file environment.yml -n a4d_env_EVTmaindev
pip install --editable '.[develop]'

cd ~/code/esmvalcore/
git clone https://github.com/ESMValGroup/ESMValCore.git -b main  ESMValCore_main
cd ESMValCore_main
mamba env update --file environment.yml -n a4d_env_EVTmain
pip install --editable '.[develop]'
> esmvaltool version
/space/hall5/sitestore/eccc/crd/ccrn/users/rkw001/miniconda3/envs/a4d_env_EVTmaindev/lib/python3.11/site-packages/pyproj/__init__.py:89: UserWarning: pyproj unable to set database path.
  _pyproj_global_context_initialize()
ESMValCore: 2.11.0.dev100+ga782af8e3.d20240510
ESMValTool: 2.11.0.dev72+gcb582bd01.d20240510

Note: To install ESMValTool v2.5.0, the following modifications to the install instructions were required:

mamba create --name a4d_env_EVTv2.5r python==3.9.7
conda activate a4d_env_EVTv2.5r

cd ~/code/esmvaltool/
git clone https://github.com/ESMValGroup/ESMValTool.git -b v2.5.0  ESMValTool_v2.5.0
cd ESMValTool_v2.5.0
nano environment.yml # esmpy==8.2.0, esmvalcore==2.5.0
mamba env update --file environment.yml -n a4d_env_EVTv2.5r
ESMValCore: 2.5.0
ESMValTool: 2.5.0

Following the basic instructions for installing ESMValTool without the above modifications led to package version issues with both shapely and esmpy/ESMF.


Environment listings (conda list > environment_<env>.yml) are also attached: environment__EVTmain.txt, environment__esmvaltool.txt, environment__EVTv2.5r.txt, environment__EVTmaindev.txt

k-a-webb commented 1 month ago

@malininae

bouweandela commented 1 month ago

Thanks for reporting the issue! This happens because the climate_statistics and extract_levels preprocessor functions are now lazy, but the ESMPy-based regridding preprocessor is not. As a result, the data was loaded from disk and the input to the regridding was recomputed multiple times. It should be fixed by https://github.com/ESMValGroup/ESMValCore/pull/2418.
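To illustrate the general effect (not the actual ESMValCore code, and not the change made in the linked pull request), here is a pure-Python sketch. LazyValue is a hypothetical stand-in for a lazy array without caching, and the sketch assumes, purely for illustration, that a non-lazy consumer triggers one compute per depth level.

```python
class LazyValue:
    """Hypothetical stand-in for a lazy array: defers work, does not cache."""

    def __init__(self, func):
        self.func = func
        self.n_computes = 0  # number of times the upstream work was executed

    def compute(self):
        self.n_computes += 1
        return self.func()


def expensive_upstream():
    # Stands in for loading from disk plus the lazy preprocessor steps.
    return [float(i) for i in range(1000)]


def eager_regrid_slice(data, level):
    # Stands in for a non-lazy regridder that needs real values for one slice.
    return data[level] * 2.0


def regrid_recomputing(lazy, levels):
    # Every slice calls compute(), so the upstream work is repeated each time.
    return [eager_regrid_slice(lazy.compute(), lev) for lev in levels]


def regrid_realized_once(lazy, levels):
    # Realize the lazy input once, then reuse the values for every slice.
    data = lazy.compute()
    return [eager_regrid_slice(data, lev) for lev in levels]


levels = range(29)  # the recipe in this issue requests 29 depth levels

slow = LazyValue(expensive_upstream)
fast = LazyValue(expensive_upstream)
result_slow = regrid_recomputing(slow, levels)
result_fast = regrid_realized_once(fast, levels)

print(slow.n_computes, fast.n_computes)
```

Both variants return the same values; the difference is only in how often the upstream work runs, which is why the cost grows with the number of slices when a lazy input feeds an eager step.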

(Note: ESMValTool install main branch fails with missing author message -- despite inclusion of authors.)

It looks like you did not install ESMValTool, but only created the conda environment with its dependencies. If you run pip install -e . in the directory where you checked out ESMValTool it should work as expected.

k-a-webb commented 1 month ago

Excellent! Thanks for sorting this out :D