ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Latest cdo version does not work with mpi with mpich #1227

Closed earnone closed 5 years ago

earnone commented 5 years ago

recipe_quantilebias.yml (default recipe in the development branch) is broken after installing the latest esmvaltool environment. There is an apparent conflict of hdf5 libraries when calling the cdo remapcon command. This is the log: log_quantilebias_hdf5_error.txt

The only thing I can see that has changed since my previous environment is the hdf5 version:

PREVIOUS:

hdf4                      4.2.13               h951d187_2    conda-forge/label/cf201901
hdf5                      1.10.3               hc401514_2    conda-forge/label/cf201901
hdfeos2                   2.20                 h7a90ae3_0    conda-forge/label/cf201901
hdfeos5                   5.1.16               h7423906_3    conda-forge/label/cf201901

LATEST:

hdf4                      4.2.13            h9a582f1_1002    conda-forge
hdf5                      1.10.5          nompi_h3c11f04_1100    conda-forge
hdfeos2                   2.20              h64bfcee_1000    conda-forge
hdfeos5                   5.1.16               h8b6279f_5    conda-forge

Anyone having similar troubles? Should we then specify the hdf5 version in the environment.yml?
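A minimal sketch of how the two listings above could be compared automatically, assuming they are rows copied from conda list output (name, version, build, channel). The helper names parse_conda_list and changed_versions are hypothetical, not part of conda or ESMValTool.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list`-style rows."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, short rows and '#' header rows.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        name, version, build = fields[:3]
        packages[name] = (version, build)
    return packages


def changed_versions(old, new):
    """Return {name: (old_version, new_version)} for packages whose version differs."""
    return {
        name: (old[name][0], new[name][0])
        for name in sorted(old.keys() & new.keys())
        if old[name][0] != new[name][0]
    }


# Rows copied from the two environments listed above.
PREVIOUS = """
hdf4     4.2.13   h951d187_2            conda-forge/label/cf201901
hdf5     1.10.3   hc401514_2            conda-forge/label/cf201901
hdfeos2  2.20     h7a90ae3_0            conda-forge/label/cf201901
hdfeos5  5.1.16   h7423906_3            conda-forge/label/cf201901
"""
LATEST = """
hdf4     4.2.13   h9a582f1_1002         conda-forge
hdf5     1.10.5   nompi_h3c11f04_1100   conda-forge
hdfeos2  2.20     h64bfcee_1000         conda-forge
hdfeos5  5.1.16   h8b6279f_5            conda-forge
"""

diff = changed_versions(parse_conda_list(PREVIOUS), parse_conda_list(LATEST))
for name, (before, after) in diff.items():
    print(f"{name}: {before} -> {after}")
```

With these rows only hdf5 is reported, since the other three packages changed build string but not version.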

earnone commented 5 years ago

@maritsandstad have you tested extreme_events with a recent environment?

valeriupredoi commented 5 years ago

I believe the problem is either transitory or, if it persists, has to do with something else. In any case, can you please run the following:

            # cdo test, check that it supports hdf5
            cdo --version
            echo 0 | cdo -f nc input,r1x1 tmp.nc
            ncdump tmp.nc | ncgen -k hdf5 -o tmp.nc
            cdo -f nc copy tmp.nc tmp2.nc

and let me know what cdo version you have and if all steps go through? :beer:

valeriupredoi commented 5 years ago

OK, send me that anyway but I am noticing the same problem from the command line:

cdo remapcon tmp.nc tmp2.nc

so it's not a fluke (damn!). Will investigate...

valeriupredoi commented 5 years ago

cdo remapcon t2.nc out.nc
cdo remapcon     : Enter grid description file or name > tx.nc
cdo remapcon: YAC first order conservative weights from lonlat (144x96) to gaussian (192x96) grid
cdo remapcon: Processed 25671168 values from 1 variable over 1857 timesteps [6.91s 33MB]

cdo remapcon t1.nc out.nc 
cdo remapcon     : Enter grid description file or name > t1.nc
cdo remapcon: YAC first order conservative weights from gaussian (192x96) to gaussian (192x96) grid
cdo remapcon: Processed 34504704 values from 1 variable over 1872 timesteps [5.09s 23MB]

cdo remapcon t2.nc out.nc 
cdo remapcon     : Enter grid description file or name > t1.nc
cdo remapcon: YAC first order conservative weights from lonlat (144x96) to gaussian (192x96) grid
cdo remapcon: Processed 25671168 values from 1 variable over 1857 timesteps [8.34s 32MB]

cdo remapcon t2.nc out.nc 
cdo remapcon     : Enter grid description file or name > t2.nc
Warning! ***HDF5 library version mismatched error***
KABOOM!

so it seems it really doesn't like remapping from a lonlat grid to a lonlat grid (here t2.nc onto its own 144x96 grid)

Indeed all works fine in an environment with:

dependencies:
  - hdf5=1.10.4
  - cdo=1.9.6

and by forcing hdf5=1.10.5 in the environment, cdo=1.9.7.1 gets installed and all works fine again. So it looks like the last version of hdf5 that cdo 1.9.6 was compiled against was 1.10.4; from 1.9.7 onwards cdo is compiled against hdf5=1.10.5

Again, a classic cockup from lovely conda :angry:

valeriupredoi commented 5 years ago

I can confirm that cdo=1.9.7.1 + hdf5=1.10.5 works well; I will pin cdo in the environment file to >1.9.6 and recreate the environment

valeriupredoi commented 5 years ago

Bloody packages are misaligned again: requesting cdo>1.9.6:

Package mpi conflicts for:
esmpy -> esmf[version='7.1.0.*,7.1.0r.*,7.1.0r'] -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
esmvalcore[version='>=2.0.0b0,<2.1'] -> esmpy -> esmf[version='7.1.0.*,7.1.0r.*,7.1.0r'] -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
ncl[version='>=6.5.0'] -> esmf -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
nco -> esmf -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
cdo[version='>1.9.6'] -> fftw=[build=mpi_openmpi_*] -> openmpi[version='>=3.1.3,<3.2.0a0'] -> mpi==1.0=openmpi
imagemagick -> fftw=[build=mpi_mpich_*] -> openmpi[version='>=3.1.3,<3.2.0a0'] -> mpi==1.0=openmpi

valeriupredoi commented 5 years ago

OK so we are pretty much hosed: hdf5 needs to be at 1.10.5, since a lot of the other deps have evolved and are compiled against it; however, cdo=1.9.7+ is incompatible with a lot of the other deps (see above). Setting its version to 1.9.6 (or not setting the version at all, since conda will default to 1.9.6 anyway) will allow environment creation, but the remap function will not work. Bloody Catch-22
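To make the clash easier to see, here is a rough sketch that groups the top-level packages from the solver output above by the mpi variant their dependency chain ends in. It assumes each chain is one line whose links are separated by " -> " and whose last link looks like mpi==1.0=<variant>; the function name mpi_variant_by_package is made up for illustration, and only an excerpt of the chains is used.

```python
def mpi_variant_by_package(conflict_text):
    """Group root packages by the mpi variant at the end of their chain."""
    variants = {}
    for line in conflict_text.splitlines():
        links = [link.strip() for link in line.split(" -> ")]
        # Ignore the header and anything that does not end in an mpi pin.
        if len(links) < 2 or not links[-1].startswith("mpi=="):
            continue
        root = links[0].split("[")[0]          # drop any [version=...] spec
        variant = links[-1].rsplit("=", 1)[1]  # 'mpi==1.0=mpich' -> 'mpich'
        variants.setdefault(variant, set()).add(root)
    return variants


# Excerpt of the conflict report quoted above.
CONFLICTS = """
Package mpi conflicts for:
esmpy -> esmf[version='7.1.0.*,7.1.0r.*,7.1.0r'] -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
ncl[version='>=6.5.0'] -> esmf -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
nco -> esmf -> mpich[version='>=3.2,<3.3.0a0,>=3.2.1,<3.3.0a0'] -> mpi==1.0=mpich
cdo[version='>1.9.6'] -> fftw=[build=mpi_openmpi_*] -> openmpi[version='>=3.1.3,<3.2.0a0'] -> mpi==1.0=openmpi
"""

variants = mpi_variant_by_package(CONFLICTS)
for variant in sorted(variants):
    print(variant, "<-", ", ".join(sorted(variants[variant])))
```

On this excerpt esmpy, ncl and nco all land on mpich, while cdo>1.9.6 lands on openmpi, which is the pair conda cannot install side by side.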

valeriupredoi commented 5 years ago

Workaround: export HDF5_DISABLE_VERSION_CHECK=1 will disable the hdf5 version check that cdo trips over, and cdo will actually perform the remap operation correctly (albeit spitting out the warning message). We need to contact the cdo guys and ask them to completely disable the check from source (if indeed, as it seems, cdo=1.9.6 is compiled against both hdf5=1.10.4 and 1.10.5) and to sort their new 1.9.7.1 version out
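For anyone calling cdo from a script rather than a shell, the same workaround might look roughly like this. It is a sketch only: the helper names env_without_hdf5_check and run_cdo are hypothetical, and the variable is set just for the child process so the rest of the session keeps the check.

```python
import os
import shutil
import subprocess


def env_without_hdf5_check(base_env=None):
    """Copy an environment and set HDF5_DISABLE_VERSION_CHECK=1 in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["HDF5_DISABLE_VERSION_CHECK"] = "1"
    return env


def run_cdo(args):
    """Run cdo with the version check disabled; return None if cdo is missing."""
    if shutil.which("cdo") is None:
        return None
    # check=False: let the caller inspect the return code instead of raising.
    return subprocess.run(["cdo", *args], env=env_without_hdf5_check(), check=False)


if __name__ == "__main__":
    # Harmless smoke test; a remap call would pass e.g.
    # ["remapcon,t2.nc", "t2.nc", "out.nc"] with the files from the session above.
    result = run_cdo(["--version"])
    print("cdo not found" if result is None else f"cdo exited with {result.returncode}")
```

The warning text printed by hdf5 itself suggests that a value of 2 also silences the message; that has not been tried here.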

valeriupredoi commented 5 years ago

Opened a topic on the cdo forum: https://code.mpimet.mpg.de/boards/1/topics/7830

valeriupredoi commented 5 years ago

Summary of issue (changed original name to reflect the current state of the issue)

dependencies:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package mpi conflicts for:
cdo=1.9.7.1 -> eccodes -> libnetcdf[version='>=4.6.2,<4.6.3.0a0,>=4.6.2,<4.7.0a0'] -> hdf5[version='>=1.10.4,<1.10.5.0a0,>=1.10.5,<1.10.6.0a0'] -> openmpi[version='>=3.1,<3.2.0a0'] -> mpi==1.0=openmpi
hdf5=1.10.5 -> openmpi[version='>=3.1,<3.2.0a0'] -> mpi==1.0=openmpi
mpich -> mpi==1.0=mpich


- no permanent solution found just yet

valeriupredoi commented 5 years ago

problem found: cdo=1.9.7.1 has a requirement on fftw which is hard set to openmpi in the package info/index.json. I asked for a relaxation of requirements at https://code.mpimet.mpg.de/boards/1/topics/7830?r=7848

valeriupredoi commented 5 years ago

Closing this since it's got too much fluff in it; I opened a condensed issue summarizing the problem at #1239