ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Conda environment issue when needing r-base and NCL #918

Closed mattiarighi closed 5 years ago

mattiarighi commented 5 years ago

NCL and CDO do not work anymore after the recent environment update.

NCL returns:

ncl: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

CDO returns:

cdo: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

As a workaround, I manually load the local modules for NCL and CDO instead of using the conda versions.

bjoernbroetz commented 5 years ago

With ldd `which ncl` you get all the shared libs of the ncl command. In the current incarnation of the environment it shows: libcrypto.so.1.0.0 => not found

At DKRZ there is a more modern version of this library installed. If you want a q&d fix you can:

mkdir <MY_LIB_FOLDER> && cd <MY_LIB_FOLDER>
ln -s /usr/lib64/libcrypto.so.10 libcrypto.so.1.0.0
export LD_LIBRARY_PATH=<MY_LIB_FOLDER>:$LD_LIBRARY_PATH

However, I would say the ncl conda package is broken for us.
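The ldd check above is easy to script. A minimal sketch, assuming the usual `name => not found` format that ldd prints for unresolved libraries; the helper name `missing_libs` is made up for illustration:

```shell
#!/bin/sh
# missing_libs: read ldd-style output on stdin and print the name of every
# shared library that the dynamic linker could not resolve.
# (Hypothetical helper; it relies only on the "=> not found" marker.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use, inside the environment under test:
#   ldd "$(command -v ncl)" | missing_libs
```

An empty result means every library was resolved; it says nothing about whether the resolved copy is the version the binary was built against.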

mattiarighi commented 5 years ago

This is getting nasty: when using NCL from the local module, strange errors occur. Apparently NCL can no longer recognize color strings (potentially affecting all diagnostic scripts!)

For example a simple script like this:

begin
  wks = gsn_open_wks("ps", "test")
  xx = ispan(1, 10, 1)
  yy = ispan(1, 20, 2)
  res = True
  res@gsnDraw = False
  res@gsnFrame = False
  plot = gsn_csm_xy(wks, xx, yy, res)
  resL = True
  resL@gsLineColor = "LightGray"
  dum = gsn_add_polyline(wks, plot, (/2, 5/), (/3, 6/), resL)
  draw(plot)
  frame(wks)
end

returns:

fatal:CvtStringToColorIndex: Unable to convert string "LightGray" to requested type
warning:Error retrieving resource gsLineColor from args - Ignoring Arg

The same script works on another machine with NCL 6.5.0. The problem also occurs after deactivating the esmvaltool environment. I do not know whether this error is related to the conda problem.

valeriupredoi commented 5 years ago

having a look at it right now, guys :radio:

DSenftleben commented 5 years ago

Hey guys, I have not updated the environment yet and for me everything still works. So, a caution to everyone: don't update your environment!

bjoernbroetz commented 5 years ago

@DSenftleben give us the output of: ldd `which ncl` | grep cryp

mattiarighi commented 5 years ago

I tested the simple script above on a colleague's account (no conda, no esmvaltool) and it works. I also tried deactivating conda and loading just the NCL module, but the problem still occurs.

valeriupredoi commented 5 years ago

(esmvaltool) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ ldd `which ncl` | grep cryp
    libcrypto.so.1.0.0 => /home/valeriu/anaconda3/envs/esmvaltool/bin/../lib/./libcrypto.so.1.0.0 (0x00007fa41fba5000)
    libk5crypto.so.3 => /home/valeriu/anaconda3/envs/esmvaltool/bin/../lib/./libk5crypto.so.3 (0x00007fa41fa5a000)

mattiarighi commented 5 years ago

ldd `which ncl` | grep cryp
        libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007fb352b5e000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fb34fba1000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fb34d7e9000)

bjoernbroetz commented 5 years ago

So @valeriupredoi, you have the needed shared lib in your environment that we miss.

bjoernbroetz commented 5 years ago

And @mattiarighi, you are using the one in the system.
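To see at a glance which copies come from the conda environment and which from the system, the ldd output can be labelled by path. A minimal sketch, assuming the environment prefix is passed in by hand (for an active environment this would normally be `$CONDA_PREFIX`); `lib_origin` is a made-up helper name:

```shell
#!/bin/sh
# lib_origin PREFIX: read ldd-style output on stdin and label each library as
#   env     - resolved to a path under PREFIX (e.g. the conda environment)
#   system  - resolved to any other path
#   missing - not resolved at all
# (Hypothetical helper; lines without "=>" are skipped.)
lib_origin() {
    awk -v prefix="$1" '
        /=> not found/ { print $1, "missing"; next }
        $2 == "=>" {
            if (index($3, prefix "/") == 1) print $1, "env"
            else print $1, "system"
        }'
}

# Typical use:
#   ldd "$(command -v ncl)" | grep cryp | lib_origin "$CONDA_PREFIX"
```

For the two outputs above this would report libcrypto.so.1.0.0 as "env" on the Ubuntu laptop and libcrypto.so.10 as "system" on the machine using the module.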

mattiarighi commented 5 years ago

Yes, that works, but my test script crashes with a strange error message that does not occur for other users on the same machine.

bjoernbroetz commented 5 years ago

@mattiarighi: Is it the newly released ncl? https://www.ncl.ucar.edu/current_release.shtml

mattiarighi commented 5 years ago

That might be the reason, indeed...

valeriupredoi commented 5 years ago

sorry guys, had to leave for a bit to show a colleague how to use git.

So, even after a fresh pull of version2_development, an environment update and a reinstall of the tool, I can use ncl ok:

ncl 0> x = 2
ncl 1> print(x)

Variable: x
Type: integer
Total Size: 4 bytes
            1 values
Number of Dimensions: 1
Dimensions and sizes:   [1]
Coordinates: 
(0) 2

(this is the only NCL syntax I know :grin: )

I have done this with two different OS's: Ubuntu and Scientific Linux

ldd output:

(esmvaltool) [valeriu@jasmin-sci2 esmvaltool_alpha]$ ldd `which ncl` | grep cryp
    libcrypto.so.1.1 => /home/users/valeriu/anaconda3Feb19/envs/esmvaltool/bin/../lib/./libcrypto.so.1.1 (0x00007fa8dfc06000)
    libk5crypto.so.3 => /home/users/valeriu/anaconda3Feb19/envs/esmvaltool/bin/../lib/./libk5crypto.so.3 (0x00007fa8dfabb000)

and for Ubuntu:

(esmvaltool) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ ldd `which ncl` | grep cryp
    libcrypto.so.1.1 => /home/valeriu/anaconda3/envs/esmvaltool/bin/../lib/./libcrypto.so.1.1 (0x00007f6f3ec76000)
    libk5crypto.so.3 => /home/valeriu/anaconda3/envs/esmvaltool/bin/../lib/./libk5crypto.so.3 (0x00007f6f3eb2c000)

and ncl version:

(esmvaltool) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ conda list ncl
# packages in environment at /home/valeriu/anaconda3/envs/esmvaltool:
#
# Name                    Version                   Build  Channel
ncl                       6.5.0           blas_openblashd40de8d_1  [blas_openblas]  conda-forge

and

(esmvaltool) [valeriu@jasmin-sci2 esmvaltool_alpha]$ conda list ncl
# packages in environment at /home/users/valeriu/anaconda3Feb19/envs/esmvaltool:
#
# Name                    Version                   Build  Channel
ncl                       6.5.0           blas_openblashd40de8d_1  [blas_openblas]  conda-forge

mattiarighi commented 5 years ago

Should we inform DKRZ about this problem? @bjoernbroetz what do you think?

valeriupredoi commented 5 years ago

note that conda env update WILL NOT pick up 6.6.2 https://anaconda.org/conda-forge/ncl but rather 6.5.0 still; I suspect the new version will be installed with a fresh env build (conda env create)

valeriupredoi commented 5 years ago

I am just building a new virtual env to see if 6.6.2 gets installed and if I have issues with it, stay tuned :radio:

valeriupredoi commented 5 years ago

conda env create still picks up 6.5

# packages in environment at /home/valeriu/anaconda3/envs/esmvaltool_ncl:
#
# Name                    Version                   Build  Channel
ncl                       6.5.0           blas_openblash04324b8_3  [blas_openblas]  conda-forge

that works fine still; in the environment -

conda install -c conda-forge ncl

still picks up 6.5.0 but changes some hashes and updates some deps, but still works well

conda install -c conda-forge ncl=6.6.2

will remove a lot of needed stuff:

The following packages will be REMOVED:

  cdo-1.9.3-1
  r-base-3.5.1-he45234b_1005
  r-rcpp-1.0.0-r351h29659fb_1000
  r-xml-3.98_1.16-r351h96ca727_1000

and change a LOT of others, and at the end of it all:

ncl 0> x = 2
ncl 1> print(x)

Variable: x
Type: integer
Total Size: 4 bytes
            1 values
Number of Dimensions: 1
Dimensions and sizes:   [1]
Coordinates: 
(0) 2

so it still works (on Ubuntu; I don't have time to test on Jasmin's SL)

(esmvaltool_ncl) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ ldd `which ncl` | grep cryp
    libcrypto.so.1.0.0 => /home/valeriu/anaconda3/envs/esmvaltool_ncl/bin/../lib/./libcrypto.so.1.0.0 (0x00007f57191e3000)
    libk5crypto.so.3 => /home/valeriu/anaconda3/envs/esmvaltool_ncl/bin/../lib/././libk5crypto.so.3 (0x00007f571584f000)

BUT the R tools are gone:

(esmvaltool_ncl) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ conda list r-rcpp
# packages in environment at /home/valeriu/anaconda3/envs/esmvaltool_ncl:
#
# Name                    Version                   Build  Channel
(esmvaltool_ncl) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ vim environment.yml 
(esmvaltool_ncl) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ conda list r-rxml
# packages in environment at /home/valeriu/anaconda3/envs/esmvaltool_ncl:
#
# Name                    Version                   Build  Channel

So I'd say let's stick with 6.5.0 for now

valeriupredoi commented 5 years ago

oh and cdo is gonzo as well
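Before letting conda change anything, the removal list can be pulled out of a dry run. A minimal sketch, assuming the plain-text "The following packages will be REMOVED:" block has the layout shown above (other conda versions may lay it out differently); `removed_packages` is a made-up helper name:

```shell
#!/bin/sh
# removed_packages: read the text output of a conda install/update run on
# stdin and print the entries listed under
# "The following packages will be REMOVED:".
# The block ends at the first blank line after at least one entry, or at the
# next "The following ..." header.
removed_packages() {
    awk '
        /^The following packages will be REMOVED:/ { in_block = 1; seen = 0; next }
        in_block && /^The following/ { in_block = 0 }
        in_block && NF == 0 { if (seen) in_block = 0; next }
        in_block { print $1; seen = 1 }
    '
}

# Typical use (nothing is installed with --dry-run):
#   conda install --dry-run -c conda-forge ncl=6.6.2 | removed_packages
```

If the list contains cdo or any r-* package, the requested version cannot coexist with them in that environment as it stands.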

bjoernbroetz commented 5 years ago

Setting up the esmvaltool environment anew (conda env remove ... etc.) fixes the ncl issue. The cdo problem persists.

valeriupredoi commented 5 years ago

yes @bjoernbroetz - have a look at the workflow I posted above: setting up a new env still installs ncl=6.5.0, which works fine. The only problem is that asking for ncl=6.6.2 removes the R packages and cdo by default; I installed cdo manually afterwards and at least cdo --help works

mattiarighi commented 5 years ago

> Setting up the esmvaltool environment anew (conda env remove ... etc.) fixes the ncl issue. The cdo problem persists.

I did the same and I can confirm that NCL works but cdo still has problems.

As a workaround, we can use the local cdo (module load cdo) but it would be good to understand what is going wrong.

valeriupredoi commented 5 years ago

@mattiarighi what ncl and cdo versions do you have that are making things go tits up for you?

mattiarighi commented 5 years ago

cdo                       1.5.2                    pypi_0    pypi

NCL is 6.5.0 but now works.

valeriupredoi commented 5 years ago

your cdo is older than Moses @mattiarighi I have 1.9.3 with ncl=6.5 and 1.9.6 with ncl=6.6.2

mattiarighi commented 5 years ago

This is the version I get running the standard installation procedure.

valeriupredoi commented 5 years ago

yes and that is ye olde - my installations are via conda-forge and that supersedes the pip one

mattiarighi commented 5 years ago

Can we change our setup accordingly?

valeriupredoi commented 5 years ago

frankly I am very surprised your cdo is from pip, since my environments are stock esmvaltool environments and not anything custom. The pip one gets superseded by the conda one at some point during the environment creation, so you should have the same conda-based cdo

valeriupredoi commented 5 years ago

aah! who set cdo=1.9.3 in environment.yml and didn't remove it from the pypi list in setup.py? That one will owe me a beer because: 1. now I know why the base is ncl=6.5 even if the latest is 6.6.2 and 2. that may blow your stuff up at your end @mattiarighi

mattiarighi commented 5 years ago

git blame is your friend :smile:

bouweandela commented 5 years ago

The cdo in setup.py is the Python cdo package (called python-cdo on conda-forge). This seems to confuse conda env export, but should be no problem. @mattiarighi can you run which cdo; cdo --version to get the actual version of cdo you have?
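To tell the pip-installed Python wrapper apart from the conda-installed binary, the channel column of `conda list` is enough. A minimal sketch, assuming the channel is the last field on each line, as in the listings in this thread; `pypi_packages` is a made-up helper name:

```shell
#!/bin/sh
# pypi_packages: read `conda list` output on stdin and print "name version"
# for every entry whose channel column (the last field) is "pypi", i.e.
# packages installed by pip rather than by conda.
# (Hypothetical helper; header lines starting with "#" are ignored.)
pypi_packages() {
    awk '!/^#/ && NF >= 3 && $NF == "pypi" { print $1, $2 }'
}

# Typical use:
#   conda list | pypi_packages
```

An entry such as "cdo 1.5.2" in this output would then be the Python package from setup.py, while the version of the binary still has to be read from cdo --version.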

bouweandela commented 5 years ago

Note that we had to pin cdo to 1.9.5 in current pull requests to get it working with ncl 6.5 if you install both from conda.

valeriupredoi commented 5 years ago

@bouweandela you mean 1.9.3

working 1.9.3 with ncl=6.5.0:

Climate Data Operators version 1.9.3 (http://mpimet.mpg.de/cdo)
Compiled: by unknown on 7202286a92ec (x86_64-unknown-linux-gnu) Apr 18 2018 20:54:15
CXX Compiler: g++ -fPIC -DPIC -g -O2 -std=c++11 -fopenmp -fPIC -DPIC -fopenmp
CXX version : g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
C Compiler: gcc -std=gnu99 -fPIC -DPIC -fopenmp
C version : gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
F77 Compiler:
F77 version : ./configure: line 21772: -V: command not found
Features: 7GB C++11 DATA PTHREADS OpenMP3 HDF5 NC4/HDF5/threadsafe OPeNDAP UDUNITS2 PROJ.4 XML2 CURL FFTW3 SSE2
Libraries: HDF5/1.10.1(1.10.2) proj/4.93 xml2/2.9.8 curl/7.64.0(h7.59.0)
Filetypes: srv ext ieg grb1 grb2 nc1 nc2 nc4 nc4c nc5
CDI library version : 1.9.3 of Apr 18 2018 20:52:33
CGRIBEX library version : 1.9.0 of Jan 22 2018 09:24:03
GRIB_API library version : 2.8.2
NetCDF library version : 4.6.1 of Oct 22 2018 00:13:28 $
HDF5 library version : 1.10.2 threadsafe
EXSE library version : 1.4.0 of Apr 18 2018 20:52:29
FILE library version : 1.8.3 of Apr 18 2018 20:52:27

working 1.9.6 with ncl=6.6.2:

(esmvaltool_ncl) valeriu@valeriu-PORTEGE-Z30-C:~/esmvaltool_alpha$ cdo --version
Climate Data Operators version 1.9.6 (http://mpimet.mpg.de/cdo)
System: x86_64-pc-linux-gnu
CXX Compiler: /home/conda/feedstock_root/build_artifacts/cdo_1550155197086/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++ -fPIC -DPIC -g -O2 -std=c++11 -fopenmp -fPIC -DPIC -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/valeriu/anaconda3/envs/esmvaltool_ncl/include -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=${PREFIX}=/usr/local/src/conda-prefix -fopenmp
CXX version : unknown
C Compiler: /home/conda/feedstock_root/build_artifacts/cdo_1550155197086/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc -fPIC -DPIC -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/valeriu/anaconda3/envs/esmvaltool_ncl/include -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=${PREFIX}=/usr/local/src/conda-prefix -fopenmp
C version : unknown
F77 Compiler: /home/conda/feedstock_root/build_artifacts/cdo_1550155197086/_build_env/bin/x86_64-conda_cos6-linux-gnu-gfortran -fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/valeriu/anaconda3/envs/esmvaltool_ncl/include -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=${PREFIX}=/usr/local/src/conda-prefix
F77 version : unknown
Features: 7GB 4threads C++11 Fortran DATA PTHREADS OpenMP45 HDF5 NC4/HDF5/threadsafe OPeNDAP UDUNITS2 PROJ.4 XML2 CURL FFTW3 SSE3
Libraries: HDF5/1.10.4 proj/5.2 xml2/2.9.8 curl/7.64.0
Filetypes: srv ext ieg grb1 grb2 nc1 nc2 nc4 nc4c nc5
CDI library version : 1.9.6
cgribex library version : 1.9.2
ecCodes library version : 2.12.0
NetCDF library version : 4.6.2 of Dec 17 2018 19:36:03 $
hdf5 library version : 1.10.4 threadsafe
exse library version : 1.4.1
FILE library version : 1.8.3

mattiarighi commented 5 years ago

which cdo returns the conda path

/miniconda3/envs/esmvaltool/bin/cdo

cdo --version does not work due to the problem above

valeriupredoi commented 5 years ago

@mattiarighi is your error

cdo: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

?

mattiarighi commented 5 years ago

Yes

bouweandela commented 5 years ago

> @bouweandela you mean 1.9.3

No, I mean 1.9.5

valeriupredoi commented 5 years ago

(cdo+ncl from conda): cdo 1.9.3 works fine with ncl 6.5 on certain machines, while cdo 1.9.3 or 1.9.5 doesn't work at all with ncl 6.5 on other machines; it's a machine-specific bug - am looking at it right now

bouweandela commented 5 years ago

> cdo --version does not work due to the problem above

Haha, they're not making it easy for you. Use LD_LIBRARY_PATH=/some/path/to/libssl cdo --version instead
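The one-off override can be wrapped so that the search path is changed for a single command only and never leaks into the calling shell. A minimal sketch; `with_libdir` is a made-up helper name and the directory is whatever actually holds the missing library:

```shell
#!/bin/sh
# with_libdir DIR CMD [ARGS...]: run CMD with DIR prepended to
# LD_LIBRARY_PATH for that one command. The function body runs in a
# subshell and the variable is set through env, so the caller's
# LD_LIBRARY_PATH is left untouched.
# (Hypothetical helper; CMD must be an external command, not a shell function.)
with_libdir() (
    dir=$1
    shift
    env LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
)

# Typical use:
#   with_libdir /some/path/to/libssl cdo --version
```

This is only a way to get diagnostic output such as the version string; it does not make a mismatched library safe to use for real work.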

valeriupredoi commented 5 years ago

ok @mattiarighi - create a new environment with this environment.yml:

---

name: esmvaltool
channels:
  - conda-forge

dependencies:
  # Python packages that cannot be installed from PyPI:
  - iris
  - matplotlib<3  # Can be installed from PyPI, but is a dependency of iris and should be pinned.
  - python-stratify
  - esmpy
  - xarray  # Can be installed from PyPI, but here to get a consistent set of dependencies with iris.
  # Non-Python dependencies
  - graphviz
  - cdo=1.9.3

  # Multi language support:
  - ncl=6.5.0
  - jasper!=1.900.31  # pinned NCL dependency
  - r-base
  - r-rcpp
  - r-xml
  - libunwind  #  specifically for Python3.7+
  # TODO: add julia

and remove cdo from setup.py - that will get you back on track :train2:

mattiarighi commented 5 years ago

Can we implement this as general solution?

valeriupredoi commented 5 years ago

@bouweandela @mattiarighi the problem here is that cdo and ncl have both evolved, and in different directions as far as conda is concerned. If we pin cdo<1.9.6 and leave ncl unpinned, the environment is hard to solve, since conda will sometimes try to grab ncl=6.6.2, which doesn't work with cdo=1.9.3 or 1.9.5 (interestingly, on a machine with the older conda 4.6.4 this is not a problem, because the older conda does not know about the new ncl=6.6.2). If we install cdo=1.9.6, that removes ncl=6.5.0, and the other way around: the older ncl and cdo=1.9.6 cannot coexist. If we pin cdo=1.9.6 and ncl=6.6.2, conda complains about r-base, r-xml and the other R packages, which get removed with the new ncl (all the R libraries get removed!). So yes, I think the best interim solution is to pin both cdo and ncl to their older versions, unless we can install the R packages from pip (haven't tried that yet)
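A quick sanity check on an environment.yml can flag pin combinations that nobody here has seen working. A minimal sketch, assuming pins are written as `- name=version` list items; it only knows the two combinations reported as working in this thread (cdo 1.9.3 with ncl 6.5.0, cdo 1.9.6 with ncl 6.6.2), and `pin_of` and `check_pins` are made-up helper names:

```shell
#!/bin/sh
# pin_of PKG: read an environment.yml on stdin and print the version that
# PKG is pinned to with "=" (prints nothing if PKG is absent or unpinned).
pin_of() {
    sed -n "s/^[[:space:]]*-[[:space:]]*$1=\([^[:space:]#]*\).*/\1/p"
}

# check_pins FILE: compare the cdo and ncl pins in FILE against the two
# combinations reported as working in this thread. Anything else is only
# flagged as unconfirmed, not as broken.
check_pins() {
    cdo_pin=$(pin_of cdo < "$1")
    ncl_pin=$(pin_of ncl < "$1")
    case "$cdo_pin/$ncl_pin" in
        1.9.3/6.5.0|1.9.6/6.6.2)
            echo "ok: cdo=$cdo_pin ncl=$ncl_pin" ;;
        *)
            echo "unconfirmed: cdo=${cdo_pin:-unpinned} ncl=${ncl_pin:-unpinned}" ;;
    esac
}

# Typical use:
#   check_pins environment.yml
```

The check reads the file only; whether conda can actually solve the environment still has to be tested with conda env create.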

valeriupredoi commented 5 years ago

note that this environment also works well from an esmvaltool/ncl/cdo functional point of view, but the R packages are removed and need to be installed from somewhere other than conda:

---

name: esmvaltool
channels:
  - conda-forge

dependencies:
  # Python packages that cannot be installed from PyPI:
  - iris
  - matplotlib<3  # Can be installed from PyPI, but is a dependency of iris and should be pinned.
  - python-stratify
  - esmpy
  - xarray  # Can be installed from PyPI, but here to get a consistent set of dependencies with iris.
  # Non-Python dependencies
  - graphviz
  - cdo=1.9.6

  # Multi language support:
  - ncl=6.6.2
  - jasper!=1.900.31  # pinned NCL dependency
  #- r-base
  #- r-rcpp
  #- r-xml
  - libunwind  #  specifically for Python3.7+
  # TODO: add julia

valeriupredoi commented 5 years ago

ah but I have good news for you python-dependency-lovers :grin: - the R packages can be installed without problems from the r channel, e.g. conda install -c r r-base and so on; no deps will be removed and we'll have the latest ncl and cdo versions talking to each other, so we just need to add the r channel to our environment.yml - :beer: to me

valeriupredoi commented 5 years ago

also, there is no need to pin ncl or cdo: if conda gets the R packages from the r channel, there is no reason to stay at ncl=6.5, so ncl will go up and cdo will go up with it; am going home, have had enough of this :bus:
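As a rough illustration of that idea, the following writes a trial environment file with the r channel added and cdo/ncl left unpinned. It is a sketch, not a tested fix: the file contains only the lines relevant to this thread (the other dependencies from the environment.yml above would stay), the channel order is an assumption, and which channel conda actually takes the R packages from depends on its channel priority settings. `write_trial_env` is a made-up helper name:

```shell
#!/bin/sh
# write_trial_env FILE: write a trial environment file that lists the "r"
# channel next to conda-forge and leaves cdo and ncl unpinned.
# (Hypothetical helper; channel order and package list are assumptions.)
write_trial_env() {
    cat > "$1" <<'EOF'
---
name: esmvaltool
channels:
  - conda-forge
  - r

dependencies:
  - cdo
  - ncl
  - r-base
  - r-rcpp
  - r-xml
EOF
}

# Typical use:
#   write_trial_env environment-trial.yml
#   conda env create -n esmvaltool_trial -f environment-trial.yml
```

Creating it under a separate environment name keeps the working environment intact while the combination is being checked.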

valeriupredoi commented 5 years ago

(esmvaltool_cdo2) [valeriu@jasmin-sci2 esmvaltool_alpha]$ conda list cdo
# packages in environment at /home/users/valeriu/anaconda3Feb19/envs/esmvaltool_cdo2:
#
# Name                    Version                   Build  Channel
cdo                       1.9.6             he2b4288_1005    conda-forge
(esmvaltool_cdo2) [valeriu@jasmin-sci2 esmvaltool_alpha]$ conda list ncl
# packages in environment at /home/users/valeriu/anaconda3Feb19/envs/esmvaltool_cdo2:
#
# Name                    Version                   Build  Channel
ncl                       6.6.2           blas_openblashde02c1e_0  [blas_openblas]  conda-forge
(esmvaltool_cdo2) [valeriu@jasmin-sci2 esmvaltool_alpha]$ conda list r-base
# packages in environment at /home/users/valeriu/anaconda3Feb19/envs/esmvaltool_cdo2:
#
# Name                    Version                   Build  Channel
r-base                    3.2.2                         0    r
(esmvaltool_cdo2) [valeriu@jasmin-sci2 esmvaltool_alpha]$ conda list r-rcpp
# packages in environment at /home/users/valeriu/anaconda3Feb19/envs/esmvaltool_cdo2:
#
# Name                    Version                   Build  Channel
r-rcpp                    0.12.2                r3.2.2_0a    r
(esmvaltool_cdo2) [valeriu@jasmin-sci2 esmvaltool_alpha]$ conda list r-xml
# packages in environment at /home/users/valeriu/anaconda3Feb19/envs/esmvaltool_cdo2:
#
# Name                    Version                   Build  Channel
r-xml                     3.98_1.3              r3.2.2_0a    r
(esmvaltool_cdo2) [valeriu@jasmin-sci2 esmvaltool_alpha]$ cdo --version
Climate Data Operators version 1.9.6 (http://mpimet.mpg.de/cdo)

ruthlorenz commented 5 years ago

We just tried to install the newest environment at ETH and got the message that ncl=6.5.0 and hdf5=1.10.3 are conflicting. We solved it by removing the hdf5 package definition and removing r-base (since we are not using R). But other people might run into the same issue.

mattiarighi commented 5 years ago

Please try the environment in #936. That works at DKRZ and on Jasmin. We will merge it hopefully today.

ruthlorenz commented 5 years ago

thanks, ok, just saw that there is more.....