CDAT / cdms


cdms2 + 'LSCE' openmpi (+ ESMF ?) side effect in 2.10 #164

Open jypeter opened 7 years ago

jypeter commented 7 years ago

@dnadeau4 @doutriaux1 I have just found the following, when working with another user

Everything works fine by default when I import cdms2 (and openmpi is not loaded)

bash-4.2$ source activate uvcdat-2.10
(uvcdat-2.10) bash-4.2$ env | grep -i openmpi
# NO openmpi in my environment
(uvcdat-2.10) bash-4.2$ python
Python 2.7.13 | packaged by conda-forge | (default, May  2 2017, 12:48:11) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdms2
>>> 

Now, if I load openmpi to access our homemade LSCE openmpi, I get an error when I import cdms2, with references to ESMF. Note that by homemade I just mean that it has been recompiled locally

(uvcdat-2.10) bash-4.2$ module load openmpi
(uvcdat-2.10) bash-4.2$ python
Python 2.7.13 | packaged by conda-forge | (default, May  2 2017, 12:48:11) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdms2
Traceback (most recent call last):
  File "/home/share/unix_files/cdat/miniconda2/envs/uvcdat-2.10/lib/python2.7/site-packages/ESMF/interface/loadESMF.py", line 122, in <module>
    mode=ct.RTLD_GLOBAL)
  File "/home/share/unix_files/cdat/miniconda2/envs/uvcdat-2.10/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/share/unix_files/cdat/miniconda2/envs/uvcdat-2.10/lib/./libmpicxx.so.12: undefined symbol: MPIR_Keyval_set_proxy
>>> 

I have the following references to openmpi in my environment (references added by module load openmpi)

(uvcdat-2.10) bash-4.2$ env | grep -i openmpi
MANPATH=/usr/local/install/openmpi-1.10.5/share/man:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/man/en_US:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/man/en_US:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/man/en_US:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/man/en_US:/usr/local/share/man:/usr/share/man/overrides:/usr/share/man::/usr/local/install/apache/man:
LIBRARY_PATH=/usr/local/install/openmpi-1.10.5/lib:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/lib/intel64
LD_LIBRARY_PATH=/usr/local/install/openmpi-1.10.5/lib:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mpirt/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mpirt/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/compiler/lib/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/lib/intel64
CPATH=/usr/local/install/openmpi-1.10.5/include:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/include:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/include
PATH=/usr/local/install/openmpi-1.10.5/bin:/home/share/unix_files/cdat/miniconda2/envs/uvcdat-2.10/bin:/home/share/unix_files/cdat/miniconda2/bin:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/bin/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mpirt/bin/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/bin/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/bin/intel64_mic:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/debugger/gui/intel64:/usr/local/install/ImageMagick/bin:/home/share/unix_files/ferret/atlas:/home/share/unix_files/ferret/fast:/usr/local/install/ferret-6.9.6/bin:/usr/lib64/qt-3.3/bin:/home/users/jypeter/bin:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/bin/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mpirt/bin/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/bin/intel64:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/bin/intel64_mic:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/debugger/gui/intel64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:.
_LMFILES_=/usr/local/install/modulefiles/ferret/6.9.6:/usr/local/install/modulefiles/openmpi/1.10.5
LOADEDMODULES=ferret/6.9.6:openmpi/1.10.5
INCLUDE=/usr/local/install/openmpi-1.10.5/include:/usr/local/install/intel-fcomp-2013/composer_xe_2013.2.146/mkl/include
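
One plausible reading of the traceback, not confirmed in this report: LD_LIBRARY_PATH now starts with /usr/local/install/openmpi-1.10.5/lib, so the conda environment's libmpicxx.so.12 (from mpich) may end up resolving its MPI dependency against the locally compiled OpenMPI library, which would not provide MPIR_Keyval_set_proxy. A minimal sketch for checking which search-path directories ship MPI libraries follows; the helper name `dirs_with_mpi_libs` is hypothetical and not part of cdms2 or ESMF.

```python
import glob
import os


def dirs_with_mpi_libs(search_path):
    """Return (directory, [library names]) pairs for each entry of a
    colon-separated search path that contains libmpi*.so* files."""
    hits = []
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        libs = sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(directory, "libmpi*.so*"))
        )
        if libs:
            hits.append((directory, libs))
    return hits


if __name__ == "__main__":
    # Directories are reported in search order, so the first hit is the one
    # the dynamic loader would try first for a given library name.
    for directory, libs in dirs_with_mpi_libs(os.environ.get("LD_LIBRARY_PATH", "")):
        print("%s: %s" % (directory, ", ".join(libs)))
```

If the OpenMPI directory is listed and provides a library with the same file name as one in the conda environment's lib directory, that would be consistent with the undefined-symbol error.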

I'm not sure if you can do anything about that, but I thought I'd better report this problem

For now, I'm just going to remove the module load openmpi from this user's environment, because I'm not sure she needs it
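
An alternative to removing the module, sketched here under the assumption that the conflict comes only from the openmpi entry in LD_LIBRARY_PATH (which this report does not confirm): start the Python process that needs cdms2 with a filtered copy of the environment. Both function names are hypothetical helpers, not part of any existing tool.

```python
def without_entries(path_value, needle):
    """Drop every entry containing `needle` from a colon-separated path string."""
    kept = [entry for entry in path_value.split(":") if needle not in entry]
    return ":".join(kept)


def env_without_openmpi(environ):
    """Copy `environ`, filtering OpenMPI directories out of LD_LIBRARY_PATH only.

    PATH and the other variables are left untouched, so mpirun and the
    compiler wrappers from the module remain reachable.
    """
    cleaned = dict(environ)
    if "LD_LIBRARY_PATH" in cleaned:
        cleaned["LD_LIBRARY_PATH"] = without_entries(
            cleaned["LD_LIBRARY_PATH"], "openmpi"
        )
    return cleaned
```

The result can be passed as the `env` argument of `subprocess.call`, for example `subprocess.call(["python", "my_script.py"], env=env_without_openmpi(os.environ))`, where `my_script.py` is a placeholder.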

Should openmpi provided by conda be a dependency of something? There does not seem to be any in 2.10, but it seems to be available in conda-forge

(uvcdat-2.10) bash-4.2$ conda list mpi
# packages in environment at /home/share/unix_files/cdat/miniconda2/envs/uvcdat-2.10:
#
mpich                     3.2                           4    conda-forge
(uvcdat-2.10) bash-4.2$ conda search -c conda-forge openmpi
Fetching package metadata ...........
openmpi                      2.0.2                         0  conda-forge     
                             2.1.1                         0  conda-forge   

I have also cloned 2.10 and added openmpi with conda, but I still get the same error after doing a module load openmpi

bash-4.2$ source activate cdatmpi
(cdatmpi) bash-4.2$ conda list mpi
# packages in environment at /home/share/unix_files/cdat/miniconda2/envs/cdatmpi:
#
mpich                     3.2                           4    conda-forge
openmpi                   2.1.1                         0    conda-forge
(cdatmpi) bash-4.2$ python
Python 2.7.13 | packaged by conda-forge | (default, May  2 2017, 12:48:11) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdms2
>>> 
(cdatmpi) bash-4.2$ module load openmpi
(cdatmpi) bash-4.2$ python
Python 2.7.13 | packaged by conda-forge | (default, May  2 2017, 12:48:11) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdms2
Traceback (most recent call last):
  File "/home/share/unix_files/cdat/miniconda2/envs/cdatmpi/lib/python2.7/site-packages/ESMF/interface/loadESMF.py", line 122, in <module>
    mode=ct.RTLD_GLOBAL)
  File "/home/share/unix_files/cdat/miniconda2/envs/cdatmpi/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/share/unix_files/cdat/miniconda2/envs/cdatmpi/lib/./libmpicxx.so.12: undefined symbol: MPIR_Keyval_set_proxy
>>> 

doutriaux1 commented 7 years ago

@jypeter there's some magic code in cdms to take advantage of cdms if it's present. Did you install mpi4py in this env? @dnadeau4 we might want to try to catch this and fall back on non-MPI. Second, cdms/hdf5/etc. are not compiled against MPI, so that's probably why it fails. Lastly, I think esmf is compiled against mpich, not openmpi, so loading openmpi will probably lead to some issues.
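
The "catch this and fall back" idea could look roughly like the sketch below. This is an illustration only, not the actual cdms2 code: `optional_import` and `HAVE_ESMF` are hypothetical names, and how cdms2 would use the flag is left open.

```python
import importlib


def optional_import(module_name):
    """Import a module, returning None instead of raising if loading fails.

    OSError is caught in addition to ImportError because a shared library
    that fails to load through ctypes raises OSError, as in the traceback
    reported above.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError):
        return None


# Callers can test the flag and skip ESMF-based code paths when it is False.
ESMF = optional_import("ESMF")
HAVE_ESMF = ESMF is not None
```

With this pattern a broken MPI setup would disable the optional features rather than make `import cdms2` fail outright.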

jypeter commented 7 years ago

You mean take advantage of mpi?

The first environment was the stock uvcdat-2.10 you provide (apparently with mpich), and the second one was the same, with the added openmpi from conda-forge

Like I said, the 2.10 you provide works fine. It only fails if we add our locally compiled openmpi. Of course it would be best not to get this side effect

I also tell the new users to activate cdat only in the terminals where they need it (and not do it by default), in order to minimize side effects due to the cdat environment. Better be safe!

doutriaux1 commented 7 years ago

@jypeter ok that makes sense, like I said esmf is compiled against mpich so adding openmpi to your env will create issues. I will try to reproduce it

dnadeau4 commented 6 years ago

Actually, it is mpi4py that does the magic with Python. The problem is that ESMF does not use openmpi but mpich, and they need to compile ESMF with openmpi using "variants".

I will try to reproduce this and see if I can fix cdms, but no guarantee.

dnadeau4 commented 6 years ago

@jypeter this seems to be fixed in the latest cdms2.

cdms2                     2.12.2018.02.14.00.33.g712da0e.npy1.13          py27_0    uvcdat/label/unstable
conda list | grep esm 
esmf                      7.0.0                         8    conda-forge
esmpy                     7.0.0                    py27_1    conda-forge
 conda list | grep mpi
mpich                     3.2                           5    conda-forge
openmpi                   3.0.0                         0    conda-forge
Python 2.7.14 | packaged by conda-forge | (default, Dec 25 2017, 01:16:05) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdms2
>>>

dnadeau4 commented 6 years ago

@jypeter can you close this if it works for you? Thanks