conda-forge / openmpi-feedstock

A conda-smithy repository for openmpi.
BSD 3-Clause "New" or "Revised" License

Missing dependency when openmpi and mpi4py installed through conda #185

Open mrmundt opened 1 day ago

mrmundt commented 1 day ago

Solution to issue cannot be found in the documentation.

Issue

We have an automated test that suddenly started failing this afternoon because of a failed import. On further investigation, we see that importing mpi4py.MPI in Python now fails:

>>> import mpi4py
>>> import mpi4py.MPI
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
>>>

It seems that this change introduced another unconditional dependency (we did not even realize the package had changed, BTW, because neither the hash nor the version indicated it).

We believe this is a bug / unwanted behavior.
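For an automated test, the failure above can be reported more precisely by pulling the library name out of the loader message. The following is only a minimal sketch, assuming the ImportError text has the same form as in the traceback; the helper names `missing_shared_library` and `check_import` are hypothetical and not part of mpi4py or conda.

```python
import importlib
import re
from typing import Optional

# Matches the dynamic loader message from the traceback, e.g.
# "libcudart.so.11.0: cannot open shared object file: No such file or directory"
_MISSING_LIB = re.compile(r"^(?P<lib>\S+): cannot open shared object file")


def missing_shared_library(message: str) -> Optional[str]:
    """Return the library name from a loader error message, or None."""
    match = _MISSING_LIB.match(message)
    return match.group("lib") if match else None


def check_import(module_name: str) -> Optional[str]:
    """Import a module; return the missing shared library if the loader failed.

    Returns None when the import succeeds. Any other ImportError is re-raised
    so that unrelated problems are not hidden.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        lib = missing_shared_library(str(exc))
        if lib is None:
            raise
        return lib
    return None
```

In the environment described here, `check_import("mpi4py.MPI")` would be expected to return the name of the library the loader could not find.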

Installed packages

# packages in environment at /home/miniconda3/envs/mpi:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_x86_64-microarch-level   1                      2_x86_64    conda-forge
algopy                    0.7.2              pyhd8ed1ab_0    conda-forge
alsa-lib                  1.2.13               hb9d3cd8_0    conda-forge
ampl-mp                   3.1.0             h2cc385e_1006    conda-forge
archspec                  0.2.3              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
boltons                   24.0.0             pyhd8ed1ab_0    conda-forge
brotli                    1.1.0                hb9d3cd8_2    conda-forge
brotli-bin                1.1.0                hb9d3cd8_2    conda-forge
brotli-python             1.1.0           py310hf71b8c6_2    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.3               heb4867d_0    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
cairo                     1.18.0               hebfffa5_3    conda-forge
casadi                    3.6.7           py310h23aa882_0    conda-forge
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cffi                      1.17.1          py310h8deb56e_0    conda-forge
charset-normalizer        3.4.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
conda                     24.9.2          py310hff52083_0    conda-forge
conda-libmamba-solver     24.9.0             pyhd8ed1ab_0    conda-forge
conda-package-handling    2.4.0              pyh7900ff3_0    conda-forge
conda-package-streaming   0.11.0             pyhd8ed1ab_0    conda-forge
contourpy                 1.3.1           py310h3788b33_0    conda-forge
coverage                  7.6.4           py310h89163eb_0    conda-forge
cpython                   3.10.15         py310hd8ed1ab_2    conda-forge
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
debugpy                   1.8.8           py310hf71b8c6_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
dill                      0.3.9              pyhd8ed1ab_0    conda-forge
distro                    1.9.0              pyhd8ed1ab_0    conda-forge
double-conversion         3.3.0                h59595ed_0    conda-forge
eigen                     3.4.0                h00ab1b0_0    conda-forge
et_xmlfile                2.0.0              pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.2              pyhd8ed1ab_0    conda-forge
executing                 2.1.0              pyhd8ed1ab_0    conda-forge
expat                     2.6.4                h5888daf_0    conda-forge
flexcache                 0.3                pyhd8ed1ab_0    conda-forge
flexparser                0.4                pyhd8ed1ab_0    conda-forge
fmt                       11.0.2               h434a139_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_3    conda-forge
fontconfig                2.15.0               h7e30c49_1    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.54.1          py310h89163eb_1    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
frozendict                2.4.6           py310ha75aee5_0    conda-forge
gettext                   0.22.5               he02047a_3    conda-forge
gettext-tools             0.22.5               he02047a_3    conda-forge
glib                      2.82.2               h44428e9_0    conda-forge
glib-tools                2.82.2               h4833e2c_0    conda-forge
gmp                       6.3.0                hac33072_2    conda-forge
gmpy2                     2.1.5           py310he8512ff_2    conda-forge
graphite2                 1.3.13            h59595ed_1003    conda-forge
gst-plugins-base          1.24.7               h0a52356_0    conda-forge
gstreamer                 1.24.7               hf3bb09a_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
harfbuzz                  9.0.0                hda332d3_1    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       75.1                 he02047a_0    conda-forge
idna                      3.10               pyhd8ed1ab_0    conda-forge
importlib-metadata        8.5.0              pyha770c72_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
ipopt                     3.14.16             h122424a_10    conda-forge
ipykernel                 6.29.5             pyh3099207_0    conda-forge
ipython                   8.29.0             pyh707e725_0    conda-forge
jedi                      0.19.2             pyhff2d567_0    conda-forge
jsonpatch                 1.33               pyhd8ed1ab_0    conda-forge
jsonpointer               3.0.0           py310hff52083_1    conda-forge
jupyter_client            8.6.3              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.2              pyh31011fe_1    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.7           py310h3788b33_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libarchive                3.7.4                hfca40fe_0    conda-forge
libasprintf               0.22.5               he8f35ee_3    conda-forge
libasprintf-devel         0.22.5               he8f35ee_3    conda-forge
libblas                   3.9.0           25_linux64_openblas    conda-forge
libblasfeo                0.1.3              h544d10a_103    conda-forge
libbrotlicommon           1.1.0                hb9d3cd8_2    conda-forge
libbrotlidec              1.1.0                hb9d3cd8_2    conda-forge
libbrotlienc              1.1.0                hb9d3cd8_2    conda-forge
libcap                    2.69                 h0f662aa_0    conda-forge
libcblas                  3.9.0           25_linux64_openblas    conda-forge
libclang-cpp19.1          19.1.3          default_hb5137d0_0    conda-forge
libclang13                19.1.3          default_h9c6a7e4_0    conda-forge
libcups                   2.3.3                h4637d8d_4    conda-forge
libcurl                   8.10.1               hbbe4b11_0    conda-forge
libdeflate                1.22                 hb9d3cd8_0    conda-forge
libdrm                    2.4.123              hb9d3cd8_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libegl                    1.7.0                ha4b6fd6_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.6.4                h5888daf_0    conda-forge
libfatrop                 0.0.4                h5888daf_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.3                h59595ed_0    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgcrypt                 1.11.0               h4ab18f5_1    conda-forge
libgettextpo              0.22.5               he02047a_3    conda-forge
libgettextpo-devel        0.22.5               he02047a_3    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran-ng            14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libgl                     1.7.0                ha4b6fd6_2    conda-forge
libglib                   2.82.2               h2ff4ddf_0    conda-forge
libglvnd                  1.7.0                ha4b6fd6_2    conda-forge
libglx                    1.7.0                ha4b6fd6_2    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libgpg-error              1.50                 h4f305b6_0    conda-forge
libhwloc                  2.11.2          default_h0d58e46_1001    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           25_linux64_openblas    conda-forge
libllvm19                 19.1.3               ha7bfdaf_0    conda-forge
libmamba                  1.5.10               hf72d635_1    conda-forge
libmambapy                1.5.10          py310h6639945_1    conda-forge
libnghttp2                1.64.0               h161d5f1_0    conda-forge
libnl                     3.10.0               h4bc722e_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libogg                    1.3.5                h4ab18f5_0    conda-forge
libopenblas               0.3.28          pthreads_h94d23a6_1    conda-forge
libopengl                 1.7.0                ha4b6fd6_2    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libosqp                   0.6.3                h5888daf_1    conda-forge
libpciaccess              0.18                 hd590300_0    conda-forge
libpng                    1.6.44               hadc24fc_0    conda-forge
libpq                     16.4                 h2d7952a_3    conda-forge
libqdldl                  0.1.7                hcb278e6_0    conda-forge
libscotch                 7.0.4                h2fe6a88_5    conda-forge
libsndfile                1.2.2                hc60ed4a_1    conda-forge
libsodium                 1.0.20               h4ab18f5_0    conda-forge
libsolv                   0.7.30               h3509ff9_0    conda-forge
libspral                  2024.05.08           h2b245be_4    conda-forge
libsqlite                 3.47.0               hadc24fc_1    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libsystemd0               256.7                h2774228_1    conda-forge
libtiff                   4.7.0                he137b08_1    conda-forge
libudev1                  256.7                hb9d3cd8_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.17.0               h8a09558_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxkbcommon              1.7.0                h2c5496b_1    conda-forge
libxml2                   2.13.5               hb346dea_0    conda-forge
libxslt                   1.1.39               h76b75d6_0    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              hd590300_1001    conda-forge
matplotlib                3.9.2           py310hff52083_2    conda-forge
matplotlib-base           3.9.2           py310h68603db_2    conda-forge
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
menuinst                  2.2.0           py310hff52083_0    conda-forge
metis                     5.1.0             hd0bcaf9_1007    conda-forge
mpc                       1.3.1                h24ddda3_1    conda-forge
mpfr                      4.2.1                h90cbb55_3    conda-forge
mpg123                    1.32.9               hc50e24c_0    conda-forge
mpi                       1.0                     openmpi    conda-forge
mpi4py                    4.0.1           py310h58152c7_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
mumps-include             5.7.3                ha770c72_5    conda-forge
mumps-seq                 5.7.3                h27a6a8b_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              9.0.1                h266115a_2    conda-forge
mysql-libs                9.0.1                he0572af_2    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
networkx                  3.4.2              pyh267e887_2    conda-forge
nspr                      4.36                 h5888daf_0    conda-forge
nss                       3.106                hdf54f9c_0    conda-forge
numdifftools              0.9.41             pyhd8ed1ab_0    conda-forge
numpy                     2.1.3           py310hd6e36ab_0    conda-forge
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openmpi                   5.0.5              hd45feaf_104    conda-forge
openpyxl                  3.1.5           py310h0999ad4_1    conda-forge
openssl                   3.4.0                hb9d3cd8_0    conda-forge
packaging                 24.2               pyhd8ed1ab_0    conda-forge
pandas                    2.2.3           py310h5eaa309_1    conda-forge
parameterized             0.9.0              pyhd8ed1ab_0    conda-forge
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
patsy                     1.0.1              pyhff2d567_0    conda-forge
pcre2                     10.44                hba22ea6_2    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    11.0.0          py310hfeaa1f3_0    conda-forge
pint                      0.24.4             pyhd8ed1ab_0    conda-forge
pip                       24.3.1             pyh8b19718_0    conda-forge
pixman                    0.43.2               h59595ed_0    conda-forge
platformdirs              4.3.6              pyhd8ed1ab_0    conda-forge
plotly                    5.24.1             pyhd8ed1ab_0    conda-forge
pluggy                    1.5.0              pyhd8ed1ab_0    conda-forge
ply                       3.11               pyhd8ed1ab_2    conda-forge
prompt-toolkit            3.0.48             pyha770c72_0    conda-forge
proxsuite                 0.6.7           py310h3788b33_1    conda-forge
psutil                    6.1.0           py310ha75aee5_0    conda-forge
pthread-stubs             0.4               hb9d3cd8_1002    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pulseaudio-client         17.0                 hb77b528_0    conda-forge
pure_eval                 0.2.3              pyhd8ed1ab_0    conda-forge
pybind11                  2.13.6             pyh1ec8472_2    conda-forge
pybind11-abi              4                    hd8ed1ab_3    conda-forge
pybind11-global           2.13.6             pyh415d2e4_2    conda-forge
pycosat                   0.6.6           py310h2372a71_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pyomo                     6.8.1.dev0                dev_0    <develop>
pyparsing                 3.2.0              pyhd8ed1ab_1    conda-forge
pyqt                      5.15.9          py310h04931ad_5    conda-forge
pyqt5-sip                 12.12.2         py310hc6cd4ac_5    conda-forge
pyside6                   6.7.3           py310hfd10a26_1    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytest                    8.3.3              pyhd8ed1ab_0    conda-forge
pytest-parallel           0.1.1              pyhd8ed1ab_0    conda-forge
python                    3.10.15         h4a871b0_2_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-louvain            0.16               pyhd8ed1ab_0    conda-forge
python-tzdata             2024.2             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    5_cp310    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.2           py310ha75aee5_1    conda-forge
pyzmq                     26.2.0          py310h71f11fc_3    conda-forge
qhull                     2020.2               h434a139_5    conda-forge
qt-main                   5.15.15              h374914d_0    conda-forge
qt6-main                  6.7.3                h20baabe_0    conda-forge
qtconsole                 5.6.1              pyhd8ed1ab_0    conda-forge
qtconsole-base            5.6.1              pyha770c72_0    conda-forge
qtpy                      2.4.2              pyhdecd6ff_0    conda-forge
rdma-core                 54.0                 h5888daf_1    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reproc                    14.2.4.post0         hd590300_1    conda-forge
reproc-cpp                14.2.4.post0         h59595ed_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
ruamel.yaml               0.18.6          py310ha75aee5_1    conda-forge
ruamel.yaml.clib          0.2.8           py310ha75aee5_1    conda-forge
scipy                     1.14.1          py310hfcf56fc_1    conda-forge
seaborn                   0.13.2               hd8ed1ab_2    conda-forge
seaborn-base              0.13.2             pyhd8ed1ab_2    conda-forge
setuptools                75.3.0             pyhd8ed1ab_0    conda-forge
simde                     0.8.2                h84d6215_0    conda-forge
sip                       6.7.12          py310hc6cd4ac_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
statsmodels               0.14.4          py310hf462985_0    conda-forge
sympy                     1.13.3           pyh2585a3b_104    conda-forge
tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
tenacity                  9.0.0              pyhd8ed1ab_0    conda-forge
tinyxml2                  10.0.0               h59595ed_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.1.0              pyhff2d567_0    conda-forge
tornado                   6.4.1           py310ha75aee5_1    conda-forge
tqdm                      4.67.0             pyhd8ed1ab_0    conda-forge
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
truststore                0.10.0             pyhd8ed1ab_0    conda-forge
typing-extensions         4.12.2               hd8ed1ab_0    conda-forge
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
ucc                       1.3.0                h0f835a6_3    conda-forge
ucx                       1.17.0               h05e919c_3    conda-forge
unicodedata2              15.1.0          py310ha75aee5_1    conda-forge
unixodbc                  2.3.12               h661eb56_0    conda-forge
urllib3                   2.2.3              pyhd8ed1ab_0    conda-forge
wayland                   1.23.1               h3e06ad9_0    conda-forge
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
wheel                     0.45.0             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.1                hb711507_2    conda-forge
xcb-util-cursor           0.1.5                hb9d3cd8_0    conda-forge
xcb-util-image            0.4.0                hb711507_2    conda-forge
xcb-util-keysyms          0.4.1                hb711507_0    conda-forge
xcb-util-renderutil       0.3.10               hb711507_0    conda-forge
xcb-util-wm               0.4.2                hb711507_0    conda-forge
xkeyboard-config          2.43                 hb9d3cd8_0    conda-forge
xlrd                      2.0.1              pyhd8ed1ab_3    conda-forge
xorg-libice               1.1.1                hb9d3cd8_1    conda-forge
xorg-libsm                1.2.4                he73a12e_1    conda-forge
xorg-libx11               1.8.10               h4f16b4b_0    conda-forge
xorg-libxau               1.0.11               hb9d3cd8_1    conda-forge
xorg-libxdamage           1.1.6                hb9d3cd8_0    conda-forge
xorg-libxdmcp             1.1.5                hb9d3cd8_0    conda-forge
xorg-libxext              1.3.6                hb9d3cd8_0    conda-forge
xorg-libxfixes            6.0.1                hb9d3cd8_0    conda-forge
xorg-libxi                1.8.2                hb9d3cd8_0    conda-forge
xorg-libxrender           0.9.11               hb9d3cd8_1    conda-forge
xorg-libxtst              1.2.5                hb9d3cd8_3    conda-forge
xorg-libxxf86vm           1.1.5                hb9d3cd8_4    conda-forge
xorg-xf86vidmodeproto     2.3.1             hb9d3cd8_1005    conda-forge
xorg-xorgproto            2024.1               hb9d3cd8_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yaml-cpp                  0.8.0                h59595ed_0    conda-forge
zeromq                    4.3.5                h3b0a872_6    conda-forge
zipp                      3.21.0             pyhd8ed1ab_0    conda-forge
zlib                      1.3.1                hb9d3cd8_2    conda-forge
zstandard                 0.23.0          py310ha39cb0e_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

Environment info

% conda info

     active environment : mpi
    active env location : /home/miniconda3/envs/mpi
            shell level : 1
       user config file : /home/.condarc
populated config files : /home/.condarc
          conda version : 23.9.0
    conda-build version : not installed
         python version : 3.10.13.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.4=0
                          __glibc=2.34=0
                          __linux=5.14.0=0
                          __unix=0=0
       base environment : /home/miniconda3  (writable)
      conda av data dir : /home/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/nodefaults/linux-64
                          https://conda.anaconda.org/nodefaults/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/gurobi/linux-64
                          https://conda.anaconda.org/gurobi/noarch
                          https://conda.anaconda.org/ibmdecisionoptimization/linux-64
                          https://conda.anaconda.org/ibmdecisionoptimization/noarch
                          https://conda.anaconda.org/fico-xpress/linux-64
                          https://conda.anaconda.org/fico-xpress/noarch
          package cache : /home/miniconda3/pkgs
                          /home/.conda/pkgs
       envs directories : /home/miniconda3/envs
                          /home/.conda/envs
               platform : linux-64
             user-agent : conda/23.9.0 requests/2.31.0 CPython/3.10.13 Linux/5.14.0-427.35.1.el9_4.x86_64 rhel/9.4 glibc/2.34
                UID:GID : 53181:53181
             netrc file : None
           offline mode : False

mrmundt commented 1 day ago

Also, FWIW, we did try conda install cudatoolkit cuda-version=11 to see if we could get past the error and got this:

>>> from mpi4py import MPI
[hostname:1499766] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.hostname.53181/jf.0/2835742720/shared_mem_cuda_pool.hostname could be created.
[hostname:1499766] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728

We aren't sure if this is worth reporting but wanted to let you know that it happens.

dalcinl commented 18 hours ago

@minrk The last build is broken: a dependency on libcudart.so slipped into libmpi.so. The automated testing from conda-build did not catch the issue, and I'm not sure why (maybe libcudart.so exists within /usr/lib64 in the Docker image).

I'm not sure how to proceed. Either we try LDFLAGS=-Wl,--as-needed (isn't that the default?), or we manually patchelf the MPI library to remove the dependency.

$ python -s -c 'from mpi4py import MPI'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from mpi4py import MPI
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

$ readelf -d $CONDA_PREFIX/lib/libmpi.so.40 | grep cuda
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.11.0]

$ patchelf --remove-needed libcudart.so.11.0 $CONDA_PREFIX/lib/libmpi.so.40
$ python -s -c 'from mpi4py import MPI'
[kw61149:3113005] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.kw61149.1000/jf.0/577503232/shared_mem_cuda_pool.kw61149 could be created.
[kw61149:3113005] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728

minrk commented 15 hours ago

Marking the latest build as broken: https://github.com/conda-forge/admin-requests/pull/1164

Maybe there's a module that needs to be added manually to the DSO list that's not on the DSO-by-default list. The simplest solution, I guess, is to go back to --enable-mca-dso, since we know that worked, right?

minrk commented 15 hours ago

Weirdly, when I started to test, neither the arm nor ppc builds have this. It's only linux-64.

I think we should be able to compare the output of the link check in this build with the latest build to perhaps identify which DSO that links cudart is being bundled.

We can also add a test using liefldd/readelf/etc. to make sure it's not linked, I suppose.
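A readelf-based check along these lines could look roughly like the sketch below. It parses `readelf -d` output in the format posted earlier in this thread and flags NEEDED entries containing "cuda", mirroring the `grep cuda` used there. It assumes `readelf` is on PATH; the function names are hypothetical, and this is not the test that the feedstock actually uses.

```python
import re
import subprocess
from typing import List

# A NEEDED line from `readelf -d` looks like:
#  0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.11.0]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Extract the NEEDED library names from `readelf -d` output."""
    return _NEEDED.findall(readelf_output)


def cuda_dependencies(readelf_output: str) -> List[str]:
    """Return the NEEDED entries whose name contains 'cuda'."""
    return [name for name in needed_libraries(readelf_output) if "cuda" in name]


def check_library(path: str) -> None:
    """Exit with an error if the library at `path` links directly to CUDA."""
    output = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    ).stdout
    offending = cuda_dependencies(output)
    if offending:
        raise SystemExit(f"{path} unexpectedly links: {', '.join(offending)}")
```

A package test could then call `check_library` on the installed `lib/libmpi.so.40` under `$CONDA_PREFIX`. Unlike an import test, this does not depend on what happens to be installed on the build machine.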

dalcinl commented 15 hours ago

> Weirdly, when I started to test, neither the arm nor ppc builds have this. It's only linux-64.

Maybe some mishandled LDFLAGS?

> We can also add a test using liefldd/readelf/etc. to make sure it's not linked, I suppose.

Definitely.

> Maybe there's a module that needs to be added manually to the DSO list that's not on the DSO-by-default list.

Unlikely; look at the test I posted above. After using patchelf to remove the dependency, things actually work. The dependency on libcudart.so does seem to be redundant, although I did not try to run with CUDA to confirm things still work afterwards. I still believe this is just overlinking, maybe a -Wl,--as-needed flag that is not being passed down properly.
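One way to investigate the flag theory is to log, at build time, whether `--as-needed` is still in effect in the LDFLAGS that a given build step sees. This is only an illustrative sketch with hypothetical helper names. It inspects the flag string and says nothing about what libtool or the Open MPI build system ultimately passes to the linker.

```python
import shlex
from typing import List


def linker_options(ldflags: str) -> List[str]:
    """Expand the -Wl,... tokens of an LDFLAGS string into linker options."""
    options: List[str] = []
    for token in shlex.split(ldflags):
        if token.startswith("-Wl,"):
            options.extend(token[len("-Wl,"):].split(","))
    return options


def has_as_needed(ldflags: str) -> bool:
    """True if --as-needed is the active setting at the end of LDFLAGS.

    The option is positional: a later --no-as-needed switches it off again,
    so the last occurrence of either option wins.
    """
    active = False
    for option in linker_options(ldflags):
        if option == "--as-needed":
            active = True
        elif option == "--no-as-needed":
            active = False
    return active
```

Printing `has_as_needed(os.environ.get("LDFLAGS", ""))` from the build script on each platform might help show whether linux-64 differs from the arm and ppc builds.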

> The simplest solution, I guess, is to go back to --enable-mca-dso, since we know that worked, right?

Makes sense, although maybe there is an easier and more proper fix. In any case, enhancements can be done later. I'll be offline for a couple of days. If you have the time, go for it.

minrk commented 13 hours ago

FWIW, the reason our tests are passing is that

/usr/local/cuda-11.8/targets/x86_64-linux/lib

is added in /etc/ld.so.conf, so it gets loaded by default. If there's an easy way to ignore ld.so.conf for a single process, I think our tests would fail as they should. I'm not sure how to do that, though. I can write a test that loads libmpi and checks for suspicious DLLs, though.
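A test that loads libmpi and checks for suspicious libraries could be sketched as follows. It is Linux-only: it loads the library with ctypes and then reads `/proc/self/maps` to see which files ended up mapped into the process, so a libcudart picked up through ld.so.conf would still be reported. The function names and the default `"libcuda"` pattern are assumptions for illustration.

```python
import ctypes
import os
from typing import Iterable, List, Set


def mapped_files(maps_text: str) -> Set[str]:
    """Return the file paths listed in the content of /proc/<pid>/maps."""
    paths: Set[str] = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if path.startswith("/"):  # skip [heap], [stack], etc.
                paths.add(path)
    return paths


def suspicious_libraries(
    paths: Iterable[str], patterns: Iterable[str] = ("libcuda",)
) -> List[str]:
    """Return the sorted paths whose file name contains one of the patterns."""
    patterns = tuple(patterns)
    return sorted(
        p for p in paths if any(pat in os.path.basename(p) for pat in patterns)
    )


def check_loaded(library_path: str) -> List[str]:
    """Load a shared library and list suspicious libraries mapped afterwards."""
    ctypes.CDLL(library_path, mode=ctypes.RTLD_GLOBAL)
    with open("/proc/self/maps") as maps:
        return suspicious_libraries(mapped_files(maps.read()))
```

With this approach the test would fail whenever `check_loaded` returns a non-empty list for libmpi, regardless of where the loader found the extra library.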