Closed charlesgwaldman closed 2 months ago
Thanks for the report Charles! 🙏
This is likely because this feedstock is now building with a CentOS 7 image on linux-64, which uses GLIBC 2.17 (and has these newer symbols). So the packages don't work on systems using GLIBC older than 2.17
That said, the packages should carry this constraint to ensure that they are only installed on systems with a new enough GLIBC. However the GLIBC constraint wasn't being included before. PR ( https://github.com/conda-forge/openmpi-feedstock/pull/145 ) should fix that
An open question is whether older GLIBC are still of interest to maintain support for here. Will defer to the feedstock maintainers on that question
Actually, sorry, I misspoke. Looking at the info provided above, I can see __glibc=2.38=0, which means this is on an even newer system than the one we used to build. So this should be supported and is a bug
It's possible adding sysroot (as done in the PR above) will fix this issue as well
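For anyone who wants to script the check described above, here is a minimal sketch that reads a virtual package spec like the one reported by `mamba info` and compares it against the 2.17 build baseline. The function names are made up for illustration and are not part of conda or mamba.

```python
def parse_virtual_package(spec):
    """Split a 'name=version=build' spec such as '__glibc=2.38=0'."""
    name, version, build = spec.split("=")
    return name, tuple(int(part) for part in version.split(".")), build


def glibc_is_new_enough(spec, minimum=(2, 17)):
    """True if the __glibc virtual package meets the minimum version."""
    name, version, _ = parse_virtual_package(spec)
    if name != "__glibc":
        raise ValueError(f"not a glibc virtual package: {spec}")
    # Tuples compare element by element, so (2, 38) >= (2, 17).
    return version >= minimum


print(glibc_is_new_enough("__glibc=2.38=0"))
```

Comparing integer tuples rather than strings avoids the trap where "2.9" sorts after "2.17".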
Thanks Jack. Yes, this was observed on an up-to-date Gentoo Linux system with current glibc 2.38. I have not previously had any issues with conda-forge packages on this system. Adding sysroot to the build deps sounds like the right solution (probably should be used by default for all packages, IMO)
The sysroot fix went in PR: https://github.com/conda-forge/openmpi-feedstock/pull/142
Packages are building and should be uploaded soon
Please look for a build/number of 3 on the resulting packages
I'm afraid that the problem persists in both OpenMPI 5.0.1 and 5.0.2 as packaged by conda-forge. The last usable version is 5.0.0
The same failure is observed with both a very recent system (Gentoo Linux with glibc 2.39) and an older system (CentOS7 with glibc 2.17)
Both complain about 'memcpy@GLIBC_2.14' and 'clock_gettime@GLIBC_2.17'. (These versioned symbols sure are a pain).
That last comment was from me, I was logged in under a different GitHub account without realizing.
One question I have is - what changed between OpenMPI 5.0.0 and 5.0.1? Can we just go back to the way 5.0.0 was getting built? That version does not suffer from the portability problems.
The most likely problem is that the 5.0.1 package was built in a different, newer Docker image. You should probably ask the core conda-forge team. This is ultimately not an Open MPI issue or the fault of this feedstock (although I could be partially wrong).
Thanks dalcinl. How do I bring this to the attention of the core conda-forge team?
I usually contact the team via Gitter https://conda-forge.org/community/getting-in-touch/#gitter-and-element Ask them and let's see what they say. Maybe we can use some hack to make the binaries use the older symbol versions.
@jakirkham Do you think we are somehow messing things up in this feedstock?
The issue they are seeing is they are on newer systems (GLIBC 2.28+) and are having trouble resolving symbols that should be available on their systems as we built on (GLIBC 2.17+)
IOW the symbols should be available in their cases, but for some reason they are not
The bug may very well be in our build, but am a little fuzzy on how it is occurring
Could one of you seeing this error please try installing sysroot_linux-64=2.17 (assuming you are on Linux x86_64) and let us know if you still see issues?
Tried adding the GLIBC constraint to those packages directly ( https://github.com/conda-forge/openmpi-feedstock/pull/147 ). Maybe that helps?
@jakirkham
Installing sysroot_linux-64=2.17 resolved the issue on both CentOS7 and Gentoo. Thank you!
It also works with sysroot_linux=2.28. The problem was caused by sysroot_linux-2.12, which got installed when I installed gfortran. Installing openmpi doesn't install a sysroot at all. So if you do mamba install gfortran openmpi in a new environment, you will wind up with the 2.12 sysroot and an openmpi which requires 2.17 or newer.
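A quick way to spot this situation in an existing environment is to scan the package listing for an old sysroot. The following is only a sketch: it assumes whitespace-separated `mamba list`-style text with the name in the first column and the version in the second, and the helper names are hypothetical.

```python
def installed_versions(listing):
    """Map package name -> version string from 'mamba list'-style text."""
    versions = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and '#' comment/header lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        versions[fields[0]] = fields[1]
    return versions


def sysroot_too_old(listing, required=(2, 17), name="sysroot_linux-64"):
    """True if a sysroot is installed and it is older than `required`."""
    version = installed_versions(listing).get(name)
    if version is None:
        return False  # no sysroot in the env, so nothing to conflict with
    return tuple(int(part) for part in version.split(".")) < required
```

Feeding it the output of `mamba list` from an environment created with `mamba install gfortran openmpi` should flag the 2.12 sysroot.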
Thanks Charles! 🙏
Yeah that's what I was wondering about
Then I think we should try PR: https://github.com/conda-forge/openmpi-feedstock/pull/147
New packages are building. Will probably be a bit before they upload and mirror to CDN
Please test out tomorrow and let us know how it goes
It's still broken, or else I'm not seeing new packages. Testing this should be very easy. Just do mamba install openmpi gfortran in a new environment and see which sysroot gets pulled in. If it's sysroot_linux-64 2.12 then we still have a problem because that's incompatible with mpifort.
bash$ mamba create -n test; mamba activate test
(test)$ mamba install openmpi gfortran
Package Version Build Channel Size
──────────────────────────────────────────────────────────────────────────────────────
Install:
──────────────────────────────────────────────────────────────────────────────────────
+ mpi 1.0 openmpi conda-forge Cached
+ _libgcc_mutex 0.1 conda_forge conda-forge Cached
+ libstdcxx-ng 13.2.0 h7e041cc_5 conda-forge Cached
+ ld_impl_linux-64 2.40 h41732ed_0 conda-forge Cached
+ ca-certificates 2024.2.2 hbcca054_0 conda-forge Cached
+ libgomp 13.2.0 h807b86a_5 conda-forge Cached
+ _openmp_mutex 4.5 2_gnu conda-forge Cached
+ libgcc-ng 13.2.0 h807b86a_5 conda-forge Cached
+ libiconv 1.17 hd590300_2 conda-forge Cached
+ libsanitizer 13.2.0 h7e041cc_5 conda-forge Cached
+ openssl 3.2.1 hd590300_1 conda-forge Cached
+ icu 73.2 h59595ed_0 conda-forge Cached
+ xz 5.2.6 h166bdaf_0 conda-forge Cached
+ libzlib 1.2.13 hd590300_5 conda-forge Cached
+ libgfortran5 13.2.0 ha4646dd_5 conda-forge Cached
+ libnl 3.9.0 hd590300_0 conda-forge Cached
+ libevent 2.1.12 hf998b51_1 conda-forge Cached
+ libxml2 2.12.6 h232c23b_1 conda-forge Cached
+ libgfortran-ng 13.2.0 h69a702a_5 conda-forge Cached
+ libhwloc 2.9.3 default_h554bfaf_1009 conda-forge Cached
+ openmpi 5.0.3 h7fc1de5_100 conda-forge 15MB
+ libgcc-devel_linux-64 13.2.0 ha9c7c90_105 conda-forge Cached
+ kernel-headers_linux-64 2.6.32 he073ed8_17 conda-forge Cached
+ sysroot_linux-64 2.12 he073ed8_17 conda-forge Cached
+ binutils_impl_linux-64 2.40 hf600244_0 conda-forge Cached
+ gcc_impl_linux-64 13.2.0 h338b0a0_5 conda-forge Cached
+ gfortran_impl_linux-64 13.2.0 h76e1118_5 conda-forge Cached
+ gcc 13.2.0 hd6cf55c_3 conda-forge Cached
+ gfortran 13.2.0 h98b45c4_3 conda-forge Cached
....
(test)$ mpifort /tmp/test_mpi.f90
/home/cgw/miniforge3/envs/test/bin/../lib/gcc/x86_64-conda-linux-gnu/13.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/cgw/miniforge3/envs/test/lib/libmpi_mpifh.so: undefined reference to `memcpy@GLIBC_2.14'
/home/cgw/miniforge3/envs/test/bin/../lib/gcc/x86_64-conda-linux-gnu/13.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/cgw/miniforge3/envs/test/lib/./libpmix.so.2: undefined reference to `clock_gettime@GLIBC_2.17'
collect2: error: ld returned 1 exit status
(test)$ cat /tmp/test_mpi.f90
program hello
use mpi_f08
implicit none
integer(kind=MPI_INTEGER_KIND) ierror
call MPI_INIT(ierror)
call MPI_FINALIZE(ierror)
end program
Could you show the full output? I don't see the mpifort package in the list, for example
That is the full package list, the package is called openmpi
(test)$ mamba info
mamba version : 1.5.8
active environment : test
active env location : /home/cgw/miniforge3/envs/test
shell level : 1
user config file : /home/cgw/.condarc
populated config files : /home/cgw/miniforge3/.condarc
/home/cgw/.condarc
conda version : 24.3.0
conda-build version : not installed
python version : 3.10.14.final.0
solver : libmamba (default)
virtual packages : __archspec=1=skylake
__conda=24.3.0=0
__glibc=2.39=0
__linux=6.8.2=0
__unix=0=0
base environment : /home/cgw/miniforge3 (writable)
conda av data dir : /home/cgw/miniforge3/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
package cache : /home/cgw/miniforge3/pkgs
/home/cgw/.conda/pkgs
envs directories : /home/cgw/miniforge3/envs
/home/cgw/.conda/envs
platform : linux-64
user-agent : conda/24.3.0 requests/2.31.0 CPython/3.10.14 Linux/6.8.2-gentoo gentoo/2.15 glibc/2.39 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
UID:GID : 103:1000
netrc file : None
offline mode : False
(test)$ mamba list
# packages in environment at /home/cgw/miniforge3/envs/test:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
binutils_impl_linux-64 2.40 hf600244_0 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
gcc 13.2.0 hd6cf55c_3 conda-forge
gcc_impl_linux-64 13.2.0 h338b0a0_5 conda-forge
gfortran 13.2.0 h98b45c4_3 conda-forge
gfortran_impl_linux-64 13.2.0 h76e1118_5 conda-forge
icu 73.2 h59595ed_0 conda-forge
kernel-headers_linux-64 2.6.32 he073ed8_17 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libgcc-devel_linux-64 13.2.0 ha9c7c90_105 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgfortran-ng 13.2.0 h69a702a_5 conda-forge
libgfortran5 13.2.0 ha4646dd_5 conda-forge
libgomp 13.2.0 h807b86a_5 conda-forge
libhwloc 2.9.3 default_h554bfaf_1009 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libnl 3.9.0 hd590300_0 conda-forge
libsanitizer 13.2.0 h7e041cc_5 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_5 conda-forge
libxml2 2.12.6 h232c23b_1 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
mpi 1.0 openmpi conda-forge
openmpi 5.0.3 h7fc1de5_100 conda-forge
openssl 3.2.1 hd590300_1 conda-forge
sysroot_linux-64 2.12 he073ed8_17 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
Sorry, I meant openmpi-mpifort, not mpifort. You need to have it installed too, otherwise it seems you're using the system-provided MPI compiler wrapper, not the one coming from conda-forge (which the fix was applied to).
What?
bash$ which mpifort
which: no mpifort in (/home/cgw/Applications/.bin:/home/cgw/miniforge3/condabin:/home/cgw/bin:/home/cgw/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/lib/llvm/18/bin:/usr/lib/llvm/17/bin)
bash$ mamba activate test
(test)$ which mpifort
/home/cgw/miniforge3/envs/test/bin/mpifort
I have been using openmpi from conda-forge for several years and have never installed or heard of openmpi-mpifort until now.
@leofang
As far as I can tell, the difference between installing openmpi and gfortran vs. installing openmpi-mpifort is that openmpi-mpifort is tied to gfortran 11, while openmpi+gfortran gives you gfortran 13.
What is the purpose of the openmpi-mpifort package? Is there some documentation that says why I have to have this installed?
What is the purpose of the openmpi-mpifort package? Is there some documentation that says why I have to have this installed?
The compiler wrapper packages openmpi-{mpicc,mpicxx,mpifort} have been there forever (way before I became a maintainer IIRC). The purpose is to ensure a consistent compiler toolchain (same version as openmpi used at build time) is installed, so as to avoid potential ABI issues.
It appears that when using CUDA 11 to build (which is what this feedstock does) we're pinned at gfortran 11: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/f2335dfd386a8ef51f75676bf24b74efe7aeab93/recipe/conda_build_config.yaml#L44 Even if we migrate to CUDA 12, we'd be using gfortran 12, not 13: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/f2335dfd386a8ef51f75676bf24b74efe7aeab93/recipe/migrations/cuda120.yaml#L91 Unless there is a global migrator that moves us to gfortran 13, there's not much we can do in this feedstock. I suggest opening an issue in https://github.com/conda-forge/conda-forge.github.io to discuss.
What would be the reason that you want to ignore the ABI compatibility issue and use gfortran 13?
Thanks for the reply.
It's not so much that I want to ignore ABI compatibility, but that I've been maintaining a Conda package that uses OpenMPI and gfortran for several years, and we have used the mpifort that comes with the openmpi package and have had no issues. It's not clear to users (e.g. myself) why openmpi-mpifort is needed, when an mpifort binary is provided by the openmpi package itself. If this mpifort is known to be broken or incompatible in some way, why is it being distributed with openmpi?
Just an update that this is also an issue with clean installs of Linux Mint 21.3 (based on Ubuntu 24.04), using the most recent versions of openmpi and gcc/gfortran (version 13), as noted above for other versions. Generally it would be nice to be able to use this package with up-to-date compilers, so that packages and libraries we distribute which use openmpi can be used on systems with the latest compilers (especially seeing as how openmpi applications often find themselves on HPC clusters, where users often don't necessarily have a ton of control over compiler versions and sysadmins may be reluctant to have many installed versions of compilers). I would like to second the above question; is the current advice that this package should not be used, and instead only ever openmpi-mpifort? At least until this versioned symbols issue is fixed?
(This reply is a lot shorter if you ignore the code blocks and read the text in between, I'm just trying to create a clear reproducer. Or scroll to the bottom for the conclusion.)
I'm running into this same issue (albeit in C, but it's identical): installing openmpi and gcc simultaneously results in the above mentioned linking errors because gcc pulls in sysroot_linux-64 2.12, which then shadows the system glibc in the dynamic linker lookup, causing the error because openmpi requires something newer.
$ conda create -n openmpi-test
$ conda activate openmpi-test
$ conda install openmpi
Channels:
- conda-forge
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/lourens/.miniconda3/envs/openmpi-test
added / updated specs:
- openmpi
The following NEW packages will be INSTALLED:
_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
ca-certificates conda-forge/linux-64::ca-certificates-2024.2.2-hbcca054_0
icu conda-forge/linux-64::icu-73.2-h59595ed_0
libevent conda-forge/linux-64::libevent-2.1.12-hf998b51_1
libgcc-ng conda-forge/linux-64::libgcc-ng-13.2.0-h77fa898_7
libgfortran-ng conda-forge/linux-64::libgfortran-ng-13.2.0-h69a702a_7
libgfortran5 conda-forge/linux-64::libgfortran5-13.2.0-hca663fb_7
libgomp conda-forge/linux-64::libgomp-13.2.0-h77fa898_7
libhwloc conda-forge/linux-64::libhwloc-2.10.0-default_h5622ce7_1001
libiconv conda-forge/linux-64::libiconv-1.17-hd590300_2
libnl conda-forge/linux-64::libnl-3.9.0-hd590300_0
libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-13.2.0-hc0a3c3a_7
libxml2 conda-forge/linux-64::libxml2-2.12.7-hc051c1a_0
libzlib conda-forge/linux-64::libzlib-1.3.1-h4ab18f5_1
mpi conda-forge/linux-64::mpi-1.0-openmpi
openmpi conda-forge/linux-64::openmpi-5.0.3-h47314c5_102
openssl conda-forge/linux-64::openssl-3.3.0-h4ab18f5_3
xz conda-forge/linux-64::xz-5.2.6-h166bdaf_0
# <snip>
This installs an mpicc, even though we don't have openmpi-mpicc:
$ which mpicc
/home/lourens/.miniconda3/envs/openmpi-test/bin/mpicc
and it installs a libprrte.so that uses symbols that require glibc>2.12:
$ nm -CD ~/.miniconda3/envs/openmpi-test/lib/libprrte.so.3.0.5 | grep memcpy
U memcpy@GLIBC_2.14
U __memcpy_chk@GLIBC_2.3.4
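Reading the newest required glibc version off `nm -D` output can also be scripted. This is a rough sketch that only looks at `GLIBC_x.y` version suffixes in text like the output above; the function name is invented for this example.

```python
import re

# Matches the version suffix in 'nm -D' lines such as 'U memcpy@GLIBC_2.14'.
# '@+' also accepts the '@@' form used for default symbol versions.
_GLIBC_RE = re.compile(r"@+GLIBC_(\d+(?:\.\d+)*)")


def required_glibc(nm_output):
    """Return the newest GLIBC version referenced, as a tuple, or None."""
    versions = [
        tuple(int(part) for part in match.group(1).split("."))
        for match in _GLIBC_RE.finditer(nm_output)
    ]
    return max(versions, default=None)
```

For the two lines shown above it picks 2.14 over 2.3.4, because versions are compared as integer tuples rather than as strings.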
Since we don't have a Conda-installed glibc, the system one is used:
$ ldd ~/.miniconda3/envs/openmpi-test/lib/libprrte.so.3.0.5
# <snip>
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc11b6da000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc11bd26000)
# <snip>
and since that is new enough, the dependencies can be resolved and everything works (note that this uses mpicc from Conda openmpi, with system gcc because there's no Conda GCC installed):
$ cat test_conda_openmpi.c
#include <mpi.h>
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
MPI_Finalize();
}
$ mpicc -o test_conda_openmpi test_conda_openmpi.c
# <no error>
Installing gcc pulls in sysroot_linux-64 2.12:
$ conda install gcc
Channels:
- conda-forge
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/lourens/.miniconda3/envs/openmpi-test
added / updated specs:
- gcc
The following NEW packages will be INSTALLED:
binutils_impl_lin~ conda-forge/linux-64::binutils_impl_linux-64-2.40-ha1999f0_1
gcc conda-forge/linux-64::gcc-13.2.0-hc7bed06_7
gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-13.2.0-h9eb54c0_7
kernel-headers_li~ conda-forge/noarch::kernel-headers_linux-64-2.6.32-he073ed8_17
ld_impl_linux-64 conda-forge/linux-64::ld_impl_linux-64-2.40-hf3520f5_1
libgcc-devel_linu~ conda-forge/noarch::libgcc-devel_linux-64-13.2.0-hceb6213_107
libsanitizer conda-forge/linux-64::libsanitizer-13.2.0-h6ddb7a1_7
sysroot_linux-64 conda-forge/noarch::sysroot_linux-64-2.12-he073ed8_17
and this breaks things:
$ mpicc -o test_conda_openmpi test_conda_openmpi.c
/home/lourens/.miniconda3/envs/openmpi-test/bin/../lib/gcc/x86_64-conda-linux-gnu/13.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/lourens/.miniconda3/envs/openmpi-test/lib/libmpi.so: undefined reference to `memcpy@GLIBC_2.14'
/home/lourens/.miniconda3/envs/openmpi-test/bin/../lib/gcc/x86_64-conda-linux-gnu/13.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/lourens/.miniconda3/envs/openmpi-test/lib/./libpmix.so.2: undefined reference to `clock_gettime@GLIBC_2.17'
collect2: error: ld returned 1 exit status
I can confirm that installing openmpi-mpicc fixes it, but seemingly only because it causes a downgrade to a previous version of openmpi which was probably built in an older container image that doesn't have the newer glibc:
$ conda install openmpi-mpicc
Channels:
- conda-forge
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/lourens/.miniconda3/envs/openmpi-test
added / updated specs:
- openmpi-mpicc
The following packages will be downloaded:
package | build
---------------------------|-----------------
gcc-12.3.0 | h915e2ae_7 25 KB conda-forge
gcc_impl_linux-64-12.3.0 | h58ffeeb_7 48.8 MB conda-forge
libgcc-devel_linux-64-12.3.0| h0223996_107 2.4 MB conda-forge
libsanitizer-12.3.0 | hb8811af_7 3.8 MB conda-forge
openmpi-5.0.1 | hb0ee255_100 14.3 MB conda-forge
openmpi-mpicc-5.0.1 | hd590300_100 12 KB conda-forge
------------------------------------------------------------
Total: 69.3 MB
The following NEW packages will be INSTALLED:
binutils_linux-64 conda-forge/linux-64::binutils_linux-64-2.40-hdade7a5_3
gcc_linux-64 conda-forge/linux-64::gcc_linux-64-12.3.0-h6477408_3
openmpi-mpicc conda-forge/linux-64::openmpi-mpicc-5.0.1-hd590300_100
The following packages will be DOWNGRADED:
gcc 13.2.0-hc7bed06_7 --> 12.3.0-h915e2ae_7
gcc_impl_linux-64 13.2.0-h9eb54c0_7 --> 12.3.0-h58ffeeb_7
libgcc-devel_linu~ 13.2.0-hceb6213_107 --> 12.3.0-h0223996_107
libhwloc 2.10.0-default_h5622ce7_1001 --> 2.9.3-default_h554bfaf_1009
libsanitizer 13.2.0-h6ddb7a1_7 --> 12.3.0-hb8811af_7
libzlib 1.3.1-h4ab18f5_1 --> 1.2.13-h4ab18f5_6
openmpi 5.0.3-h47314c5_102 --> 5.0.1-hb0ee255_100
and now we have older symbols that work with glibc==2.12:
$ nm ~/.miniconda3/pkgs/openmpi-5.0.1-hb0ee255_100/lib/libprrte.so.3.0.3 | grep memcpy
U __memcpy_chk@GLIBC_2.3.4
U memcpy@GLIBC_2.2.5
If I specifically ask for the latest openmpi-mpicc then it's broken again:
$ conda install 'openmpi-mpicc==5.0.3'
Channels:
- conda-forge
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/lourens/.miniconda3/envs/openmpi-test
added / updated specs:
- openmpi-mpicc==5.0.3
The following packages will be UPDATED:
libhwloc 2.9.3-default_h554bfaf_1009 --> 2.10.0-default_h5622ce7_1001
openmpi 5.0.1-hb0ee255_100 --> 5.0.3-h47314c5_102
openmpi-mpicc 5.0.1-hd590300_100 --> 5.0.3-hc43e4ee_102
The following packages will be DOWNGRADED:
gcc 12.3.0-h915e2ae_7 --> 11.4.0-h602e360_7
gcc_impl_linux-64 12.3.0-h58ffeeb_7 --> 11.4.0-h00c12a0_7
gcc_linux-64 12.3.0-h6477408_3 --> 11.4.0-h0f0c6b6_3
libgcc-devel_linu~ 12.3.0-h0223996_107 --> 11.4.0-h515aa5d_107
libsanitizer 12.3.0-hb8811af_7 --> 11.4.0-h5763a12_7
Indeed if we try to compile:
$ mpicc -o test_conda_openmpi test_conda_openmpi.c
/home/lourens/.miniconda3/envs/openmpi-test/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/lourens/.miniconda3/envs/openmpi-test/lib/libmpi.so: undefined reference to `memcpy@GLIBC_2.14'
/home/lourens/.miniconda3/envs/openmpi-test/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/lourens/.miniconda3/envs/openmpi-test/lib/./libpmix.so.2: undefined reference to `clock_gettime@GLIBC_2.17'
collect2: error: ld returned 1 exit status
Despite #142 and #145, it seems that no released package ever had a dependency on sysroot:
$ conda search --info openmpi-mpicc | grep sysroot
$ conda search --info openmpi | grep sysroot
$ conda search --info openmpi-mpifort | grep sysroot
It seems to me that depending on sysroot isn't the right solution anyway, __glibc will do fine if it's new enough. Instead, it seems to me that openmpi, and in fact all packages built with the newer docker image that contains a newer glibc, should have an automatic run_constrained requirement on sysroot that ensures that if sysroot is installed, it's new enough.
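To make the intended semantics concrete, here is a toy model of what such a constraint would mean: it only bites when the constrained package is actually present. This is a simplified sketch handling lower bounds only; real conda match specs are richer, and the function name is hypothetical.

```python
def violates_run_constrained(installed, run_constrained):
    """Return the names of installed packages that break a lower bound.

    installed:       {name: version tuple} for the environment
    run_constrained: {name: minimum version tuple}
    Packages absent from `installed` are ignored, which is what separates
    this from a hard `run` dependency.
    """
    return sorted(
        name
        for name, minimum in run_constrained.items()
        if name in installed and installed[name] < minimum
    )
```

With a constraint of `{"sysroot_linux-64": (2, 17)}`, an environment without any sysroot passes, while one that also contains sysroot 2.12 is rejected.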
That suggests that this issue should be resolved elsewhere, but I don't know enough about how conda-forge works to know where to take it. Can someone give a hint?
Maybe this is actually some sort of build issue? If the package had been built with the proper sysroot installed in the build environment, then the binaries would not end up using newer symbols. Are we somehow missing something in our recipe to constrain the build-time sysroot? Or is really adding a run_constrained for sysroot the proper way to go?
Interesting question. Does conda-forge have a policy on which minimum glibc it supports? Then packages should adhere to that, and then the Docker image should provide that exact glibc to link against, either natively or via a sysroot package. Relying on each and every package maintainer to read that policy and adjust their package definition accordingly doesn't sound like a working strategy...
Glibc 2.12 was released in 2010, and 2.17 in 2012, so even enterprise Linux distributions should have at least 2.17 by now and requiring it doesn't seem all that controversial to me. So perhaps another question is why conda install gcc drags in a 14 year old glibc even though a 12 year old one is available?
Ah, maybe the answer to that second question is that this causes packages to be built against glibc 2.12, thus making them compatible with everything and avoiding the problem we're seeing?
But the openmpi package has compiler('c') as the compiler, which is gcc on Linux, which should then pull in the sysroot 2.12. And now I understand your comment :smile:. Yes, this is strange then.
And looking at the build configuration, this package explicitly builds against glibc 2.17, first by explicitly having sysroot 2.17 as a build dependency, and after this commit in conda_build_config.yaml.
Searching for c_stdlib_version led me to this announcement which finally sheds some light on things. It links to https://github.com/conda-forge/conda-forge.github.io/issues/2102. I've been so bold as to add a comment there.
@conda-forge/openmpi, we're seeing this issue in https://github.com/conda-forge/esmf-feedstock/pull/116. I tried adding openmpi-mpifort to the build section of the recipe and it only made things worse. For cross-compiling, we're seeing errors like:
gcc_linux-64 11.4.0 h0f0c6b6_2 needed by gfortran_linux-64-11.4.0-h8f970dc_2
suggesting that it's maybe getting confused about what architecture to install for. And it made no difference for the linux-64 build, which still has:
mpif90 -m64 -mcmodel=small -pthread -Wl,--no-as-needed -fopenmp -L$SRC_DIR/lib/libO/Linux.gfortran.64.openmpi.default -L$PREFIX/lib -L$PREFIX/lib -L$BUILD_PREFIX/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/ -Wl,-rpath,$SRC_DIR/lib/libO/Linux.gfortran.64.openmpi.default -Wl,-rpath,$PREFIX/lib -Wl,-rpath,$PREFIX/lib -Wl,-rpath,$BUILD_PREFIX/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/ -o $SRC_DIR/test/testO/Linux.gfortran.64.openmpi.default/ESMF_StringUTest ESMCI_StringSubr.o ESMF_StringUTest.o -lesmf -lrt -lstdc++ -ldl -lnetcdff -lnetcdf -lpioc
/home/conda/feedstock_root/build_artifacts/esmf_1717757906123/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/conda/feedstock_root/build_artifacts/esmf_1717757906123/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/lib/./libpmix.so.2: undefined reference to `clock_gettime@GLIBC_2.17'
/home/conda/feedstock_root/build_artifacts/esmf_1717757906123/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/conda/feedstock_root/build_artifacts/esmf_1717757906123/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/lib/libmpi_mpifh.so: undefined reference to `memcpy@GLIBC_2.14'
collect2: error: ld returned 1 exit status
Here are the contents of the build environment for linux-64, which I think is the most relevant:
environment location: /home/conda/feedstock_root/build_artifacts/esmf_1717757906123/_build_env
The following NEW packages will be INSTALLED:
_libgcc_mutex: 0.1-conda_forge conda-forge
_openmp_mutex: 4.5-2_gnu conda-forge
binutils_impl_linux-64: 2.40-ha1999f0_2 conda-forge
binutils_linux-64: 2.40-hdade7a5_3 conda-forge
bzip2: 1.0.8-hd590300_5 conda-forge
c-ares: 1.28.1-hd590300_0 conda-forge
ca-certificates: 2024.6.2-hbcca054_0 conda-forge
cmake: 3.29.5-hcafd917_0 conda-forge
gcc_impl_linux-64: 12.3.0-h58ffeeb_7 conda-forge
gcc_linux-64: 12.3.0-h6477408_3 conda-forge
gfortran_impl_linux-64: 12.3.0-h1645026_7 conda-forge
gfortran_linux-64: 12.3.0-h617cb40_3 conda-forge
gnuconfig: 2020.11.07-hd8ed1ab_0 conda-forge
gxx_impl_linux-64: 12.3.0-h2a574ab_7 conda-forge
gxx_linux-64: 12.3.0-h4a1b8e8_3 conda-forge
icu: 73.2-h59595ed_0 conda-forge
kernel-headers_linux-64: 2.6.32-he073ed8_17 conda-forge
keyutils: 1.6.1-h166bdaf_0 conda-forge
krb5: 1.21.2-h659d440_0 conda-forge
ld_impl_linux-64: 2.40-hf3520f5_2 conda-forge
libcurl: 8.8.0-hca28451_0 conda-forge
libedit: 3.1.20191231-he28a2e2_2 conda-forge
libev: 4.33-hd590300_2 conda-forge
libevent: 2.1.12-hf998b51_1 conda-forge
libexpat: 2.6.2-h59595ed_0 conda-forge
libgcc-devel_linux-64: 12.3.0-h0223996_107 conda-forge
libgcc-ng: 13.2.0-h77fa898_7 conda-forge
libgfortran-ng: 13.2.0-h69a702a_7 conda-forge
libgfortran5: 13.2.0-hca663fb_7 conda-forge
libgomp: 13.2.0-h77fa898_7 conda-forge
libhwloc: 2.9.3-default_h554bfaf_1009 conda-forge
libiconv: 1.17-hd590300_2 conda-forge
libnghttp2: 1.58.0-h47da74e_1 conda-forge
libnl: 3.9.0-hd590300_0 conda-forge
libsanitizer: 12.3.0-hb8811af_7 conda-forge
libssh2: 1.11.0-h0841786_0 conda-forge
libstdcxx-devel_linux-64: 12.3.0-h0223996_107 conda-forge
libstdcxx-ng: 13.2.0-hc0a3c3a_7 conda-forge
libuv: 1.48.0-hd590300_0 conda-forge
libxcrypt: 4.4.36-hd590300_1 conda-forge
libxml2: 2.12.7-hc051c1a_1 conda-forge
libzlib: 1.3.1-h4ab18f5_1 conda-forge
make: 4.3-hd18ef5c_1 conda-forge
mpi: 1.0-openmpi conda-forge
ncurses: 6.5-h59595ed_0 conda-forge
openmpi: 5.0.1-hb0ee255_100 conda-forge
openmpi-mpifort: 5.0.1-heb67821_100 conda-forge
openssl: 3.3.1-h4ab18f5_0 conda-forge
perl: 5.32.1-7_hd590300_perl5 conda-forge
pkg-config: 0.29.2-h36c2ea0_1008 conda-forge
rhash: 1.4.4-hd590300_0 conda-forge
sysroot_linux-64: 2.12-he073ed8_17 conda-forge
xz: 5.2.6-h166bdaf_0 conda-forge
zstd: 1.5.6-ha6fb4c9_0 conda-forge
Notably sysroot 2.12.
I'm at a bit of a loss and can't say I was able to follow the discussion above particularly well. Is there a suggested fix for this?
I have upgraded my meta.yaml to the new way of specifying use of the C library using
requirements:
  build:
    - {{ stdlib('c') }}
    - <other stuff>
and then in my conda_build_config.yaml I have:
c_stdlib:
  - sysroot  # [linux]
  - macos_deployment_target  # [osx]
c_stdlib_version:
  - 2.17  # [linux]
  - 11.3  # [osx]
I have yet to get a Mac to try the osx build on, so I'm not sure that that works yet, but it fixes the Linux build at least.
What this does is to specify that we want at least glibc 2.17 for the package we're building, which ensures that glibc 2.17 is available so that we can link to it properly. Of course this is strictly speaking incorrect if the present package works with 2.12, but since 2.17 will soon be the standard for everything anyway, it doesn't matter much.
@LourensVeen, thank you! I'll give that a shot.
I'm not sure why building with sysroot 2.17 creates a package that claims to be compatible with sysroot 2.12, or if it's something particular about openmpi that isn't universal. This comes up in this package's own tests, where sysroot 2.12 is installed in the test environment. I don't know why that is.
FWIW, compiling with $LDFLAGS avoids this problem, at least in openmpi's own tests (#159)
mpif77 test.f # fails to find versioned glibc symbols
mpif77 $LDFLAGS test.f # succeeds
I'm not actually sure which LDFLAG is responsible (maybe -Wl,--as-needed?)
There does seem to be a problem in general where this package being built with a particular sysroot has a runtime conflict with other packages being built with lower sysroot, but the package dependencies don't actually capture this conflict. I don't know enough about all this linking to say what's the right fix and whether it belongs in sysroot itself or openmpi.
Does it make sense to put sysroot 2.17 in run_constrained for this package? And if so, do we need repodata patches? And does it make sense to do that in general in sysroot itself, or just here?
From this comment, it appears the missing flag is -Wl,--allow-shlib-undefined. Maybe we should try to get this into the compiler wrapper's default flags.
-Wl,--allow-shlib-undefined. I think setting:
export OMPI_LDFLAGS="-L$PREFIX/lib -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib"
will fix anything using the compiler wrapper and building with sysroot 2.12, and #159 will make sure that's the default, I think actually resolving this issue.
In general, anything using $LDFLAGS (everything should) should not encounter this problem, but notably I believe that CMake's FindMPI does not do this. I'm not sure what arguments need to be specified, possibly -DMPI_FORTRAN_LINK_FLAGS="-L$PREFIX/lib -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib".
I agree that that would probably (I didn't check it) avoid the build failure, but it wouldn't actually solve the problem, just disable the error at that particular point and kick the can down the road.
The built package would still have a dependency on openmpi, which would still depend on glibc 2.17, and things would still break if the user installs gcc as well as the built package, because gcc would drag in sysroot==2.12, which would then be used by the dynamic linker/loader when trying to start an MPI program, and it would fail to find the newer symbols there.
I don't want to speak with too much confidence because I'm a bit out of my depth, but I don't think that's the case. For example, in https://github.com/conda-forge/openmpi-feedstock/pull/159, the test environments for openmpi-mpicc (and fort, etc.) pull in gcc and sysroot==2.12, and I think they exercise precisely the case you describe. They compile and link correctly (thanks to the addition of -Wl,--allow-shlib-undefined), and then execute correctly because the glibc used at runtime is not the backdated one that's linked from sysroot. Runtime glibc comes from the system (the __glibc requirement would have prevented installation if the required glibc really was too new for the system). I'm not sure my description of the situation is completely right, but I do know that an env with sysroot=2.12 does not prevent execution when linking symbols from 2.17.
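As a rough illustration of the runtime side of this argument, the following shell sketch compares the system's glibc against a required minimum. The helper version_ge is hypothetical and relies on GNU sort -V; getconf GNU_LIBC_VERSION is only available on glibc-based systems, so the script treats a missing value as unknown. The 2.17 threshold is simply the level discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is >= version $2 (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On glibc systems getconf prints e.g. "glibc 2.38"; keep only the version field.
system_glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

required="2.17"   # glibc level the openmpi packages in this thread were built against
if [ -n "$system_glibc" ] && version_ge "$system_glibc" "$required"; then
    echo "runtime glibc $system_glibc satisfies >= $required"
else
    echo "runtime glibc '${system_glibc:-unknown}' does not satisfy >= $required"
fi
```

This only inspects the system glibc that the loader uses at runtime; it says nothing about which sysroot version is installed in the conda environment for linking.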
Seems like a big hammer to me. I feel we need alignment with @conda-forge/core regarding the discussion https://github.com/conda-forge/linux-sysroot-feedstock/issues/63 started by @h-vetinari (EDIT: I see @minrk had also raised the discussion here https://github.com/conda-forge/conda-forge.github.io/issues/2102#issuecomment-2156641517). I am not comfortable adding this flag unconditionally. It only solves our problem but I do not believe Open MPI is the only project affected by glibc symbol issues.
> I do not believe Open MPI is the only project affected by glibc symbol issues.
It's the only package that compiles other things and does not respect LDFLAGS.
@minrk: Good point (and a working example is hard to argue with :smile:). I failed to consider that the dynamic linker would come from Conda and find the sysroot-installed glibc, but that the loader comes from the system and wouldn't be bothered by it.
Just disabling the check still feels wrong to me though, and I'm worried that it could cause other problems. What if you tried to compile some MPI code that itself needs a newer glibc than you have available on your system? I guess it would fail when you try to run the program, but it might be hard to debug, and you would expect it to fail at link time.
This could affect Conda too if an MPI-using package got an update that introduced a dependency on a newer glibc than Conda uses (say 2.25). You probably wouldn't notice outside of Conda (because you have a much newer glibc everywhere), and the package build would succeed (because the check is disabled), but a user trying to run the package on an older system with glibc>=2.17<2.25 would still get an error. I guess we'd have to hope there's a test that actually runs things and fails the build that way.
> It's the only package that compiles other things and does not respect LDFLAGS.
FWIW, it's cmake's FindMPI, not openmpi, that is ignoring compiler flags. I think the mistake I made in #158 is to assume that the mpi compiler wrappers should be responsible for respecting $CFLAGS/$LDFLAGS, but that's not right - they should be minimal extensions of $CC/$FC etc. with just enough to find/link mpi.h/libmpi. $CFLAGS/$LDFLAGS should be passed to the compiler wrappers, just like regular compilers, which happens as expected after FindMPI succeeds, but not during for some reason. Sounds plausibly like a CMake bug to me.
Within the context of conda-forge where the linked glibc may be older than that of your dependencies, -Wl,--allow-shlib-undefined is part of what's required to link conda-forge openmpi (or ~any other dependency using 2.17).
> a user trying to run the package on an older system with glibc>=2.17<2.25 would still get an error
@LourensVeen I think the dependency on __glibc version already addresses that. The package wouldn't be installable for them because it would have __glibc >=2.25 in its requirements (from sysroot's run_exports here). Instead, they would get the latest build that supported their version of glibc.
Also, note that disabling the check in the openmpi compiler wrapper is just applying a subset of what's already applied to all conda-forge-built shared libraries via $LDFLAGS. It's just that for some reason FindMPI checks that mpicc and friends work without $LDFLAGS, and this is the only flag where that really matters.
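A quick way to see whether a given environment already carries this flag in $LDFLAGS is a small shell check like the one below. The helper name has_allow_shlib_undefined is hypothetical, and the interpretation in the else branch (missing compiler activation) is an assumption based on this discussion, not something the script can verify.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given flag string contains --allow-shlib-undefined.
has_allow_shlib_undefined() {
    case " $1 " in
        *"--allow-shlib-undefined"*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_allow_shlib_undefined "${LDFLAGS:-}"; then
    echo "LDFLAGS already carries --allow-shlib-undefined"
else
    echo "LDFLAGS is unset or lacks the flag (compiler activation may be missing)"
fi
```

The pattern needs both leading dashes, so a string containing only --no-allow-shlib-undefined is not reported as a match.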
I was thinking of a scenario where the packager wouldn't notice the new dependency, and fail to update the sysroot dependency. But that's arguably a bug in that package then, and anyway it probably wouldn't build on conda-forge to begin with, because the Docker container doesn't have the new glibc available.
Okay, I'm out of arguments. I still don't like it conceptually (I'd prefer the solution in https://github.com/conda-forge/linux-sysroot-feedstock/issues/63) but I can't see it breaking anything. And I have to test, but this probably fixes my problem, so thanks!
But I don't think that can happen either, because openmpi itself still carries the newer glibc dependency, so you won't be able to install it due to an unsolvable dependency. I don't think the downstream package gets a glibc dependency that's not represented in the requirements of the package or its dependencies, though I could have misunderstood something.
@leofang I'm trying to understand your objection to including the flag by default. This flag is in default $LDFLAGS, so it is already used on all shared libraries compiled on conda-forge. This is conda-forge-wide, not specific to openmpi. This change is only making the compiler wrapper more consistent with every link command called on conda-forge.
The only thing that appears to be openmpi-specific here is that CMake's FindMPI ignores LDFLAGS, so without this flag it may not find working mpi. (edit: that was inaccurate)
Including it in the wrapper is also only setting a default, not unconditional, and overridable with OMPI_LDFLAGS (or a later cli flag or $LDFLAGS, which sets the same flag again).
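The "default, but overridable" behaviour described above can be modelled with plain shell parameter expansion. This is an illustrative model only, not Open MPI's actual wrapper code: the function effective_wrapper_ldflags is hypothetical, and its argument stands in for whatever default is baked into the wrapper at build time. Treating an empty but set OMPI_LDFLAGS as an override is also an assumption of this sketch.

```shell
#!/bin/sh
# Illustrative model only, not Open MPI's wrapper implementation.
# $1 stands in for the default link flags configured into the wrapper.
effective_wrapper_ldflags() {
    default_ldflags="$1"
    # ${VAR-default}: fall back to the default only when OMPI_LDFLAGS is unset.
    printf '%s\n' "${OMPI_LDFLAGS-$default_ldflags}"
}

effective_wrapper_ldflags "-Wl,--allow-shlib-undefined"
```

A user who sets OMPI_LDFLAGS therefore replaces the default entirely in this model, which is why the flag being in the wrapper's defaults does not make it unconditional.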
Thanks all for the helpful discussion here and weighing potential solutions! 🙏
Wanted to follow up on one point...
> FWIW, it's cmake's FindMPI, not openmpi, that is ignoring compiler flags. I think the mistake I made in #158 is to assume that the mpi compiler wrappers should be responsible for respecting $CFLAGS/$LDFLAGS, but that's not right - they should be minimal extensions of $CC/$FC etc. with just enough to find/link mpi.h/libmpi. $CFLAGS/$LDFLAGS should be passed to the compiler wrappers, just like regular compilers, which happens as expected after FindMPI succeeds, but not during for some reason. Sounds plausibly like a CMake bug to me.
If we are able to construct a simple example of this behavior and include it in a new CMake issue, we can work to address it
I think I might be conflating different issues (#158 fixed two issues, one of which fixed FindMPI and it was FCFLAGS-related, not link-related) and getting things wrong. When I run tests, FindMPI definitely does use LDFLAGS.
I've re-read, and the original post by @charlesgwaldman I think is compiling in a user environment with FC=mpifort (not FindMPI), and I think the issue is that the package gfortran doesn't include compiler activation on linux (it does on mac, I'm not sure why they are inconsistent). If you conda install fortran-compiler (or gfortran_linux-64), LDFLAGS will be defined and everything should work. I've confirmed this in docker with the CMakeLists.txt:
I think all of the cases where this error is coming up are attributable to $LDFLAGS not being passed, either in package build systems or user environments, and not cmake itself, e.g.:
- installing gfortran and not gfortran_linux-64, so LDFLAGS is not set
Since runtime compilation in user environments is a common and reasonable thing to do, I think it's a valid question to ask: should the mpi compilers work in user environments without the conda-build compiler activation scripts? If so, the answer is either:
1. add run_constrained on sysroot
2. add -Wl,--allow-shlib-undefined to the default flags in the compiler wrapper (#159)
(or both)
Note that 2. fixes this issue in all situations, because I think we have learned that building downstream packages with an older sysroot is actually fine, whereas 1. fixes it for user environments, test environments, and cross-compiled packages (because openmpi is in the build env, unless we do #161), but the pinning will not affect native builds, where openmpi is only in the host env. Respecting $LDFLAGS, which is a reasonable expectation of conda-forge recipes, solves those cases, though.
run_constrained on sysroot will end up preventing downstream packages from using older sysroot in the build, which I think we've learned actually works fine. In practice, though, once openmpi is on newer glibc, all downstream dependencies might as well update since runtime will require newer glibc due to the dependency so there's no benefit to holding back.
Solution to issue cannot be found in the documentation.
Issue
I am using cmake, gfortran, and openmpi from conda-forge to compile a Fortran package. With cmake 3.28.3, gfortran 13.2.0 and openmpi 5.0.0 everything works. When openmpi upgraded to the latest 5.0.1 version I started getting this error:
Installed packages
Environment info