ecmwf-ifs / fiat

The Fortran IFS and Arpege Toolkit
Apache License 2.0

Linking fails in conda environment #5

Closed rolfhm closed 1 year ago

rolfhm commented 2 years ago

Hi, I tried to compile fiat in a conda environment using gcc-11 and gcc-12. When compiling, there are a ton of warnings about argument mismatches in the calls to the MPI routines. When linking, it fails and claims it can't find libgomp.so.1, libstdc++.so.6, libgfortran.so.5, and libquadmath.so.0. This happens despite CMake finding the correct libgomp and libmpi in the conda environment files. Using -DENABLE_MPI=OFF makes no difference.

wdeconinck commented 2 years ago

Hi @rolfhm, since gfortran 10, it is unavoidable to have these warnings when using the MPI F77 API. The only means to remove these warnings is to port MPL to use the MPI_F08 module. This is something we're considering (poke @marsdeno), when we get around to it.

As for libgomp, that is not Open MPI but the compiler-specific OpenMP runtime library. I suspect nothing is wrong with MPI. What cmake command did you use to configure?

rolfhm commented 2 years ago

Yes, it's not only the OpenMP library, but also libstdc++ and libgfortran, so it is a bit of a disaster. I have used the compilers to compile and run other programs without problems.

I used cmake .. in fiat/build, after exporting FC, CC, and CXX. It seems to find ecbuild and MPI on its own.
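One way to confirm which compilers CMake actually picked up is to read them back from the cache it writes into the build directory. The sketch below relies on the standard CMakeCache.txt entries CMAKE_<LANG>_COMPILER; the helper name cached_compilers is made up for illustration.

```shell
# Print the C, C++ and Fortran compilers recorded in a CMake build directory.
# cached_compilers is a hypothetical helper; CMakeCache.txt and the
# CMAKE_<LANG>_COMPILER entries are standard CMake.
cached_compilers() {
    build_dir=${1:-.}
    # Cache entries look like: CMAKE_C_COMPILER:FILEPATH=/path/to/cc
    # Entries such as CMAKE_C_COMPILER_AR are deliberately not matched.
    grep -E '^CMAKE_(C|CXX|Fortran)_COMPILER:[A-Z]+=' "$build_dir/CMakeCache.txt" |
        sed -E 's/^CMAKE_([A-Za-z]+)_COMPILER:[A-Z]+=/\1 /'
}
```

Running cached_compilers . inside fiat/build prints one line per language, which makes it easy to spot a compiler that does not come from the conda environment.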

reuterbal commented 2 years ago

Given that -fallow-argument-mismatch should automatically be set for GNU >= 10.0, I think these warnings shouldn't show up? https://github.com/ecmwf-ifs/fiat/blob/9507e783745d59d9a3c01e72482ab52205d2dd7f/src/fiat/CMakeLists.txt#L34-L39 Makes me wonder whether something goes wrong in the compiler identification?

marsdeno commented 2 years ago

@reuterbal The -fallow-argument-mismatch option relaxes compiler strictness: instead of failing on argument mismatches, the compiler emits warnings about them. So it sounds like that part is working OK, judging from the OP's message. In any case, I suspect it is unrelated to the linking issue.
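As a rough illustration of the version condition discussed here: the flag only exists in gfortran 10 and later, where argument mismatches became errors by default. The sketch mirrors that condition in shell; mismatch_flag is a hypothetical helper, and fiat's actual logic lives in the CMakeLists.txt linked above.

```shell
# Echo -fallow-argument-mismatch for gfortran major versions >= 10,
# and nothing for older versions, which do not know the option.
mismatch_flag() {
    version=$1                 # e.g. "11.2.0"
    major=${version%%.*}       # keep the text before the first dot
    if [ "$major" -ge 10 ]; then
        echo "-fallow-argument-mismatch"
    fi
}
```

With the flag in place the mismatches are still reported, only as warnings rather than errors, which matches what the original post describes.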

wdeconinck commented 2 years ago

Hi @rolfhm, I tried on our HPC system with Conda and was successful there. My commands to reproduce on our HPC:

module load conda
conda create -n fiatenv
conda activate fiatenv
conda install gcc
conda install gfortran
conda install openmpi
conda install cmake
git clone ssh://git@github.com/ecmwf/ecbuild
git clone ssh://git@github.com/ecmwf-ifs/fiat
cd fiat
mkdir build
cd build
cmake ..
make

Note that if the ecbuild sources sit in a directory next to the fiat sources, the fiat build will find ecbuild on its own.

rolfhm commented 2 years ago

Looks very similar to what I did, except I had to specify conda-forge to get gcc and gfortran (conda install -c conda-forge); otherwise it did not find the packages.

Did you check which gcc it used in the installation? If I did not set CC and FC as environment variables, it used the built-in gcc/gfortran 7.5 on my laptop and compiled.

wdeconinck commented 2 years ago

conda-forge was added as a permanent channel beforehand, so it should be the same. I verified that the library links against the Conda environment, and also that cmake picked up the Conda gcc, g++, and gfortran:

ldd libfiat.so
    linux-vdso.so.1 (0x00007fffe87df000)
    libdl.so.2 => /usr/lib64/libdl.so.2 (0x000014802f58d000)
    librt.so.1 => /usr/lib64/librt.so.1 (0x000014802f385000)
    libmpi_mpifh.so.40 => /perm/nawd/conda/envs/fiatenv/lib/libmpi_mpifh.so.40 (0x000014802fbba000)
    libgomp.so.1 => /perm/nawd/conda/envs/fiatenv/lib/libgomp.so.1 (0x000014802fb84000)
    libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x000014802f165000)
    libstdc++.so.6 => /perm/nawd/conda/envs/fiatenv/lib/libstdc++.so.6 (0x000014802efba000)
    libgfortran.so.5 => /perm/nawd/conda/envs/fiatenv/lib/libgfortran.so.5 (0x000014802ee11000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x000014802ea8f000)
    libc.so.6 => /usr/lib64/libc.so.6 (0x000014802e6ca000)
    /lib64/ld-linux-x86-64.so.2 (0x000014802fa0a000)
    libmpi.so.40 => /perm/nawd/conda/envs/fiatenv/lib/./libmpi.so.40 (0x000014802fa5e000)
    libopen-pal.so.40 => /perm/nawd/conda/envs/fiatenv/lib/./libopen-pal.so.40 (0x000014802e5c3000)
    libgcc_s.so.1 => /perm/nawd/conda/envs/fiatenv/lib/libgcc_s.so.1 (0x000014802fa47000)
    libquadmath.so.0 => /perm/nawd/conda/envs/fiatenv/lib/libquadmath.so.0 (0x000014802e589000)
    libopen-rte.so.40 => /perm/nawd/conda/envs/fiatenv/lib/././libopen-rte.so.40 (0x000014802e4cd000)
    libopen-orted-mpir.so => /perm/nawd/conda/envs/fiatenv/lib/././libopen-orted-mpir.so (0x000014802fa42000)
    libutil.so.1 => /usr/lib64/libutil.so.1 (0x000014802e2c9000)
    libz.so.1 => /perm/nawd/conda/envs/fiatenv/lib/./././libz.so.1 (0x000014802e2af000)
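
Listings like this get long, so it can help to filter them down to the entries that matter. A small sketch, assuming the usual "name => path (address)" layout of ldd output shown above; libs_outside_prefix is a hypothetical helper that prints every library that is either unresolved or resolved outside a given prefix.

```shell
# Read `ldd` output on stdin; print libraries not resolved inside prefix $1.
libs_outside_prefix() {
    prefix=$1
    awk -v prefix="$prefix" '
        # Unresolved entries read: libfoo.so.1 => not found
        $2 == "=>" && $3 == "not" { print $1, "not found"; next }
        # Resolved entries whose path does not start with "<prefix>/"
        $2 == "=>" && index($3, prefix "/") != 1 { print $1, $3 }
    '
}
```

A typical call would be ldd libfiat.so | libs_outside_prefix "$CONDA_PREFIX". System libraries such as libc are expected to show up in the result; a libgfortran or libstdc++ resolved outside the environment, or reported as not found, would be the thing to look for.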

rolfhm commented 2 years ago

Do you export CC, FC, and CXX? I have to export them; otherwise it finds the built-in gcc 7.
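
For reference, setting the three variables by hand could look like the sketch below. It assumes the conda-forge compiler names that appear elsewhere in this thread (x86_64-conda-linux-gnu-cc and friends) and that CONDA_PREFIX points at the active environment; export_conda_compilers is a hypothetical helper.

```shell
# Export CC, CXX and FC if the conda compiler wrappers exist under a prefix.
# Defaults to $CONDA_PREFIX, which `conda activate` sets.
export_conda_compilers() {
    bin=${1:-$CONDA_PREFIX}/bin
    triplet=x86_64-conda-linux-gnu      # assumption: 64-bit Linux toolchain
    for tool in cc c++ gfortran; do
        if [ ! -x "$bin/$triplet-$tool" ]; then
            echo "missing: $bin/$triplet-$tool" >&2
            return 1
        fi
    done
    export CC="$bin/$triplet-cc" CXX="$bin/$triplet-c++" FC="$bin/$triplet-gfortran"
}
```

CMake only reads CC, CXX and FC the first time it configures a build directory, so the variables need to be set before running cmake in a fresh build directory.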

rolfhm commented 2 years ago

And I also need to install gxx

rolfhm commented 2 years ago

It looks like it uses the linker down in /usr/bin. From ecbuild.log:

2022-10-06T14:04:21 - fiat - INFO - linker : /usr/bin/ld
2022-10-06T14:04:21 - fiat - INFO - ar     : /usr/bin/ar
2022-10-06T14:04:21 - fiat - INFO - ranlib : /usr/bin/ranlib
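
The same check can be scripted. This sketch assumes the "linker :", "ar :" and "ranlib :" lines keep the layout shown above; tools_outside_prefix is a hypothetical helper that prints the tools living outside a given prefix.

```shell
# Print linker/ar/ranlib entries from an ecbuild.log that are outside prefix $2.
tools_outside_prefix() {
    log=$1
    prefix=$2
    awk -v prefix="$prefix" '
        # Lines look like: <timestamp> - fiat - INFO - linker : /usr/bin/ld
        NF >= 3 && $(NF-2) ~ /^(linker|ar|ranlib)$/ && $(NF-1) == ":" {
            if (index($NF, prefix "/") != 1) print $(NF-2), $NF
        }
    ' "$log"
}
```

For the log excerpt above, tools_outside_prefix ecbuild.log "$CONDA_PREFIX" would report all three tools, since each of them resolves to /usr/bin.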

wdeconinck commented 2 years ago

I checked, and the environment variables CC, CXX, and FC were somehow automatically pointing to the right compilers. The build summary reads:

2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - Project fiat summary
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - Build type                    : [RelWithDebInfo]
2022-10-06T09:49:08 - fiat - INFO - Fortran flags                 : [-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include -O2 -g -DNDEBUG]
2022-10-06T09:49:08 - fiat - INFO - C flags                       : [-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include -pipe -Wall -Wextra -Wno-unused-parameter -Wno-unused-variable -Wno-gnu-zero-variadic-macro-arguments -Wno-deprecated-declarations -O2 -g -DNDEBUG]
2022-10-06T09:49:08 - fiat - INFO - C++ flags                     : [-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include -pipe -O2 -g -DNDEBUG]
2022-10-06T09:49:08 - fiat - INFO - OpenMP (following variable can be overwritten by user)
2022-10-06T09:49:08 - fiat - INFO -     OpenMP_Fortran_FLAGS      : [-fopenmp]
2022-10-06T09:49:08 - fiat - INFO - MPI (export MPI_HOME to correct MPI implementation)
2022-10-06T09:49:08 - fiat - INFO -     MPI_Fortran_INCLUDE_DIRS  : [/perm/nawd/conda/envs/fiatenv/include /perm/nawd/conda/envs/fiatenv/lib]
2022-10-06T09:49:08 - fiat - INFO -     MPI_Fortran_LIBRARIES     : [/perm/nawd/conda/envs/fiatenv/lib/libmpi_usempif08.so /perm/nawd/conda/envs/fiatenv/lib/libmpi_usempi_ignore_tkr.so /perm/nawd/conda/envs/fiatenv/lib/libmpi_mpifh.so /perm/nawd/conda/envs/fiatenv/lib/libmpi.so]
2022-10-06T09:49:08 - fiat - INFO -     MPIEXEC                   : [/perm/nawd/conda/envs/fiatenv/bin/mpiexec]
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - Build summary
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - system : [xxxx] [Linux-4.18.0-305.19.1.el8_4.x86_64] [linux.64]
2022-10-06T09:49:08 - fiat - INFO - processor        : [x86_64]
2022-10-06T09:49:08 - fiat - INFO - endiness         : Little Endian -- IEEE []
2022-10-06T09:49:08 - fiat - INFO - build type       : [RelWithDebInfo]
2022-10-06T09:49:08 - fiat - INFO - timestamp        : [20221006094902]
2022-10-06T09:49:08 - fiat - INFO - install prefix   : [/usr/local]
2022-10-06T09:49:08 - fiat - INFO -   bin dir        : [/usr/local/bin]
2022-10-06T09:49:08 - fiat - INFO -   lib dir        : [/usr/local/lib64]
2022-10-06T09:49:08 - fiat - INFO -   include dir    : [/usr/local/include]
2022-10-06T09:49:08 - fiat - INFO -   data dir       : [/usr/local/share/fiat]
2022-10-06T09:49:08 - fiat - INFO -   cmake dir      : [/usr/local/lib64/cmake/fiat]
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - C -- GNU 11.2.0
2022-10-06T09:49:08 - fiat - INFO -     compiler   : /perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-cc
2022-10-06T09:49:08 - fiat - INFO -     flags      : -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include -pipe -Wall -Wextra -Wno-unused-parameter -Wno-unused-variable -Wno-gnu-zero-variadic-macro-arguments -Wno-deprecated-declarations -O2 -g -DNDEBUG  
2022-10-06T09:49:08 - fiat - INFO -     link flags : 
2022-10-06T09:49:08 - fiat - INFO - CXX -- GNU 11.2.0
2022-10-06T09:49:08 - fiat - INFO -     compiler   : /perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-c++
2022-10-06T09:49:08 - fiat - INFO -     flags      : -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include -pipe -O2 -g -DNDEBUG  
2022-10-06T09:49:08 - fiat - INFO -     link flags : 
2022-10-06T09:49:08 - fiat - INFO - Fortran -- GNU 11.2.0
2022-10-06T09:49:08 - fiat - INFO -     compiler   : /perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gfortran
2022-10-06T09:49:08 - fiat - INFO -     flags      : -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include -O2 -g -DNDEBUG  
2022-10-06T09:49:08 - fiat - INFO -     link flags : 
2022-10-06T09:49:08 - fiat - INFO - linker : /perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ld
2022-10-06T09:49:08 - fiat - INFO - ar     : /perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ar
2022-10-06T09:49:08 - fiat - INFO - ranlib : /perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ranlib
2022-10-06T09:49:08 - fiat - INFO - link flags
2022-10-06T09:49:08 - fiat - INFO -     executable [-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/perm/nawd/conda/envs/fiatenv/lib -Wl,-rpath-link,/perm/nawd/conda/envs/fiatenv/lib -L/perm/nawd/conda/envs/fiatenv/lib    -Wl,--disable-new-dtags ]
2022-10-06T09:49:08 - fiat - INFO -     shared lib [-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/perm/nawd/conda/envs/fiatenv/lib -Wl,-rpath-link,/perm/nawd/conda/envs/fiatenv/lib -L/perm/nawd/conda/envs/fiatenv/lib -Wl,--disable-new-dtags ]
2022-10-06T09:49:08 - fiat - INFO -     static lib [-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/perm/nawd/conda/envs/fiatenv/lib -Wl,-rpath-link,/perm/nawd/conda/envs/fiatenv/lib -L/perm/nawd/conda/envs/fiatenv/lib -Wl,--disable-new-dtags ]
2022-10-06T09:49:08 - fiat - INFO - install rpath  : $ORIGIN/../lib64
2022-10-06T09:49:08 - fiat - INFO - common definitions: 
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------
2022-10-06T09:49:08 - fiat - INFO - Feature summary
2022-10-06T09:49:08 - fiat - INFO - ---------------------------------------------------------

rolfhm commented 2 years ago

Maybe it's an issue with how ecbuild searches for and sets the linker?

wdeconinck commented 2 years ago

ecbuild itself does not search for these; it only reports what CMake is using. The following variables are set in my environment and should be picked up by CMake unless overridden on the command line.


CC=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-cc

FC=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gfortran

CXX=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-c++

CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include

CXXFLAGS=-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include

FFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /perm/nawd/conda/envs/fiatenv/include

CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /perm/nawd/conda/envs/fiatenv/include

LDFLAGS=-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/perm/nawd/conda/envs/fiatenv/lib -Wl,-rpath-link,/perm/nawd/conda/envs/fiatenv/lib -L/perm/nawd/conda/envs/fiatenv/lib

LD=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ld

STRIP=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-strip

OBJCOPY=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-objcopy

RANLIB=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ranlib

OBJDUMP=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-objdump

AR=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ar

AS=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-as

READELF=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-readelf

GPROF=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gprof

ADDR2LINE=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-addr2line

CMAKE_PREFIX_PATH=/perm/nawd/conda/envs/fiatenv:/perm/nawd/conda/envs/fiatenv/x86_64-conda-linux-gnu/sysroot/usr

# I believe conda specifies this, but I did not pass this to the cmake command, so should not be necessary.
CMAKE_ARGS=-DCMAKE_AR=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/perm/nawd/conda/envs/fiatenv/bin/x86_64-conda-linux-gnu-strip

rolfhm commented 2 years ago

Downloading gcc_linux-64 from the standard conda sources, i.e. not conda-forge, seems to work better. However, the mpi package appears to be broken and can't be installed. I tried to compile with -DENABLE_MPI=OFF, but it still fails at link time, trying to find some library used by the built-in MPI library.

wdeconinck commented 2 years ago

Could you do a clean build (wiping the entire build dir of fiat), and then show the contents of ecbuild.log and the output of make VERBOSE=1?
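
A sketch of that sequence, meant to be run from the fiat source directory. fresh_build_dir is a hypothetical helper, and the log file names in the comments are arbitrary.

```shell
# Remove and recreate a build directory so no stale CMake cache survives.
fresh_build_dir() {
    dir=${1:-}
    if [ -z "$dir" ]; then
        echo "usage: fresh_build_dir <build-dir>" >&2
        return 2
    fi
    rm -rf "$dir" && mkdir -p "$dir"
}

# Typical use from the fiat source directory (not executed here):
#   fresh_build_dir build && cd build
#   cmake .. 2>&1 | tee cmake.log
#   make VERBOSE=1 2>&1 | tee make.log
```

Wiping the directory matters because CMake caches the compilers, linker and flags from the first configure run; re-running cmake in an old build directory would keep reporting the old toolchain.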

rolfhm commented 2 years ago

The files turn out to be pretty big, and this is an open repo. Can I send them by email?

wdeconinck commented 2 years ago

I received your email and looked into the issue, which only shows up for the executable fiat-printbinding. The latest commit https://github.com/ecmwf-ifs/fiat/commit/9398e04fd5d00887b9598151abfa31c388e605d5 should prevent fiat-printbinding from linking with MPI when ENABLE_MPI=OFF. That, however, does not fix the underlying issue of a faulty conda-MPI installation. The issue seems to be that the rpaths of some libraries which libmpi.so relies on are not encoded in libmpi.so:

/home/rolfhm/miniconda3/envs/trololo/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libopen-rte.so.20, needed by /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so, not found (try using -rpath or -rpath-link)
/home/rolfhm/miniconda3/envs/trololo/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libopen-pal.so.20, needed by /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so, not found (try using -rpath or -rpath-link)
/home/rolfhm/miniconda3/envs/trololo/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libhwloc.so.5, needed by /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so, not found (try using -rpath or -rpath-link)

That could perhaps be worked around by amending the LDFLAGS environment variable, which currently reads for you:

LDFLAGS=-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/rolfhm/miniconda3/envs/trololo/lib -Wl,-rpath-link,/home/rolfhm/miniconda3/envs/trololo/lib -L/home/rolfhm/miniconda3/envs/trololo/lib

so that it also includes the directories where libopen-rte.so.20, libopen-pal.so.20, and libhwloc.so.5 are located.
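
A sketch of that workaround follows. The directory passed in is whatever location actually holds the missing libraries on the affected machine, which this thread does not state, so the path in the usage comment is a placeholder. add_link_dir is a hypothetical helper; -Wl,-rpath and -Wl,-rpath-link are the options the linker warning itself suggests, written the same way as in the existing LDFLAGS.

```shell
# Append rpath and rpath-link entries for one directory to LDFLAGS.
add_link_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # Keep whatever LDFLAGS already holds and add the new entries after it.
    LDFLAGS="${LDFLAGS:+$LDFLAGS }-Wl,-rpath,$dir -Wl,-rpath-link,$dir"
    export LDFLAGS
}

# Locate a missing library first, for example:
#   find /usr/lib -name 'libopen-rte.so.20*'
# then pass the directory that contains it (placeholder path):
#   add_link_dir /path/to/dir/containing/the/library
```

CMake reads LDFLAGS only on the first configure of a build directory, so the change takes effect after reconfiguring in a clean build directory.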

wdeconinck commented 1 year ago

@rolfhm Can this issue be closed?

rolfhm commented 1 year ago

Yes. It turned out that PHYEX downloaded and installed its own version of fiat and it worked fine.