anharmonic / d3q

D3Q + thermal2

Problem in linking libraries `minpack/distributed` #15

Open Crivella opened 4 months ago

Crivella commented 4 months ago

I guess this is related to the comment by @paulatz in #13.

When compiling with QE's CMake build, I get the following error for several executables:

[ 98%] Linking Fortran executable ../../bin/d3_interpolate2.x
gmake[2]: *** [external/d3q/CMakeFiles/qe_d3_recenter_exe.dir/build.make:126: bin/d3_recenter.x] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:3121: external/d3q/CMakeFiles/qe_d3_recenter_exe.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
ld: libqe_thermal2.so: undefined reference to `plmdif'
collect2: error: ld returned 1 exit status

I checked and saw that this function is defined in minpack/distributed/plmdif.c. After adding that file to the same section of CMakeLists.txt as the minpack/lapackified sources, I got other undefined references, so I added all the minpack/distributed .c files to that section, but then ran into this:

[ 98%] Linking Fortran shared library libqe_thermal2.so
/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: CMakeFiles/qe_thermal2.dir/minpack/distributed/enorm.c.o: in function `enorm_':
enorm.c:(.text+0x10): multiple definition of `enorm_'; CMakeFiles/qe_thermal2.dir/minpack/lapackified/enorm.c.o:enorm.c:(.text+0x20): first defined here
/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: warning: CMakeFiles/qe_thermal2.dir/thermal2/eos.f90.o: requires executable stack (because the .note.GNU-stack section is executable)
collect2: error: ld returned 1 exit status
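The multiple-definition error means the distributed and lapackified copies of enorm.c both define `enorm_`. As a rough sketch (not part of d3q), duplicates like this can be spotted before linking by feeding `nm -A` output for the minpack objects to a small awk filter. The helper name `find_duplicate_defs` is hypothetical, and it only considers global symbols of type T, D, B or R.

```shell
#!/usr/bin/env bash
# find_duplicate_defs: read `nm -A` output on stdin and print each global
# symbol defined (type T, D, B or R) in more than one object file,
# followed by the files that define it.
find_duplicate_defs() {
  awk '
    # Defined symbols look like "dir/enorm.c.o:0000000000000010 T enorm_".
    # Undefined ("U") and local (lower-case type) symbols fail the type test.
    NF == 3 && $2 ~ /^[TDBR]$/ {
      split($1, parts, ":")            # parts[1] is the object file name
      files[$3] = files[$3] " " parts[1]
      count[$3]++
    }
    END {
      for (sym in count)
        if (count[sym] > 1) print sym ":" files[sym]
    }
  '
}
```

Possible usage, where the object directory is only assumed from the paths in the log above: `nm -A _build/external/d3q/CMakeFiles/qe_thermal2.dir/minpack/*/*.o | find_duplicate_defs`.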

So I tried removing enorm.c from distributed, but then I still hit 2 undefined references:

[100%] Linking Fortran executable ../../bin/d3_sparse.x
/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: libqe_thermal2.so: undefined reference to `enorm'
collect2: error: ld returned 1 exit status
make[3]: *** [external/d3q/CMakeFiles/qe_d3_qq2rr_exe.dir/build.make:126: bin/d3_qq2rr.x] Error 1
make[2]: *** [CMakeFiles/Makefile2:3035: external/d3q/CMakeFiles/qe_d3_qq2rr_exe.dir/all] Error 2
[100%] Linking Fortran executable ../../bin/d3_qha.x
/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: CMakeFiles/qe_d3_qha_exe.dir/thermal2/PROGRAM_qha.f90.o: undefined reference to symbol 'remove_stack_limit_'
/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: /home/crivella/Codes/QE/7.3-cmake/_build/Modules/libqe_modules_c.so: error adding symbols: DSO missing from command line
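With a parallel build the failing targets interleave, so it can help to reduce a log like this to the list of distinct missing symbols. A minimal sketch, assuming GNU ld's message formats as they appear above (a backtick-quoted name for plain undefined references, a single-quoted name after "undefined reference to symbol"); `undefined_symbols` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# undefined_symbols: read a linker log on stdin and print each distinct
# symbol mentioned in an "undefined reference to" message, one per line.
undefined_symbols() {
  # GNU ld writes `sym' for ordinary undefined references and
  # 'sym' after "undefined reference to symbol" (the DSO-missing variant).
  sed -nE "s/.*undefined reference to (symbol )?[\`']([^']+)'.*/\2/p" | sort -u
}
```

For example: `cmake --build _build --parallel 32 2>&1 | tee build.log`, then `undefined_symbols < build.log`.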

EDIT: The remove_stack_limit_ error is probably a separate, non-d3q-specific problem, but the enorm one still blocks compilation.


EDIT2: Actually, not necessarily. I see that in QE, calls to remove_stack_limit are enabled at the preprocessor level only when using an Intel compiler:

#if defined(__INTEL_COMPILER)
  CALL remove_stack_limit ( )
#endif

whereas in d3q (for example https://github.com/anharmonic/d3q/blob/4ee824e209b52702afb40771e8e25663be1811f5/thermal2/PROGRAM_db.f90#L324) they are not guarded. So this might be the cause of at least this specific error.
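To list every call in the d3q sources that is not wrapped in the Intel-only guard, a rough check like the following could be used. It is a sketch, not an existing d3q tool: `unguarded_stack_calls` is a hypothetical name, it only inspects the line directly above each call, and it misses calls that share a line with other statements (e.g. after an `IF`).

```shell
#!/usr/bin/env bash
# unguarded_stack_calls FILE...: print "file:line:text" for every
# "CALL remove_stack_limit" whose preceding line is not a preprocessor
# conditional mentioning __INTEL_COMPILER.
# Rough check: only the line directly above each call is inspected.
unguarded_stack_calls() {
  awk '
    FNR == 1 { prev = "" }                     # new file: forget last line
    tolower($0) ~ /^[ \t]*call[ \t]+remove_stack_limit/ {
      if (prev !~ /^#.*__INTEL_COMPILER/)
        print FILENAME ":" FNR ":" $0
    }
    { prev = $0 }
  ' "$@"
}
```

Run from the d3q source root, e.g. `unguarded_stack_calls thermal2/*.f90`.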


EDIT3: NVM, I think this is a QE problem (see https://gitlab.com/QEF/q-e/-/issues/667), but it might still be worth checking whether this function should only be called when compiling with Intel.


If I instead remove the one from lapackified, I get this:

/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: CMakeFiles/qe_d3_qha_exe.dir/thermal2/PROGRAM_qha.f90.o: undefined reference to symbol 'remove_stack_limit_'
/home/crivella/.local/easybuild/software/binutils/2.40-GCCcore-12.3.0/bin/ld: /home/crivella/Codes/QE/7.3-cmake/_build/Modules/libqe_modules_c.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

Kinda stuck after this

ye-luo commented 4 months ago

I didn't see this issue with my usual way of building QE. I see libqe_thermal2.so in your log: are you building shared libraries? If so, the failure is probably related to whether the linker ignores unresolved symbols in shared libraries.
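The behaviour described here can be reproduced with a toy library. By default GNU ld accepts unresolved symbols when linking a shared library and only reports them when an executable is linked against it, which matches the error above naming libqe_thermal2.so while an executable is being linked. A sketch, assuming Linux with GNU ld behind `cc`; the file names are made up and `plmdif` is only borrowed as a stand-in. `-Wl,--no-undefined` is the GNU ld option that moves the error to the library link itself.

```shell
#!/usr/bin/env bash
# Toy reproduction: unresolved symbols in a shared library are only
# reported when an executable is linked against it.
# Assumes Linux with GNU ld behind `cc`; all file names are made up.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# libfoo.so calls plmdif(), which nothing in this link defines
# (the name is borrowed from the log above as a stand-in).
cat > foo.c <<'EOF'
void plmdif(void);
void foo(void) { plmdif(); }
EOF
cat > main.c <<'EOF'
void foo(void);
int main(void) { foo(); return 0; }
EOF

# 1. The shared library links fine despite the unresolved symbol.
if cc -shared -fPIC -o libfoo.so foo.c; then shared_default=ok; else shared_default=fail; fi
# 2. Linking an executable fails: "libfoo.so: undefined reference to `plmdif'".
if cc -o main main.c -L. -lfoo 2>/dev/null; then exe_link=ok; else exe_link=fail; fi
# 3. With --no-undefined the same error is raised while linking the library.
if cc -shared -fPIC -Wl,--no-undefined -o libfoo_strict.so foo.c 2>/dev/null; then
  shared_strict=ok
else
  shared_strict=fail
fi
echo "shared_default=$shared_default exe_link=$exe_link shared_strict=$shared_strict"
```

On a typical Linux/GNU ld setup this should report `ok` for the first step and `fail` for the other two.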

Crivella commented 4 months ago

Yes, I am trying to get a working build with shared libraries. This should be related to https://gitlab.com/QEF/q-e/-/issues/667, which I opened today.

I will try recompiling d3q with -Wl,--copy-dt-needed-entries and get back to you.

Crivella commented 4 months ago

I tried rebuilding from the latest commit, 5449bf1f.

I can confirm that the static build works (I only tested compilation; I still need to run the code itself).

The one with shared libs still gives the error:

ld: libqe_thermal2.so: undefined reference to `plmdif'

By repeating the step mentioned previously (manually modifying CMakeLists.txt), I now get errors related to include files (I assume this is a WIP given the discussion in #13).

Using -Wl,--copy-dt-needed-entries should resolve the "DSO missing from command line" errors when the symbol is defined in a dependency of a library already on the link line, but I am not sure whether the undefined reference to plmdif can or should be ignored.
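A toy reproduction of the "DSO missing from command line" case and of what `--copy-dt-needed-entries` changes, assuming GNU ld (BFD) on Linux. The names are made up: `libstack.so` plays the role of libqe_modules_c.so and `libthermal.so` that of a library depending on it. The option only affects libraries listed after it on the link line.

```shell
#!/usr/bin/env bash
# Toy reproduction of "DSO missing from command line" and of what
# --copy-dt-needed-entries changes. Assumes Linux with GNU ld (BFD) behind
# `cc`. libstack.so stands in for libqe_modules_c.so; names are made up.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

cat > stack.c <<'EOF'
void remove_stack_limit_(void) {}
EOF
cat > thermal.c <<'EOF'
void remove_stack_limit_(void);
void thermal(void) { remove_stack_limit_(); }
EOF
cat > prog.c <<'EOF'
void remove_stack_limit_(void);
void thermal(void);
int main(void) { remove_stack_limit_(); thermal(); return 0; }
EOF

cc -shared -fPIC -o libstack.so stack.c
# libthermal.so gets a DT_NEEDED entry for libstack.so.
cc -shared -fPIC -o libthermal.so thermal.c -L. -lstack

# prog uses remove_stack_limit_ directly but only -lthermal is listed:
# ld finds the symbol in libstack.so yet refuses it (DSO missing).
if cc -o prog prog.c -Wl,-rpath-link,. -L. -lthermal 2>/dev/null; then
  plain=ok
else
  plain=fail
fi
# The flag must precede the library whose DT_NEEDED entries should be copied.
if cc -Wl,--copy-dt-needed-entries -o prog prog.c -Wl,-rpath-link,. -L. -lthermal 2>/dev/null; then
  copied=ok
else
  copied=fail
fi
echo "plain=$plain copied=$copied"
```

Note that the flag cannot help when, as with plmdif, the symbol is not defined in any library on or behind the link line.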

Crivella commented 4 months ago

For reference, here is a minimal version of the build script I am running:

#!/usr/bin/env bash

export LDFLAGS="-Wl,--copy-dt-needed-entries"
BSHARED=ON

cmake -S. -B_build \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    -DENABLE_OPENMP=OFF \
    -DENABLE_MPI=ON \
    -DCMAKE_C_COMPILER=mpiicc \
    -DCMAKE_Fortran_COMPILER=mpiifort \
    -DBUILD_SHARED_LIBS=$BSHARED \
    -DQE_ENABLE_SCALAPACK=ON \
    -DQE_ENABLE_HDF5=ON \
    -DQE_ENABLE_PLUGINS="d3q"

cmake --build _build --parallel 32
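After a shared build, one way to check which of the minpack symbols libqe_thermal2.so still expects from elsewhere is to inspect its dynamic symbol table with GNU `nm -D --undefined-only`. A sketch: `still_undefined` is a hypothetical helper, and the `find` call in the usage line is there only because the exact output path of the library is not stated above.

```shell
#!/usr/bin/env bash
# still_undefined LIB SYM...: print each SYM that LIB's dynamic symbol table
# lists as undefined, i.e. that LIB expects another library to provide.
still_undefined() {
  local lib=$1 undef sym
  shift
  # GNU nm may append symbol versions (e.g. name@VERSION); strip them.
  undef=$(nm -D --undefined-only "$lib" | awk '{ name = $NF; sub(/@.*/, "", name); print name }')
  for sym in "$@"; do
    if printf '%s\n' "$undef" | grep -qxF -- "$sym"; then
      printf '%s\n' "$sym"
    fi
  done
}
```

For example: `still_undefined "$(find _build -name libqe_thermal2.so | head -n 1)" plmdif enorm`.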