ComputationalRadiationPhysics / libSplash

libSplash - Simple Parallel file output Library for Accumulating Simulation data using Hdf5
GNU Lesser General Public License v3.0

libSplash not compiling on taurus #191

Closed PrometheusPi closed 9 years ago

PrometheusPi commented 9 years ago

I tried to compile libSplash master on taurus but failed.

I used:

provided by the module system.

I got a linking error:

[ 45%] Built target splash
[ 90%] Built target splash_static
[ 95%] Linking CXX executable splashtools
libsplash.a(ParallelDataCollector.cpp.o): In function `splash::ParallelDataCollector::setFileAccessParams(int&)':
ParallelDataCollector.cpp:(.text+0xc33): undefined reference to `H5P_CLS_FILE_ACCESS_ID_g'
ParallelDataCollector.cpp:(.text+0xc49): undefined reference to `H5Pset_fapl_mpio'
libsplash.a(ParallelDataCollector.cpp.o): In function `splash::ParallelDataCollector::append(int, splash::Dimensions, unsigned int, splash::Dimensions, char const*, void const*)':
ParallelDataCollector.cpp:(.text+0x1a50): undefined reference to `H5P_CLS_DATASET_XFER_ID_g'
ParallelDataCollector.cpp:(.text+0x1a68): undefined reference to `H5Pset_dxpl_mpio'
ParallelDataCollector.cpp:(.text+0x1a73): undefined reference to `H5P_CLS_DATASET_XFER_ID_g'
ParallelDataCollector.cpp:(.text+0x1a8b): undefined reference to `H5Pset_dxpl_mpio'
ParallelDataCollector.cpp:(.text+0x1aba): undefined reference to `H5Pset_dxpl_mpio'
libsplash.a(ParallelDataCollector.cpp.o): In function `splash::ParallelDataCollector::remove(int, char const*)':
ParallelDataCollector.cpp:(.text+0x2480): undefined reference to `H5P_LST_LINK_ACCESS_ID_g'
libsplash.a(ParallelDataCollector.cpp.o): In function `splash::ParallelDataCollector::createReference(int, char const*, int, char const*)':
...

This might be a bug in how hdf5 is built on taurus. I will compile hdf5 on my own to test this.

Any other ideas why the linking failed?

ax3l commented 9 years ago

Thanks for the report. Do you have the same problem with the dev version?

PrometheusPi commented 9 years ago

Thanks for the fast reply. I will try that now.

ax3l commented 9 years ago

are you loading both bullxmpi and openmpi?

ax3l commented 9 years ago

can't reproduce with my module set on master:

  1) oscar-modules/1.0.3   4) cuda/6.5.14           7) gcc/4.8.0
  2) cmake/2.8.11          5) bullxmpi/def          8) python/2.7
  3) git/1.9.0             6) gnuplot/4.6.1         9) boost/1.55.0-gnu4.8

PrometheusPi commented 9 years ago

Yes, I did. But I have now checked it with only bullxmpi and it still fails.

PrometheusPi commented 9 years ago

I only use a newer cmake and added zlib as well as hdf5/1.8.14. Other than that, I use the same modules.

Currently Loaded Modulefiles:
  1) oscar-modules/1.0.3   4) cuda/6.5.14           7) zlib/1.2.8           10) python/2.7
  2) cmake/3.3.1           5) bullxmpi/def          8) hdf5/1.8.14          11) boost/1.55.0-gnu4.8
  3) git/1.9.0             6) gnuplot/4.6.1         9) gcc/4.8.0

ax3l commented 9 years ago

more tests:

Currently Loaded Modulefiles:
  1) oscar-modules/1.0.3   4) cuda/6.5.14           7) gcc/4.8.0            10) zlib/1.2.8
  2) cmake/3.3.1           5) bullxmpi/def          8) python/2.7
  3) git/1.9.0             6) gnuplot/4.6.1         9) boost/1.55.0-gnu4.8

PrometheusPi commented 9 years ago

So it looks like an issue with hdf5/1.8.14. What version of hdf5 have you compiled locally?

ax3l commented 9 years ago

I did build 1.8.11 - looks like their module is lacking files.

ax3l commented 9 years ago

actually, our picongpu.profile.example does not use the module from taurus due to that reason ;)

ax3l commented 9 years ago

Might be that they forgot to compile with -fPIC on compiled static libs.

PrometheusPi commented 9 years ago

I knew there was an issue a year ago and hoped they had fixed it in the meantime :) I will build my own hdf5 and will provide feedback as soon as I am done.

ax3l commented 9 years ago

ah no, taurus is one of the few systems that installed both static and shared hdf5 libs. this confuses our CMakeLists.txt a little.

ax3l commented 9 years ago

ah, and another problem is that they also have an additional libhdf5.so in their system paths...

and despite selecting the right (absolute) path for hdf5, it finally links against -lhdf5 instead of the absolute path, which selects the wrong hdf5 version

[ 45%] Linking CXX shared library libsplash.so
/sw/global/tools/cmake/3.3.1/bin/cmake -E cmake_link_script CMakeFiles/splash.dir/link.txt --verbose=1
/sw/global/compilers/gcc/4.8.0/bin/g++  -fPIC  -Wall -Werror -Wextra -Woverloaded-virtual -O3 -DNDEBUG  -shared -Wl,-soname,libsplash.so -o libsplash.so CMakeFiles/splash.dir/src/logging.cpp.o [...] ParallelDomainCollector.cpp.o -lz -lhdf5 -lz -lrt -lm [...]

PrometheusPi commented 9 years ago

By compiling the current HDF5-1.8.15 Patch 1 (available via hdf5group) myself, I could build libSplash.

My only problem was that ./configure did not take the --prefix option, and I am not yet sure whether it took the other options --enable-parallel --enable-shared.
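One way to check which options a finished HDF5 build actually picked up is to read its configuration summary. The sketch below assumes the build installed a summary file (commonly `libhdf5.settings` in the `lib` directory) with `Key: value` lines such as `Installation point`, `Shared Libraries` and `Parallel HDF5`. The helper name `hdf5_setting` and the demo file contents are made up for illustration; the real file may be laid out differently.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Key: value" line from an
# HDF5 configuration summary file.
#   $1 = path to the settings file, $2 = key to look up
hdf5_setting() {
    sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1" | head -n 1
}

# Demo on a small stand-in file (contents invented for illustration only).
demo=$(mktemp)
printf '   Installation point: /home/user/lib/hdf5\n' >  "$demo"
printf '     Shared Libraries: yes\n'                 >> "$demo"
printf '        Parallel HDF5: yes\n'                 >> "$demo"

echo "prefix:   $(hdf5_setting "$demo" 'Installation point')"
echo "shared:   $(hdf5_setting "$demo" 'Shared Libraries')"
echo "parallel: $(hdf5_setting "$demo" 'Parallel HDF5')"
rm -f "$demo"
```

Against a real installation, the first argument would be the settings file under the chosen prefix instead of the temporary demo file.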

ax3l commented 9 years ago

debugged the problem to the core now.

taurus sets not only LD_LIBRARY_PATH (run-time) but also LIBRARY_PATH (link-time). As written in this excellent mailing list history, gcc interprets directories in LIBRARY_PATH as system directories and reports them as such to CMake. CMake then shortens libraries given by absolute path to their short-hand form, assuming both are equivalent (e.g., find_package sets the library to /sw/taurus/libraries/hdf5/1.8.14/lib/libhdf5.so, but the link line uses -lhdf5).

Now the inconsistency of this "feature" kicks in: the imports still correctly refer to the library found by find_package, but -lhdf5 will prefer the actual system paths over LIBRARY_PATH. (Update 2018: the latter is likely caused by inconsistent sysroot configuration of the compiler/linker and/or handling in CMake [1] [2].)
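To see whether a given setup is exposed to this, one can check if the directory of the library found by find_package is listed in LIBRARY_PATH. The sketch below is only a rough approximation of that condition: CMake itself works from the directories the compiler reports (stored in CMAKE_<LANG>_IMPLICIT_LINK_DIRECTORIES), not from this helper. The function name `lib_dir_in_library_path` and the second path entry are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the directory containing the given
# library file appears as an entry in the colon-separated LIBRARY_PATH.
# Note: entries are compared literally (no trailing-slash or symlink
# normalization).
lib_dir_in_library_path() {
    lib_dir=$(dirname "$1")
    case ":${LIBRARY_PATH:-}:" in
        *":${lib_dir}:"*) return 0 ;;
        *)                return 1 ;;
    esac
}

# Example using the hdf5 path quoted in the thread; the second entry is
# an invented placeholder.
LIBRARY_PATH="/sw/taurus/libraries/hdf5/1.8.14/lib:/some/other/lib"

if lib_dir_in_library_path "/sw/taurus/libraries/hdf5/1.8.14/lib/libhdf5.so"; then
    echo "directory is in LIBRARY_PATH: the absolute path may be shortened to -lhdf5"
else
    echo "directory is not in LIBRARY_PATH: the absolute path should be kept"
fi
```

Looking at the verbose link line, as in the log earlier in the thread, remains the more direct check.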

There are at least four options:

a) (picongpu) after loading your modules on taurus, add unset LIBRARY_PATH as the last line; the variable is superfluous

b) (splash) we can set a target property (IMPORTED_LOCATION), but I am not a fan of introducing work-arounds for mal-configured environments

c) (splash) we can generally call unset(ENV{LIBRARY_PATH}) in our CMakeLists.txt to disable this feature (and hope the libs are still found via the other available hints such as CMAKE_PREFIX_PATH, run-time paths, LD_LIBRARY_PATH, actual system paths, etc.). imho, this will be totally fine, too.

d) write to the taurus support and tell them that using LIBRARY_PATH is not cool (but they use CPLUS_INCLUDE_PATH and similar stuff, too.)
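Option (a) could look roughly like this at the end of a shell profile. This is a minimal sketch: the `export` line only stands in for what the module system sets on taurus, and its value is an illustrative assumption based on the path mentioned above.

```shell
#!/bin/sh
# Stand-in for the state after the module loads; on taurus the module
# system sets this variable (the value here is illustrative only).
export LIBRARY_PATH="/sw/taurus/libraries/hdf5/1.8.14/lib"

# Option (a): drop the link-time search path as the last step of the
# profile. LD_LIBRARY_PATH (run-time) is left untouched.
unset LIBRARY_PATH

if [ -z "${LIBRARY_PATH+set}" ]; then
    echo "LIBRARY_PATH is unset"
fi
```

The `${LIBRARY_PATH+set}` expansion distinguishes an unset variable from one that is set but empty.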

PrometheusPi commented 9 years ago

@ax3l Would you recommend solution (a) over using a self-compiled hdf5?

ax3l commented 9 years ago

yes, will update the module in picongpu myself.

already wrote to the support about a)/d), and c) should not be necessary since "normal systems" (desktops) and other HPC systems do not regularly use this variable.

ax3l commented 8 years ago

update: the taurus support answered and they will update the modules to avoid setting LIBRARY_PATH (and similar ones such as CPLUS_INCLUDE_PATH), too.

the updated profile from https://github.com/ComputationalRadiationPhysics/picongpu/pull/1116 still unsets the var which is fine and compatible.