Open mxmlnkn opened 5 years ago
This is almost certainly related to the conduit (external) library, or the hdf5 library used in conduit. As I recall Tom's superbuild includes the zlib someplace -- though I can't find it at the moment.
The exec you mention will go away soon, as it's no longer used. However, you'll still need zlib if you're building with conduit.
From: Maximilian K. notifications@github.com Sent: Tuesday, February 12, 2019 3:27:11 AM To: LLNL/lbann Cc: Subscribed Subject: [LLNL/lbann] Undefined reference to crc32 (#874)
I tried to install LBANN without a container and for some reason I got a linking error at this point of make VERBOSE=1:
[ 92%] Linking CXX executable jag_converter
cd $HOME/lbann-0.98.1/build/src/data_store && /software/ml/CMake/3.10.2-GCCcore-6.4.0/bin/cmake -E cmake_link_script CMakeFiles/jag_converter-bin.dir/link.txt --verbose=1
g++ -fPIC -g -Wall -Wextra -Wno-unused-parameter -Wnon-virtual-dtor -Wshadow -g -O0 CMakeFiles/jag_converter-bin.dir/jag_converter.cpp.o -o jag_converter -Wl,-rpath,$HOME/lbann-0.98.1/build:$HOME/lbann-0.98.1/build/src/proto:$HOME/Elemental/lib:$HOME/opencv/lib64:$HOME/lbann-0.98.1/build/external/TBinf:$HOME/cnpy/lib:$HOME/Aluminum/lib64 ../../liblbann.so ../proto/libLbannProto.so $HOME/Elemental/lib/libHydrogen.so /opt/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libcublas.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libcublas_device.a /usr/lib64/libcuda.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/stubs/libnvidia-ml.so /opt/cuDNN/7.1.4.18-fosscuda-2018b/lib/libcudnn.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libnvToolsExt.so $HOME/opencv/lib64/libopencv_highgui.so.3.4.3 $HOME/opencv/lib64/libopencv_imgcodecs.so.3.4.3 $HOME/opencv/lib64/libopencv_objdetect.so.3.4.3 $HOME/opencv/lib64/libopencv_photo.so.3.4.3 $HOME/opencv/lib64/libopencv_imgproc.so.3.4.3 $HOME/opencv/lib64/libopencv_core.so.3.4.3 ../../external/TBinf/libTBinf.so $HOME/protobuf/lib64/libprotobuf.a $HOME/cnpy/lib/libcnpy.so $HOME/Aluminum/lib64/libAl.so /opt/GCCcore/7.3.0/lib64/libgomp.so -lpthread /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi_cxx.so /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi.so /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi_cxx.so /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi.so /opt/hwloc/1.11.10-GCCcore-7.3.0/lib/libhwloc.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libcudart_static.a -lpthread -ldl /usr/lib64/librt.so /usr/lib64/libdl.so
../../liblbann.so: error: undefined reference to 'crc32'
collect2: error: ld returned 1 exit status
make[2]: [src/data_store/jag_converter] Error 1
make[2]: Leaving directory `$HOME/lbann-ml/lbann-0.98.1/build'
make[1]: [src/data_store/CMakeFiles/jag_converter-bin.dir/all] Error 2
make[1]: Leaving directory `$HOME/lbann-ml/lbann-0.98.1/build'
make: *** [all] Error 2
The problem might stem from the linked CNPY; see rogersce/cnpy#13 (https://github.com/rogersce/cnpy/issues/13). Manually appending -lz to the above g++ call solves the problem for me. I guess ZLIB_LIBRARIES should be appended to the target_link step somewhere in LBANN's CMakeLists.txt? No idea why I didn't encounter the problem before. I'm using CMake 3.10.2 here instead of 3.14rc1, so that might be an influence worth noting.
Yes, we could add ZLIB_LIBRARIES to our CMake, but this is sorta kinda wrong. Absolutely nothing in LBANN requires ZLIB -- it is a dependency of our dependency and should be handled by the upstream. Open a bug with CONDUIT and tell them to actually rebuild their export properly. (Just be prepared for a fairly scripted response along the lines of "our downstreams usually handle this and it's not a problem if you use Spack".)
I was mostly responding based on @davidHysom's comment. If the issue is determined to be CNPY and not CONDUIT, we build that import so it can be fixed there if needed.
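If the missing zlib link does turn out to belong to CNPY, one way to attach it to the import rather than to LBANN's own targets might look roughly like the sketch below. This is only an illustration under assumptions: the target name `cnpy::cnpy` and the variables `CNPY_LIBRARY` and `CNPY_INCLUDE_DIR` are hypothetical placeholders, not necessarily what LBANN's CNPY find module defines. `find_package(ZLIB)` and the `ZLIB::ZLIB` imported target are standard CMake.

```cmake
# Sketch only: cnpy::cnpy, CNPY_LIBRARY and CNPY_INCLUDE_DIR are assumed
# names for illustration, not taken from LBANN's actual find module.
find_package(ZLIB REQUIRED)

if (NOT TARGET cnpy::cnpy)
  add_library(cnpy::cnpy UNKNOWN IMPORTED)
  set_target_properties(cnpy::cnpy PROPERTIES
    IMPORTED_LOCATION "${CNPY_LIBRARY}"
    INTERFACE_INCLUDE_DIRECTORIES "${CNPY_INCLUDE_DIR}")
endif ()

# Record zlib as a link dependency of the imported CNPY target, so every
# consumer of cnpy::cnpy also links zlib. set_property is used here because
# target_link_libraries on imported targets requires CMake 3.11 or newer,
# and the failing build uses 3.10.2.
set_property(TARGET cnpy::cnpy APPEND PROPERTY
  INTERFACE_LINK_LIBRARIES ZLIB::ZLIB)
```

With something like this, the dependency stays attached to the library that actually needs it, which matches the point above that LBANN itself does not require ZLIB.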
I'm pretty sure I don't have conduit installed, see also the CMake log below.
-- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /opt/GCCcore/7.3.0/bin/g++
-- Check for working CXX compiler: /opt/GCCcore/7.3.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-- Performing Test FLAG__fPIC_OK
-- Performing Test FLAG__fPIC_OK - Success
-- Performing Test FLAG__g_OK
-- Performing Test FLAG__g_OK - Success
-- Performing Test FLAG__Wall_OK
-- Performing Test FLAG__Wall_OK - Success
-- Performing Test FLAG__Wextra_OK
-- Performing Test FLAG__Wextra_OK - Success
-- Performing Test FLAG__Wno_unused_parameter_OK
-- Performing Test FLAG__Wno_unused_parameter_OK - Success
-- Performing Test FLAG__Wnon_virtual_dtor_OK
-- Performing Test FLAG__Wnon_virtual_dtor_OK - Success
-- Performing Test FLAG__Wshadow_OK
-- Performing Test FLAG__Wshadow_OK - Success
-- Performing Test FLAG__O0_OK
-- Performing Test FLAG__O0_OK - Success
-- Looking for C++ include sys/sendfile.h
-- Looking for C++ include sys/sendfile.h - found
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX
-- Performing Test _OPENMP_TEST_COMPILES
-- Performing Test _OPENMP_TEST_COMPILES - Success
-- Performing Test EL_HAVE_OMP_COLLAPSE
-- Performing Test EL_HAVE_OMP_COLLAPSE - Success
-- Performing Test EL_HAVE_OMP_SIMD
-- Performing Test EL_HAVE_OMP_SIMD - Success
-- Found MPI_CXX: /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi_cxx.so (found suitable version "3.1", minimum required is "3.0")
-- Found MPI: TRUE (found suitable version "3.1", minimum required is "3.0") found components: CXX
-- Performing Test HYDROGEN_MPI_IS_OPENMPI
-- Performing Test HYDROGEN_MPI_IS_OPENMPI - Failed
-- Performing Test HYDROGEN_MPI_IS_MVAPICH2
-- Performing Test HYDROGEN_MPI_IS_MVAPICH2 - Failed
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found HWLOC: /opt/hwloc/1.11.10-GCCcore-7.3.0/lib/libhwloc.so
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /opt/CUDA/9.2.88-GCC-7.3.0-2.30 (found version "9.2")
-- Found NVML: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/stubs/libnvidia-ml.so
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- A library with BLAS API found.
-- Looking for cheev_
-- Looking for cheev_ - found
-- A library with LAPACK API found.
-- Found HydrogenLAPACK: TRUE
-- Found LAPACK: /opt/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so;/opt/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so
-- Looking for dgemm
-- Looking for dgemm - not found
-- Looking for dgemm_
-- Looking for dgemm_ - found
-- Looking for dlacpy
-- Looking for dlacpy - not found
-- Looking for dlacpy_
-- Looking for dlacpy_ - found
-- Using BLAS with trailing underscore.
-- Using LAPACK with trailing underscore.
-- Looking for mkl_dcsrmv
-- Looking for mkl_dcsrmv - not found
-- Found Hydrogen: $HOME/lbann-ml/Elemental/lib/cmake/hydrogen
-- Found Protobuf: $HOME/lbann-ml/protobuf/lib64/libprotobuf.a;-lpthread (found suitable version "3.6.1", minimum required is "3.0.0")
-- Found OpenCV: $HOME/lbann-ml/opencv/share/OpenCV
-- The CUDA compiler identification is NVIDIA 9.2.148
-- Check for working CUDA compiler: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/bin/nvcc
-- Check for working CUDA compiler: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Aluminum: $HOME/lbann-ml/Aluminum/lib64/cmake/aluminum
-- Found NVTX: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libnvToolsExt.so
-- Found cuDNN: /opt/cuDNN/7.1.4.18-fosscuda-2018b/lib/libcudnn.so
-- Found dl: /usr/lib64/libdl.so
-- Found CNPY: $HOME/lbann-ml/cnpy/lib/libcnpy.so
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Could NOT find SPHINX (missing: SPHINX_EXECUTABLE)
== LBANN Configuration Summary ==
PROJECT_SOURCE_DIR: $HOME/lbann-ml/lbann-0.98.1
PROJECT_BINARY_DIR: $HOME/lbann-ml/lbann-0.98.1/build
CMAKE_INSTALL_PREFIX: $HOME/lbann-ml/lbann
CMAKE_BUILD_TYPE: Debug
CXX FLAGS: -fPIC -g -Wall -Wextra -Wno-unused-parameter -Wnon-virtual-dtor -Wshadow -g -O0
LBANN_GNU_LINUX: TRUE
LBANN_HAS_HYDROGEN: TRUE
LBANN_HAS_OPENCV: TRUE
LBANN_HAS_CEREAL: TRUE
LBANN_HAS_CUDA: TRUE
LBANN_HAS_CUDNN: TRUE
LBANN_HAS_NCCL2: FALSE
LBANN_HAS_PROTOBUF: TRUE
LBANN_HAS_CNPY: TRUE
LBANN_HAS_TBINF: TRUE
LBANN_HAS_VTUNE: FALSE
LBANN_NVPROF: FALSE
LBANN_HAS_DOXYGEN: FALSE
LBANN_HAS_LBANN_PROTO: TRUE
LBANN_HAS_ALUMINUM: TRUE
LBANN_HAS_CONDUIT: FALSE
== End LBANN Configuration Summary ==
-- Configuring done
-- Generating done
-- Build files have been written to:
Btw, there must be a better way to do the git repository check without getting a weird error message on the CMake call. The problem seems to be this call:
execute_process(
COMMAND ${__GIT_EXECUTABLE} rev-parse --is-inside-work-tree
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
OUTPUT_VARIABLE __BUILDING_FROM_GIT_SOURCES
OUTPUT_STRIP_TRAILING_WHITESPACE)
I think specifying the option ERROR_QUIET to execute_process should be enough.
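A minimal sketch of that suggestion, reusing the call quoted above: `ERROR_QUIET` is a documented `execute_process` option that discards the command's stderr. The `RESULT_VARIABLE` line and the name `__GIT_REV_PARSE_RESULT` are additions for illustration and are not in the original code.

```cmake
execute_process(
  COMMAND ${__GIT_EXECUTABLE} rev-parse --is-inside-work-tree
  WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
  OUTPUT_VARIABLE __BUILDING_FROM_GIT_SOURCES
  # Hypothetical addition: capture the exit code so a failure can still be
  # detected once stderr is silenced.
  RESULT_VARIABLE __GIT_REV_PARSE_RESULT
  OUTPUT_STRIP_TRAILING_WHITESPACE
  # Suppress "fatal: Not a git repository ..." when building from a tarball.
  ERROR_QUIET)
```

Outside a git checkout, the output variable is simply left empty and the result variable holds a non-zero exit code, so the configure log stays clean.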
So, I took a look at the command line for the build where I have this problem vs. the other system where I do not have that problem. The most promising difference seems to be that, for some reason, the non-working build uses libprotobuf.a while the working build uses libprotobuf.so. For the non-working build, ccmake shows Protobuf_INCLUDE_DIR=$HOME/lbann-ml/protobuf/include and Protobuf_LIBRARY_DEBUG=$HOME/lbann-ml/protobuf/lib64/libprotobuf.a, i.e., it uses my self-built protobuf statically, which comes with a ProtobufConfig.cmake, while the working build actually has Protobuf_INCLUDE_DIR=/usr/local/include and Protobuf_LIBRARY_DEBUG=/usr/local/lib/libprotobuf.so, i.e., it uses the system-installed shared library.
Another difference is NCCL, but I think I also had the same bug when I still had NCCL turned off. Even though the diff shows libcuda.so being absent from the working build, both builds are compiled with CUDA turned on and both link libcudart_static.a; so, that's also a bit weird, but it might be the difference between CMake 3.10.2 (for the non-working build) and CMake 3.14rc1 (for the working build).
The full diff:
diff lbann-libz-bug-{non,}working.log
1c1
< cd $HOME/lbann-0.98.1/build/src/data_store && /software/ml/CMake/3.10.2-GCCcore-6.4.0/bin/cmake\
---
> cd $HOME/lbann-0.98.1/build/src/data_store && /opt/cmake/bin/cmake\
4c4
< g++\
---
> /usr/bin/c++ \
12,13c12,13
< -g\
< -O0 CMakeFiles/jag_converter-bin.dir/jag_converter.cpp.o \
---
> -O3\
> -DNDEBUG CMakeFiles/jag_converter-bin.dir/jag_converter.cpp.o \
15c15
< -Wl,-rpath,$HOME/lbann-0.98.1/build:$HOME/lbann-0.98.1/build/src/proto:$HOME/Elemental/lib:$HOME/opencv/lib64:$HOME/lbann-0.98.1/build/external/TBinf:$HOME/cnpy/lib:$HOME/Aluminum/lib64 liblbann.so \
---
> -Wl,-rpath,$HOME/lbann-0.98.1/build:$HOME/lbann-0.98.1/build/src/proto:/opt/Elemental/lib:/usr/local/cuda/lib64:/opt/opencv/lib:$HOME/lbann-0.98.1/build/external/TBinf:/usr/local/lib:/opt/cnpy/lib:/opt/Aluminum/lib:/usr/lib/x86_64-linux-gnu/openmpi/lib:/opt/hwloc/lib liblbann.so \
20,21d19
< libcublas_device.a \
< libcuda.so \
32c30
< libprotobuf.a \
---
> libprotobuf.so \
36c34,36
< -lpthread libmpi_cxx.so \
---
> -lpthread\
> -L/usr/lib\
> -pthread libmpi_cxx.so \
43a44
> libnccl.so \
Ok, so I recompiled protobuf, this time adding the CMake option -Dprotobuf_BUILD_SHARED_LIBS=ON, and rebuilt LBANN. The problem still persists, so it seems that was not the cause.
Automated approach: