LLNL / lbann

Livermore Big Artificial Neural Network Toolkit
http://software.llnl.gov/lbann/

Undefined reference to crc32 #874

Open mxmlnkn opened 5 years ago

mxmlnkn commented 5 years ago

I tried to install LBANN without a container and for some reason I got a linking error at this point of make VERBOSE=1:

[ 92%] Linking CXX executable jag_converter
cd $HOME/lbann-0.98.1/build/src/data_store && /software/ml/CMake/3.10.2-GCCcore-6.4.0/bin/cmake -E cmake_link_script CMakeFiles/jag_converter-bin.dir/link.txt --verbose=1
g++ -fPIC -g -Wall -Wextra -Wno-unused-parameter -Wnon-virtual-dtor -Wshadow -g -O0   CMakeFiles/jag_converter-bin.dir/jag_converter.cpp.o  -o jag_converter -Wl,-rpath,$HOME/lbann-0.98.1/build:$HOME/lbann-0.98.1/build/src/proto:$HOME/Elemental/lib:$HOME/opencv/lib64:$HOME/lbann-0.98.1/build/external/TBinf:$HOME/cnpy/lib:$HOME/Aluminum/lib64 ../../liblbann.so ../proto/libLbannProto.so $HOME/Elemental/lib/libHydrogen.so /opt/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libcublas.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libcublas_device.a /usr/lib64/libcuda.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/stubs/libnvidia-ml.so /opt/cuDNN/7.1.4.18-fosscuda-2018b/lib/libcudnn.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libnvToolsExt.so $HOME/opencv/lib64/libopencv_highgui.so.3.4.3 $HOME/opencv/lib64/libopencv_imgcodecs.so.3.4.3 $HOME/opencv/lib64/libopencv_objdetect.so.3.4.3 $HOME/opencv/lib64/libopencv_photo.so.3.4.3 $HOME/opencv/lib64/libopencv_imgproc.so.3.4.3 $HOME/opencv/lib64/libopencv_core.so.3.4.3 ../../external/TBinf/libTBinf.so $HOME/protobuf/lib64/libprotobuf.a $HOME/cnpy/lib/libcnpy.so $HOME/Aluminum/lib64/libAl.so /opt/GCCcore/7.3.0/lib64/libgomp.so -lpthread /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi_cxx.so /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi.so /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi_cxx.so /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi.so /opt/hwloc/1.11.10-GCCcore-7.3.0/lib/libhwloc.so /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libcudart_static.a -lpthread -ldl /usr/lib64/librt.so /usr/lib64/libdl.so
../../liblbann.so: error: undefined reference to 'crc32'
collect2: error: ld returned 1 exit status
make[2]: *** [src/data_store/jag_converter] Error 1
make[2]: Leaving directory `$HOME/lbann-ml/lbann-0.98.1/build'
make[1]: *** [src/data_store/CMakeFiles/jag_converter-bin.dir/all] Error 2
make[1]: Leaving directory `$HOME/lbann-ml/lbann-0.98.1/build'
make: *** [all] Error 2

The problem might stem from the linked CNPY: https://github.com/rogersce/cnpy/issues/13. Manually appending -lz to the above g++ call solves the problem for me. I guess somewhere in LBANN's CMakeLists.txt, ZLIB_LIBRARIES should be appended in the target_link_libraries step? No idea why I didn't encounter the problem before. I'm using CMake 3.10.2 instead of 3.14rc1, so that might be an influence worth noting. Automated workaround, which patches the generated link commands after configuring:

cmake ..
# Append -lz after libcnpy.so in every generated g++ link command that lacks it.
find . -type f -execdir bash -c 'if grep "g++.*libcnpy\.so" "$0" | grep -q -v " -lz"; then sed -i -r "/g\+\+ .*libcnpy\.so( |$)/{ s:(libcnpy\.so |$):\1-lz : }" "$0"; fi' {} \;
make -j $( nproc ) install
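
As a rough illustration of the CMake-side change guessed at above, and assuming zlib really is the missing link dependency, something along these lines might work. It relies on CMake's stock FindZLIB module; the target name `lbann` is an assumption and is not taken from LBANN's actual CMakeLists.txt.

```cmake
# Sketch only: "lbann" is an assumed target name for liblbann.so.
find_package(ZLIB REQUIRED)   # sets ZLIB_LIBRARIES and ZLIB_INCLUDE_DIRS

# Put zlib on the link line of the library and of everything that links it,
# instead of appending -lz by hand to each generated link command.
target_link_libraries(lbann PUBLIC ${ZLIB_LIBRARIES})
```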

davidHysom commented 5 years ago

This is almost certainly related to the conduit (external) library, or the hdf5 library used in conduit. As I recall, Tom's superbuild includes zlib somewhere -- though I can't find it at the moment.

The executable you mention (jag_converter) will go away soon, as it's no longer used. However, you'll still need zlib if you're building with conduit.


benson31 commented 5 years ago

Yes, we could add ZLIB_LIBRARIES to our CMake but this is sorta kinda wrong. Absolutely nothing in LBANN requires ZLIB -- it is a dependency of our dependency and should be handled by the upstream. Open a bug with CONDUIT and tell them to actually rebuild their export properly. (Just be prepared for a fairly scripted response along the lines of "our downstreams usually handle this and it's not a problem if you use Spack".)

benson31 commented 5 years ago

I was mostly responding based on @davidHysom's comment. If the issue is determined to be CNPY and not CONDUIT, we build that import so it can be fixed there if needed.
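
If CNPY does turn out to be the culprit, fixing it at the import could look roughly like the sketch below: the imported target itself declares zlib as a link dependency, so consumers do not have to know about it. The names `cnpy::cnpy`, `CNPY_LIBRARY`, and `CNPY_INCLUDE_DIR` are hypothetical placeholders, not the names LBANN's CNPY find module necessarily uses.

```cmake
# Sketch only: cnpy::cnpy, CNPY_LIBRARY and CNPY_INCLUDE_DIR are hypothetical
# placeholders for whatever the CNPY find module actually defines.
find_package(ZLIB REQUIRED)   # FindZLIB provides the ZLIB::ZLIB imported target

add_library(cnpy::cnpy UNKNOWN IMPORTED)
set_target_properties(cnpy::cnpy PROPERTIES
  IMPORTED_LOCATION "${CNPY_LIBRARY}"
  INTERFACE_INCLUDE_DIRECTORIES "${CNPY_INCLUDE_DIR}"
  # Anything that links cnpy::cnpy also gets zlib on its link line.
  INTERFACE_LINK_LIBRARIES ZLIB::ZLIB)
```

With this shape, nothing in LBANN proper has to mention zlib; the dependency travels with the imported target.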

mxmlnkn commented 5 years ago

I'm pretty sure I don't have conduit installed, see also the CMake log below.

-- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /opt/GCCcore/7.3.0/bin/g++
-- Check for working CXX compiler: /opt/GCCcore/7.3.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-- Performing Test FLAG__fPIC_OK
-- Performing Test FLAG__fPIC_OK - Success
-- Performing Test FLAG__g_OK
-- Performing Test FLAG__g_OK - Success
-- Performing Test FLAG__Wall_OK
-- Performing Test FLAG__Wall_OK - Success
-- Performing Test FLAG__Wextra_OK
-- Performing Test FLAG__Wextra_OK - Success
-- Performing Test FLAG__Wno_unused_parameter_OK
-- Performing Test FLAG__Wno_unused_parameter_OK - Success
-- Performing Test FLAG__Wnon_virtual_dtor_OK
-- Performing Test FLAG__Wnon_virtual_dtor_OK - Success
-- Performing Test FLAG__Wshadow_OK
-- Performing Test FLAG__Wshadow_OK - Success
-- Performing Test FLAG__O0_OK
-- Performing Test FLAG__O0_OK - Success
-- Looking for C++ include sys/sendfile.h
-- Looking for C++ include sys/sendfile.h - found
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5") found components:  CXX 
-- Performing Test _OPENMP_TEST_COMPILES
-- Performing Test _OPENMP_TEST_COMPILES - Success
-- Performing Test EL_HAVE_OMP_COLLAPSE
-- Performing Test EL_HAVE_OMP_COLLAPSE - Success
-- Performing Test EL_HAVE_OMP_SIMD
-- Performing Test EL_HAVE_OMP_SIMD - Success
-- Found MPI_CXX: /opt/OpenMPI/3.1.1-gcccuda-2018b/lib/libmpi_cxx.so (found suitable version "3.1", minimum required is "3.0") 
-- Found MPI: TRUE (found suitable version "3.1", minimum required is "3.0") found components:  CXX 
-- Performing Test HYDROGEN_MPI_IS_OPENMPI
-- Performing Test HYDROGEN_MPI_IS_OPENMPI - Failed
-- Performing Test HYDROGEN_MPI_IS_MVAPICH2
-- Performing Test HYDROGEN_MPI_IS_MVAPICH2 - Failed
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found HWLOC: /opt/hwloc/1.11.10-GCCcore-7.3.0/lib/libhwloc.so  
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /opt/CUDA/9.2.88-GCC-7.3.0-2.30 (found version "9.2") 
-- Found NVML: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/stubs/libnvidia-ml.so  
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- A library with BLAS API found.
-- Looking for cheev_
-- Looking for cheev_ - found
-- A library with LAPACK API found.
-- Found HydrogenLAPACK: TRUE  
-- Found LAPACK: /opt/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so;/opt/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so
-- Looking for dgemm
-- Looking for dgemm - not found
-- Looking for dgemm_
-- Looking for dgemm_ - found
-- Looking for dlacpy
-- Looking for dlacpy - not found
-- Looking for dlacpy_
-- Looking for dlacpy_ - found
-- Using BLAS with trailing underscore.
-- Using LAPACK with trailing underscore.
-- Looking for mkl_dcsrmv
-- Looking for mkl_dcsrmv - not found
-- Found Hydrogen: $HOME/lbann-ml/Elemental/lib/cmake/hydrogen
-- Found Protobuf: $HOME/lbann-ml/protobuf/lib64/libprotobuf.a;-lpthread (found suitable version "3.6.1", minimum required is "3.0.0") 
-- Found OpenCV: $HOME/lbann-ml/opencv/share/OpenCV
-- The CUDA compiler identification is NVIDIA 9.2.148
-- Check for working CUDA compiler: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/bin/nvcc
-- Check for working CUDA compiler: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Aluminum: $HOME/lbann-ml/Aluminum/lib64/cmake/aluminum
-- Found NVTX: /opt/CUDA/9.2.88-GCC-7.3.0-2.30/lib64/libnvToolsExt.so  
-- Found cuDNN: /opt/cuDNN/7.1.4.18-fosscuda-2018b/lib/libcudnn.so  
-- Found dl: /usr/lib64/libdl.so
-- Found CNPY: $HOME/lbann-ml/cnpy/lib/libcnpy.so  
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Could NOT find SPHINX (missing: SPHINX_EXECUTABLE) 

== LBANN Configuration Summary ==

  PROJECT_SOURCE_DIR:   $HOME/lbann-ml/lbann-0.98.1
  PROJECT_BINARY_DIR:   $HOME/lbann-ml/lbann-0.98.1/build

  CMAKE_INSTALL_PREFIX: $HOME/lbann-ml/lbann
  CMAKE_BUILD_TYPE:     Debug

  CXX FLAGS:             -fPIC -g -Wall -Wextra -Wno-unused-parameter -Wnon-virtual-dtor -Wshadow -g -O0

  LBANN_GNU_LINUX:        TRUE
  LBANN_HAS_HYDROGEN:     TRUE
  LBANN_HAS_OPENCV:       TRUE
  LBANN_HAS_CEREAL:       TRUE
  LBANN_HAS_CUDA:         TRUE
  LBANN_HAS_CUDNN:        TRUE
  LBANN_HAS_NCCL2:        FALSE
  LBANN_HAS_PROTOBUF:     TRUE
  LBANN_HAS_CNPY:         TRUE
  LBANN_HAS_TBINF:        TRUE
  LBANN_HAS_VTUNE:        FALSE
  LBANN_NVPROF:           FALSE
  LBANN_HAS_DOXYGEN:      FALSE
  LBANN_HAS_LBANN_PROTO:  TRUE
  LBANN_HAS_ALUMINUM:     TRUE
  LBANN_HAS_CONDUIT:      FALSE

== End LBANN Configuration Summary ==

-- Configuring done
-- Generating done
-- Build files have been written to:

Btw, there must be a better way to do the git repository check without getting a weird error message on the CMake call. The problem seems to be this call:

  execute_process(
    COMMAND ${__GIT_EXECUTABLE} rev-parse --is-inside-work-tree
    WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
    OUTPUT_VARIABLE __BUILDING_FROM_GIT_SOURCES
    OUTPUT_STRIP_TRAILING_WHITESPACE)

I think specifying the option ERROR_QUIET to execute_process should be enough.
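
A minimal sketch of that change, reusing the variable names from the call quoted above. The `RESULT_VARIABLE` check and the `__GIT_RESULT` name are additions for illustration and are not part of the original call.

```cmake
# Sketch only: __GIT_RESULT is a hypothetical helper variable.
execute_process(
  COMMAND ${__GIT_EXECUTABLE} rev-parse --is-inside-work-tree
  WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
  RESULT_VARIABLE __GIT_RESULT            # git's exit status
  OUTPUT_VARIABLE __BUILDING_FROM_GIT_SOURCES
  OUTPUT_STRIP_TRAILING_WHITESPACE
  ERROR_QUIET)                            # discard "fatal: Not a git repository ..." on stderr

if (NOT __GIT_RESULT EQUAL 0)
  # git failed, e.g. when building from a release tarball: not a git checkout.
  set(__BUILDING_FROM_GIT_SOURCES FALSE)
endif ()
```

Checking the exit status as well means the build does not depend on the output variable happening to be empty when git fails.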

mxmlnkn commented 5 years ago

So, I compared the link command line of the build where I have this problem against that of the other system where I do not. The most promising difference seems to be that, for some reason, the non-working build uses libprotobuf.a while the working build uses libprotobuf.so. For the non-working build, ccmake shows Protobuf_INCLUDE_DIR=$HOME/lbann-ml/protobuf/include and Protobuf_LIBRARY_DEBUG=$HOME/lbann-ml/protobuf/lib64/libprotobuf.a, i.e., it links my self-built protobuf statically, which comes with a ProtobufConfig.cmake. The working build has Protobuf_INCLUDE_DIR=/usr/local/include and Protobuf_LIBRARY_DEBUG=/usr/local/lib/libprotobuf.so, i.e., it uses the system-installed shared library.

Another difference is NCCL but I think I also had the same bug when I still had NCCL turned off. Even though the diff shows libcuda.so being non-present in the working build, both builds are compiled with CUDA turned on and both link libcudart_static.a; so, that's also a bit weird but that might be the difference between CMake 3.10.2 (for the non-working build) vs. CMake 3.14rc1 (for the working build).

The full diff:

diff lbann-libz-bug-{non,}working.log
1c1
< cd $HOME/lbann-0.98.1/build/src/data_store && /software/ml/CMake/3.10.2-GCCcore-6.4.0/bin/cmake\
---
> cd $HOME/lbann-0.98.1/build/src/data_store && /opt/cmake/bin/cmake\
4c4
< g++\
---
> /usr/bin/c++  \
12,13c12,13
<  -g\
<  -O0   CMakeFiles/jag_converter-bin.dir/jag_converter.cpp.o \
---
>  -O3\
>  -DNDEBUG   CMakeFiles/jag_converter-bin.dir/jag_converter.cpp.o \
15c15
<  -Wl,-rpath,$HOME/lbann-0.98.1/build:$HOME/lbann-0.98.1/build/src/proto:$HOME/Elemental/lib:$HOME/opencv/lib64:$HOME/lbann-0.98.1/build/external/TBinf:$HOME/cnpy/lib:$HOME/Aluminum/lib64 liblbann.so \
---
>  -Wl,-rpath,$HOME/lbann-0.98.1/build:$HOME/lbann-0.98.1/build/src/proto:/opt/Elemental/lib:/usr/local/cuda/lib64:/opt/opencv/lib:$HOME/lbann-0.98.1/build/external/TBinf:/usr/local/lib:/opt/cnpy/lib:/opt/Aluminum/lib:/usr/lib/x86_64-linux-gnu/openmpi/lib:/opt/hwloc/lib liblbann.so \
20,21d19
<  libcublas_device.a \
<  libcuda.so \
32c30
<  libprotobuf.a \
---
>  libprotobuf.so \
36c34,36
<  -lpthread libmpi_cxx.so \
---
>  -lpthread\
>  -L/usr/lib\
>  -pthread libmpi_cxx.so \
43a44
>  libnccl.so \

Ok, so I recompiled protobuf, this time adding the CMake option -Dprotobuf_BUILD_SHARED_LIBS=ON, and rebuilt LBANN. The problem still persists, so it seems that was not the cause.