ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

MIOpen unit test link issue : ld.lld: error: undefined symbol: dladdr and undefined reference due to --no-allow-shlib-undefined #3005

Closed. junliume closed this issue 4 months ago.

junliume commented 5 months ago

@atamazov could you help check whether this issue is caused by an MIOpen change or a compiler change? Thanks!

[2024-05-30T06:51:49.711Z] [ 75%] Linking CXX executable ../../bin/test_miopendriver_conv2d_trans
[2024-05-30T06:51:49.711Z] cd /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/build_hip/test/gtest && /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/test_miopendriver_conv2d_trans.dir/link.txt --verbose=1
[2024-05-30T06:51:49.712Z] /opt/rocm-6.2.0-501/llvm/bin/clang++ -O3 -DNDEBUG -s -Wl,--enable-new-dtags,--build-id=sha1,--rpath,$ORIGIN/../lib -pthread CMakeFiles/test_miopendriver_conv2d_trans.dir/miopendriver_conv2d_trans.cpp.o CMakeFiles/test_miopendriver_conv2d_trans.dir/log.cpp.o CMakeFiles/test_miopendriver_conv2d_trans.dir/platform.cpp.o CMakeFiles/test_miopendriver_conv2d_trans.dir/conv_common.cpp.o -o ../../bin/test_miopendriver_conv2d_trans  -Wl,-rpath,/long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/build_hip/lib:/opt/rocm-6.2.0-501/lib:/long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/lib /usr/local/lib/libgtest.a /usr/local/lib/libgtest_main.a ../../lib/libMIOpen.so.1.0.60200 /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/lib/libboost_filesystem.a /opt/rocm-6.2.0-501/lib/librocblas.so.4.2.60200 /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/lib/libbz2.a -lstdc++fs /usr/local/lib/libgtest.a -lpthread /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/lib/libboost_atomic.a /opt/rocm-6.2.0-501/lib/libamdhip64.so.6.2.60200 /opt/rocm-6.2.0-501/lib/llvm/lib/clang/18/lib/linux/libclang_rt.builtins-x86_64.a --hip-link /opt/rocm-6.2.0-501/lib/libamd_comgr.so.2.8.60200 /usr/lib/x86_64-linux-gnu/librt.so -Wl,-rpath-link,/long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/lib 
[2024-05-30T06:51:49.812Z] ld.lld: error: undefined symbol: dladdr
[2024-05-30T06:51:49.812Z] >>> referenced by miopendriver_conv2d_trans.cpp
[2024-05-30T06:51:49.812Z] >>>               CMakeFiles/test_miopendriver_conv2d_trans.dir/miopendriver_conv2d_trans.cpp.o:(miopendriver_conv2d_trans::RunMIOpenDriver())
[2024-05-30T06:51:49.812Z] clang++: error: linker command failed with exit code 1 (use -v to see invocation)
[2024-05-30T06:51:49.812Z] make[3]: *** [test/gtest/CMakeFiles/test_miopendriver_conv2d_trans.dir/build.make:160: bin/test_miopendriver_conv2d_trans] Error 1
[2024-05-30T06:51:49.812Z] make[3]: Leaving directory '/long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/build_hip'
[2024-05-30T06:51:49.812Z] make[2]: *** [CMakeFiles/Makefile2:20423: test/gtest/CMakeFiles/test_miopendriver_conv2d_trans.dir/all] Error 2
[2024-05-30T06:51:49.812Z] make[2]: *** Waiting for unfinished jobs....

And another one:

[100%] Linking CXX executable ../../bin/test_miopendriver_conv2d_trans
cd /data/driver/MIOpen/build/test/gtest && /usr/local/cmake-3.27.3-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/test_miopendriver_conv2d_trans.dir/link.txt --verbose=1
/opt/rocm/bin/amdclang++ -O3 -DNDEBUG -s -pthread CMakeFiles/test_miopendriver_conv2d_trans.dir/miopendriver_conv2d_trans.cpp.o CMakeFiles/test_miopendriver_conv2d_trans.dir/log.cpp.o CMakeFiles/test_miopendriver_conv2d_trans.dir/platform.cpp.o CMakeFiles/test_miopendriver_conv2d_trans.dir/conv_common.cpp.o -o ../../bin/test_miopendriver_conv2d_trans  -Wl,-rpath,/data/driver/MIOpen/install_dir_2004/lib:/opt/conda/lib:/data/driver/MIOpen/build/lib:/opt/rocm-6.1.0/lib:/opt/rocm/lib -ldl /opt/conda/lib/libgtest_main.so.1.11.0 ../../lib/libMIOpen.so.1.0 /data/driver/MIOpen/install_dir_2004/lib/libboost_filesystem.a /opt/rocm-6.1.0/lib/librocblas.so.4.3 /usr/lib/x86_64-linux-gnu/libbz2.so -lstdc++fs /opt/conda/lib/libgtest.so.1.11.0 -lpthread /data/driver/MIOpen/install_dir_2004/lib/libboost_atomic.a /opt/rocm/lib/libamdhip64.so.6.1.60100 /opt/rocm-6.1.0/lib/llvm/lib/clang/17/lib/linux/libclang_rt.builtins-x86_64.a --hip-link --offload-arch=gfx942 /opt/rocm/lib/libamd_comgr.so.2.7.60100 /usr/lib/x86_64-linux-gnu/librt.so -Wl,-rpath-link,/data/driver/MIOpen/install_dir_2004/lib
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: std::__throw_bad_array_new_length()@GLIBCXX_3.4.29
>>> referenced by /opt/conda/lib/libgtest.so.1.11.0
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

junliume commented 5 months ago

Likely related to #2984

Related issue: https://github.com/ROCm/MIOpen/issues/2178

junliume commented 5 months ago

@BrianHarrisonAMD @JehandadKhan @atamazov @pfultz2 : some more clarifications behind this issue:

Some recent compiler changes have exposed issues on Ubuntu 20.04 (these issues do not reproduce on Ubuntu 22.04).

The first issue is a known one (ref #2177, already handled for the library in https://github.com/ROCm/MIOpen/blob/5f136335e2662a212ac4965ffd8c1817705d6249/src/CMakeLists.txt#L787-L789), so the workaround is to do the same for the gtest targets, which now include MIOpenDriver.
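For context on the first failure: `dladdr` is a GNU extension declared in `<dlfcn.h>`. With glibc older than 2.34 (Ubuntu 20.04 ships 2.31) it is defined in libdl, so any executable that calls it must be linked with `-ldl` (in CMake, typically by linking `${CMAKE_DL_LIBS}`). glibc 2.34 merged libdl into libc, which would explain why Ubuntu 22.04 (glibc 2.35) links fine. The sketch below is a hypothetical illustration, not the MIOpen test code: it assumes `RunMIOpenDriver()` uses `dladdr` for something like locating the directory of the running binary, and the helper names `ObjectPathOf` and `ObjectDirOf` are made up.

```cpp
// dladdr is a GNU extension; _GNU_SOURCE must be set before any system header.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <string>

// Hypothetical helper: returns the path of the executable or shared library
// that contains `addr`, or an empty string if dladdr cannot resolve it.
// On glibc < 2.34 (e.g. Ubuntu 20.04) dladdr lives in libdl, so this translation
// unit must be linked with -ldl; on glibc >= 2.34 it is provided by libc itself.
inline std::string ObjectPathOf(const void* addr)
{
    Dl_info info{};
    if(dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return {};
    return info.dli_fname;
}

// Hypothetical helper: directory part of that path, e.g. to find a sibling
// binary such as MIOpenDriver next to the test executable.
inline std::string ObjectDirOf(const void* addr)
{
    const std::string path = ObjectPathOf(addr);
    const auto slash       = path.find_last_of('/');
    return slash == std::string::npos ? std::string{} : path.substr(0, slash);
}
```

Built with plain `clang++` on Ubuntu 20.04, a program calling these helpers would be expected to fail with the same `undefined symbol: dladdr` error as the log above; adding `-ldl` to the link line resolves it.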

The second issue is likely caused by a recent compiler change that enables --no-allow-shlib-undefined by default:

Before #3007:

amberhassaan commented 4 months ago

@junliume: we need clarity on the sources of these errors. Does MIOpen need libdl, i.e., do we use the dl API in MIOpen? Second, which conda environment are we talking about here?

amberhassaan commented 4 months ago

OK, I found the answer to the first one: libdl is used in MIOpen, so we need to link against it. Now we need to figure out the root cause of the second one.

junliume commented 4 months ago

> OK, I found the answer to the first one. libdl is being used in MIOpen and we need to link against that. So now we need to figure out the root cause of 2nd one.

@amberhassaan The cause of the second one is a polluted environment: /opt/conda contains another version of googletest, which requires GLIBCXX_3.4.29, a symbol version not provided by the system libstdc++ on Ubuntu 20.04. We now explicitly request a specific version of googletest to be installed as part of the dependencies.
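To confirm this kind of environment pollution, one can check which symbol versions a given libstdc++ actually exports. The following is a minimal sketch, assuming a glibc-based system: it uses the GNU `dlvsym` extension to ask whether the `libstdc++.so.6` resolved by the dynamic loader provides `std::__throw_bad_array_new_length()` (mangled `_ZSt28__throw_bad_array_new_lengthv`) at a given version such as `GLIBCXX_3.4.29`. The helper name `LibstdcxxHasVersionedSymbol` is hypothetical. It is only a runtime analogue of the link-time check: with `--no-allow-shlib-undefined`, `ld.lld` checks the undefined symbols of `libgtest.so` against the libstdc++ on the link line, which on Ubuntu 20.04 is presumably the older system copy rather than anything under /opt/conda.

```cpp
// dlvsym is a GNU extension; _GNU_SOURCE must be set before any system header.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>

// Mangled name of std::__throw_bad_array_new_length(), the symbol from the log.
constexpr const char* kThrowBadArrayNewLength = "_ZSt28__throw_bad_array_new_lengthv";

// Hypothetical helper: true if the libstdc++.so.6 that the dynamic loader
// resolves for this process exports `symbol` under exactly `version`.
inline bool LibstdcxxHasVersionedSymbol(const char* symbol, const char* version)
{
    // If libstdc++.so.6 is already loaded, dlopen returns that same object;
    // otherwise it searches the usual paths (LD_LIBRARY_PATH, ld.so cache, ...).
    void* handle = dlopen("libstdc++.so.6", RTLD_LAZY);
    if(handle == nullptr)
        return false;
    const bool found = dlvsym(handle, symbol, version) != nullptr;
    dlclose(handle); // only drops the reference taken by dlopen above
    return found;
}
```

For example, `LibstdcxxHasVersionedSymbol(kThrowBadArrayNewLength, "GLIBCXX_3.4.29")` would be expected to return false with the stock Ubuntu 20.04 libstdc++, but it may return true when `/opt/conda/lib` comes first in `LD_LIBRARY_PATH`, which shows how the two copies can disagree.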