cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

ASAN and UBSAN build failure in ROCm packages #40680

Open makortel opened 1 year ago

makortel commented 1 year ago

ASAN build is failing in the new ROCm packages with

>> Building shared library tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/src/HeterogeneousTestROCmDevice/libHeterogeneousTestROCmDevice.so
/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/rocm/5.4.2-4bcebc22a189c738df04e9c24dd2bf21/bin/hipcc -fgpu-rdc --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --target=x86_64-redhat-linux-gnu --gcc-toolchain=/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161 -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++1z -ftree-vectorize -Werror=array-bounds -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-c99-extensions -Wno-c++11-narrowing -D__STRICT_ANSI__ -Wno-unused-private-field -Wno-unknown-pragmas -Wno-unused-command-line-argument -Wno-unknown-warning-option -ftemplate-depth=512 -Wno-error=potentially-evaluated-expression -Wno-tautological-type-limit-compare -fsized-deallocation -DBOOST_DISABLE_ASSERTS -fno-omit-frame-pointer -fsanitize=address -fsanitize=pointer-subtract -shared -Wl,-E -Wl,-z,defs tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/src/HeterogeneousTestROCmDevice/DeviceAddition.hip.cc.o -o 
tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/src/HeterogeneousTestROCmDevice/libHeterogeneousTestROCmDevice.so -Wl,-E -Wl,--hash-style=gnu -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/eb44d66cea04a4da15371a279cb49a01/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_ASAN_X_2023-02-01-2300/biglib/el8_amd64_gcc11 -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/eb44d66cea04a4da15371a279cb49a01/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_ASAN_X_2023-02-01-2300/lib/el8_amd64_gcc11 -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/eb44d66cea04a4da15371a279cb49a01/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_ASAN_X_2023-02-01-2300/external/el8_amd64_gcc11/lib -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/eb44d66cea04a4da15371a279cb49a01/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_ASAN_X_2023-02-01-2300/static/el8_amd64_gcc11 -lamdhip64
clang-15: warning: ignoring '-fsanitize=pointer-subtract' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
clang-15: warning: ignoring '-fsanitize=address' option for offload arch 'gfx900' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]
clang-15: warning: ignoring '-fsanitize=address' option for offload arch 'gfx900' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]
clang-15: warning: ignoring '-fsanitize=address' option for offload arch 'gfx906' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]
clang-15: warning: ignoring '-fsanitize=address' option for offload arch 'gfx908' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]
ld.lld: error: undefined symbol: __asan_init
>>> referenced by DeviceAddition.hip.cc
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/src/HeterogeneousTestROCmDevice/DeviceAddition-f81954.o:(asan.module_ctor)

ld.lld: error: undefined symbol: __asan_version_mismatch_check_v8
>>> referenced by DeviceAddition.hip.cc
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/src/HeterogeneousTestROCmDevice/DeviceAddition-f81954.o:(asan.module_ctor)
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
  gmake: *** [tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/src/HeterogeneousTestROCmDevice/libHeterogeneousTestROCmDevice.so] Error 1
 Leaving library rule at HeterogeneousTest/ROCmDevice

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc11/CMSSW_13_0_ASAN_X_2023-02-01-2300/HeterogeneousTest/ROCmDevice

makortel commented 1 year ago

assign core,heterogeneous

cmsbuild commented 1 year ago

New categories assigned: heterogeneous,core

@fwyzard,@Dr15Jones,@smuzaffar,@makortel,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 year ago

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.


makortel commented 1 year ago

I wonder if as a first measure we could disable ASAN on .dev.cc files? (or something)

makortel commented 1 year ago

UBSAN build fails similarly

clang-15: warning: ignoring '-fsanitize=undefined' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
clang-15: warning: ignoring '-fsanitize=builtin' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
clang-15: warning: ignoring '-fsanitize=pointer-overflow' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
ld.lld: error: undefined symbol: __ubsan_handle_builtin_unreachable
>>> referenced by hipCheck.h:46 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:46)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(HeterogeneousCoreROCmTestDevicePlugins::wrapper_add_vectors_f(float const*, float const*, float*, unsigned long))

ld.lld: error: undefined symbol: __ubsan_handle_load_invalid_value
>>> referenced by hipCheck.h:41 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:41)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(HeterogeneousCoreROCmTestDevicePlugins::wrapper_add_vectors_f(float const*, float const*, float*, unsigned long))
>>> referenced by hipCheck.h:44 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:44)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(HeterogeneousCoreROCmTestDevicePlugins::wrapper_add_vectors_f(float const*, float const*, float*, unsigned long))
>>> referenced by hipCheck.h:45 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:45)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(HeterogeneousCoreROCmTestDevicePlugins::wrapper_add_vectors_f(float const*, float const*, float*, unsigned long))
>>> referenced 8 more times

ld.lld: error: undefined symbol: __ubsan_vptr_type_cache
>>> referenced by hipCheck.h:28 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:28)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(cms::rocm::abortOnError(char const*, int, char const*, char const*, char const*, std::basic_string_view<char, std::char_traits<char>>))
>>> referenced by hipCheck.h:33 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:33)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(cms::rocm::abortOnError(char const*, int, char const*, char const*, char const*, std::basic_string_view<char, std::char_traits<char>>))
>>> referenced by ROCmTestDeviceAdditionModule.cc:35 (/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/60a62b3d24d7630b3c20d095a74392ed/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2023-02-01-2300/src/HeterogeneousTest/ROCmDevice/plugins/ROCmTestDeviceAdditionModule.cc:35)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(ROCmTestDeviceAdditionModule::ROCmTestDeviceAdditionModule(edm::ParameterSet const&))
>>> referenced 79 more times

ld.lld: error: undefined symbol: __ubsan_handle_dynamic_type_cache_miss
>>> referenced by hipCheck.h:28 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:28)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(cms::rocm::abortOnError(char const*, int, char const*, char const*, char const*, std::basic_string_view<char, std::char_traits<char>>))
>>> referenced by hipCheck.h:33 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:33)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(cms::rocm::abortOnError(char const*, int, char const*, char const*, char const*, std::basic_string_view<char, std::char_traits<char>>))
>>> referenced by ROCmTestDeviceAdditionModule.cc:35 (/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/60a62b3d24d7630b3c20d095a74392ed/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2023-02-01-2300/src/HeterogeneousTest/ROCmDevice/plugins/ROCmTestDeviceAdditionModule.cc:35)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(ROCmTestDeviceAdditionModule::ROCmTestDeviceAdditionModule(edm::ParameterSet const&))
>>> referenced 166 more times

ld.lld: error: undefined symbol: __ubsan_handle_type_mismatch_v1
>>> referenced by hipCheck.h:33 (src/HeterogeneousCore/ROCmUtilities/interface/hipCheck.h:33)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo-a15ff1.o:(cms::rocm::abortOnError(char const*, int, char const*, char const*, char const*, std::basic_string_view<char, std::char_traits<char>>))
>>> referenced by basic_string.h:920 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/basic_string.h:920)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>, std::random_access_iterator_tag) (.isra.0))
>>> referenced by basic_string.h:6235 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/basic_string.h:6235)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>, std::random_access_iterator_tag) (.isra.0))
>>> referenced 1345 more times

ld.lld: error: undefined symbol: __ubsan_handle_pointer_overflow
>>> referenced by stl_iterator.h:1037 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/stl_iterator.h:1037)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>, std::random_access_iterator_tag) (.isra.0))
>>> referenced by basic_string.h:920 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/basic_string.h:920)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>, std::random_access_iterator_tag) (.isra.0))
>>> referenced by basic_string.h:920 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/basic_string.h:920)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>>, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const>, std::random_access_iterator_tag) (.isra.0))
>>> referenced 330 more times

ld.lld: error: undefined symbol: __ubsan_handle_nonnull_arg
>>> referenced by typeinfo:124 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/typeinfo:124)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(std::_Sp_counted_ptr_inplace<edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>, std::allocator<edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>>, (__gnu_cxx::_Lock_policy)2>::_M_get_deleter(std::type_info const&))
>>> referenced by typeinfo:124 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/typeinfo:124)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(std::_Sp_counted_ptr_inplace<edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>, std::allocator<edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>>, (__gnu_cxx::_Lock_policy)2>::_M_get_deleter(std::type_info const&))
>>> referenced by char_traits.h:409 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/char_traits.h:409)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct<char*>(char*, char*, std::forward_iterator_tag) (.isra.0))
>>> referenced 26 more times

ld.lld: error: undefined symbol: __ubsan_handle_add_overflow
>>> referenced by atomicity.h:85 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/ext/atomicity.h:85)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>::~ModuleHolderT())
>>> referenced by atomicity.h:85 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/ext/atomicity.h:85)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>::~ModuleHolderT())
>>> referenced by atomicity.h:85 (/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/ext/atomicity.h:85)
>>>               tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionModule-ee3fc9.o:(edm::maker::ModuleHolderT<edm::global::EDAnalyzerBase>::~ModuleHolderT())
>>> referenced 30 more times
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
  gmake: *** [tmp/el8_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/libHeterogeneousTestROCmDevicePlugins.so] Error 1

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc11/CMSSW_13_0_UBSAN_X_2023-02-01-2300/HeterogeneousTest/ROCmDevice

smuzaffar commented 1 year ago

The compiler warnings can be fixed by dropping these flags for ROCm builds. https://github.com/cms-sw/cmsdist/pull/8285 should allow us to drop flags for ROCm. I will open a separate PR to actually drop the ASAN/UBSAN flags that do not work or are ignored.

smuzaffar commented 1 year ago

> I wonder if as a first measure we could disable ASAN on .dev.cc files? (or something)

I locally dropped the ASAN flags for hip.cc files but it still fails. I think the problem is that for ASAN/UBSAN we set LD_PRELOAD=gcc/lib/libasan.so, while the CMSSW ROCm code is linked with hipcc (which is clang-based), so it might need libclang_rt.asan-x86_64.so instead of libasan.

smuzaffar commented 1 year ago

https://github.com/google/sanitizers/issues/111 suggests

> When linking shared libraries you should not force asan symbols into .so
> -- symbols like __asan_* should be left undefined so that at the dynamic-link time
> they are taken from the main binary.
> My guess is that if you remove -Wl,-z,defs from the command line, it will work.

I tried it and it allowed me to build the ROCm plugins/libs. I will open a PR to drop -Wl,-z,defs when linking with hipcc.
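As an illustration of the change described above, here is a minimal sketch of removing `-Wl,-z,defs` from a link command only when the compiler driver is hipcc. The helper name `drop_z_defs_for_hipcc` is hypothetical and this is not how SCRAM or cmssw-config implement it; it only shows the intended effect on the command line.

```python
import shlex

# Hypothetical helper for illustration; not part of SCRAM or cmssw-config.
FLAG_TO_DROP = "-Wl,-z,defs"


def drop_z_defs_for_hipcc(link_command: str) -> str:
    """Return link_command without -Wl,-z,defs if the driver is hipcc.

    Commands that use any other driver are returned unchanged.
    """
    args = shlex.split(link_command)
    if not args or not args[0].endswith("hipcc"):
        return link_command
    kept = [arg for arg in args if arg != FLAG_TO_DROP]
    return shlex.join(kept)
```

Applied to a shortened version of the failing command, the result keeps `-fsanitize=address` and `-shared` but leaves the `__asan_*` symbols free to be resolved at load time.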

smuzaffar commented 1 year ago

https://github.com/google/sanitizers/wiki/AddressSanitizer also explains it

> Q: When I link my shared library with -fsanitize=address, it fails due to some undefined ASan symbols (e.g. asan_init_v4)?
> A: Most probably you link with -Wl,-z,defs or -Wl,--no-undefined. These flags don't work with ASan unless you also use -shared-libasan (which is the default mode for GCC, but not for Clang).

Hmm, so maybe -shared-libasan for the hipcc link step might help.

smuzaffar commented 1 year ago

hipcc uses LLVM, and the ASAN flags link libclang_rt.asan-x86_64.so into our shared lib/executable, while the rest of CMSSW uses GCC and links libasan. I am afraid we cannot have both libclang_rt.asan-x86_64.so and libasan in the same process. Possible solutions are:

  1. move ASAN builds to use llvm/clang
  2. disable ASAN/UBSAN for shared libs/executable which have hip.cc files

If there are no objections, I would like to try (1).
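To check whether a given process really ends up with both runtimes loaded, one option is to scan its memory map. The sketch below assumes a Linux `/proc/<pid>/maps` listing and matches the two library names mentioned above; the function name `asan_runtimes_in_maps` is hypothetical.

```python
import os

# Library file names as mentioned in the comment above, keyed by toolchain.
RUNTIME_PREFIXES = {
    "gcc": "libasan.so",
    "clang": "libclang_rt.asan-x86_64.so",
}


def asan_runtimes_in_maps(maps_text: str) -> set:
    """Return the toolchains whose ASan runtime appears in a maps listing."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        name = os.path.basename(fields[-1])
        for toolchain, prefix in RUNTIME_PREFIXES.items():
            if name.startswith(prefix):
                found.add(toolchain)
    return found
```

For the current process this could be called as `asan_runtimes_in_maps(open("/proc/self/maps").read())`; a result containing both `"gcc"` and `"clang"` would indicate the mixed situation described above.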

makortel commented 1 year ago

The LTO build shows

>> Building binary rocmComputeCapabilities
/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/rocm/5.4.2-4bcebc22a189c738df04e9c24dd2bf21/bin/hipcc -fgpu-rdc --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --target=x86_64-redhat-linux-gnu --gcc-toolchain=/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161 -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++1z -ftree-vectorize -Werror=array-bounds -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-c99-extensions -Wno-c++11-narrowing -D__STRICT_ANSI__ -Wno-unused-private-field -Wno-unknown-pragmas -Wno-unused-command-line-argument -Wno-unknown-warning-option -ftemplate-depth=512 -Wno-error=potentially-evaluated-expression -Wno-tautological-type-limit-compare -fsized-deallocation -DBOOST_DISABLE_ASSERTS -flto -fno-fat-lto-objects -Wodr -fPIC tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmUtilities/bin/rocmComputeCapabilities/isRocmDeviceSupported.hip.cc.o tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmUtilities/bin/rocmComputeCapabilities/rocmComputeCapabilities.cpp.o -Wl,-E -Wl,--hash-style=gnu 
-L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/320574e495725a5c6e164ba805dfbd1e/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_LTO_X_2023-02-03-1100/biglib/el8_amd64_gcc11 -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/320574e495725a5c6e164ba805dfbd1e/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_LTO_X_2023-02-03-1100/lib/el8_amd64_gcc11 -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/320574e495725a5c6e164ba805dfbd1e/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_LTO_X_2023-02-03-1100/external/el8_amd64_gcc11/lib -L/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/320574e495725a5c6e164ba805dfbd1e/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_LTO_X_2023-02-03-1100/static/el8_amd64_gcc11 -lamdhip64 -o tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmUtilities/bin/rocmComputeCapabilities/rocmComputeCapabilities
ld.lld: error: undefined symbol: main
>>> referenced by /lib/../lib64/crt1.o:(_start)
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
>> Deleted: tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmUtilities/bin/rocmComputeCapabilities/rocmComputeCapabilities
  gmake: *** [tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmUtilities/bin/rocmComputeCapabilities/rocmComputeCapabilities] Error 1

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc11/CMSSW_13_0_LTO_X_2023-02-03-1100/HeterogeneousCore/ROCmUtilities

but this is limited to executables, so the IB still mostly works.

makortel commented 1 year ago

For some reason the LTO build doesn't contain the ROCmTestDeviceAdditionModule module

----- Begin Fatal Exception 07-Feb-2023 16:03:23 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'ROCmTestDeviceAdditionModule' in category 'CMS EDM Framework Module'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc11/CMSSW_13_0_LTO_X_2023-02-07-1100/unitTestLogs/HeterogeneousTest/ROCmDevice#/

smuzaffar commented 1 year ago

So it looks like the missing plugin in LTO is not fixed. I see there are no build errors https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc11/CMSSW_13_0_LTO_X_2023-02-07-1100/HeterogeneousTest/ROCmDevice but somehow the plugin is not properly registered in the .edmplugincache file. I am looking into it.
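A quick way to confirm whether a plugin is registered is to look for its name in the cache file. The sketch below assumes the .edmplugincache file is line-oriented text with whitespace-separated fields, one of which is the plugin name; that layout is an assumption and should be checked against a real cache file. The function name `plugin_in_cache` is hypothetical.

```python
def plugin_in_cache(cache_text: str, plugin_name: str) -> bool:
    """Return True if plugin_name is a whole field on any line of cache_text.

    Assumes whitespace-separated fields per line (not verified against the
    actual .edmplugincache format).
    """
    for line in cache_text.splitlines():
        if plugin_name in line.split():
            return True
    return False
```

For example, `plugin_in_cache(open(".edmplugincache").read(), "ROCmTestDeviceAdditionModule")` would be expected to return False for the LTO build described above, under the stated format assumption.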

smuzaffar commented 1 year ago

https://github.com/cms-sw/cmssw-config/pull/92/commits/d32ff13ee0429e6fcc0af6b48eb0c8dc880b23bf should fix the ROCm LTO plugin issue.

fwyzard commented 1 year ago

ah !