Open ax3l opened 2 years ago
Most likely issue: https://github.com/ROCmSoftwarePlatform/rocRAND/pull/29#issuecomment-912815457
$ ldd /opt/rocm-4.3.0/rocrand/lib/librocrand.so.1.1.40300
...
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0650e4e000)
...
Actually, it looks like we miss it, since I cannot find another unresolved libpthread dependency in rocrand.
ld.lld: error: undefined symbol: pthread_create
>>> referenced by AMReX_BackgroundThread.cpp
>>> AMReX_BackgroundThread.cpp.o:(amrex::BackgroundThread::BackgroundThread()) in archive ../../_deps/amrex-build/Src/libamrex.a
Interesting, since we search and link pthreads: https://github.com/AMReX-Codes/amrex/blob/168a690497396de4c6b89a36b6edb0430e51ef4c/Tools/CMake/AMReXParallelBackends.cmake#L1-L8
The CMake output from this setup:
-- The C compiler identification is Clang 12.0.0
-- The CXX compiler identification is Clang 13.0.0
...
-- Check for working C compiler: /opt/cray/pe/craype/2.7.8/bin/cc - skipped
...
-- Check for working CXX compiler: /opt/rocm-4.3.0/llvm/bin/clang++ - skipped
is concerning. Looks like the Cray and the AMD Clang are mixed.
One should add
-DCMAKE_C_COMPILER=/opt/rocm-4.3.0/llvm/bin/clang
too for consistency.
Yes, we should do that. That seems to fix the pthread issue.
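A quick way to catch this kind of compiler mix after configuring is to compare the two compiler entries in the build directory's `CMakeCache.txt`. The following is a minimal sketch; `check_compiler_mix` is a hypothetical helper name (not part of AMReX or CMake), and comparing install directories is only a heuristic.

```shell
# Sketch: warn when the C and C++ compilers recorded in a CMake cache come
# from different directories (e.g. the Cray cc next to the ROCm clang++).
# check_compiler_mix is a hypothetical helper, not an AMReX or CMake tool.
check_compiler_mix() {
    cache_file="$1"
    # Cache entries look like: CMAKE_C_COMPILER:FILEPATH=/path/to/compiler
    c_compiler=$(sed -n 's/^CMAKE_C_COMPILER:[A-Z]*=//p' "$cache_file")
    cxx_compiler=$(sed -n 's/^CMAKE_CXX_COMPILER:[A-Z]*=//p' "$cache_file")
    # Heuristic: compilers from the same toolchain usually share a directory.
    if [ "$(dirname "$c_compiler")" = "$(dirname "$cxx_compiler")" ]; then
        echo "consistent: $c_compiler and $cxx_compiler"
    else
        echo "mixed: $c_compiler and $cxx_compiler"
        return 1
    fi
}
```

For the build directory used in this thread it could be called as `check_compiler_mix build/3d.gnu.float.hip/CMakeCache.txt`.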
Let's ignore the errors in compiling tutorials that use AmrLevel. If I run amrex-tutorials/build/3d.gnu.float.hip/Basic/HelloWorld_C/Basic_HelloWorld_C, I get
Initializing HIP...
HIP initialized.
"Cannot find Symbol"
SIGABRT
See Backtrace.0 file for details
So now we have reproduced the symbol issue reported to us.
Compiling now with
cmake -S . -B build/3d.gnu.float.hip -DAMReX_FORTRAN=OFF -DAMReX_GPU_BACKEND=HIP -DAMReX_AMD_ARCH=gfx908 -DAMReX_OMP=OFF -DAMReX_MPI=OFF -DAMReX_PRECISION=SINGLE -DAMReX_SPACEDIM=3 -DCMAKE_CXX_COMPILER=/opt/rocm-4.3.0/llvm/bin/clang++ -DCMAKE_CXX_STANDARD=17 -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm-4.3.0/llvm/bin/clang
cmake --build build/3d.gnu.float.hip -j 12
to reproduce
With CMake 3.20.2 we can use hipcc as the CXX compiler:
cmake -S . -B build/3d.gnu.float.hip -DAMReX_FORTRAN=OFF -DAMReX_GPU_BACKEND=HIP -DAMReX_AMD_ARCH=gfx908 -DAMReX_OMP=OFF -DAMReX_MPI=OFF -DAMReX_PRECISION=SINGLE -DAMReX_SPACEDIM=3 -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_CXX_STANDARD=17 -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm-4.3.0/llvm/bin/clang
So just some LLVM magic flags from hipcc are missing.
The same thing with cmake/3.21.2-dev unravels the hipcc to clang++.
Now we have to work around the bug, already fixed upstream, about defaults in the -x cxx and -x hip front-ends (ref):
export CXXFLAGS="-std=c++17"
cmake -S . -B build/3d.gnu.float.hip -DAMReX_FORTRAN=OFF -DAMReX_GPU_BACKEND=HIP -DAMReX_AMD_ARCH=gfx908 -DAMReX_OMP=OFF -DAMReX_MPI=OFF -DAMReX_PRECISION=SINGLE -DAMReX_SPACEDIM=3 -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_CXX_STANDARD=17 -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm-4.3.0/llvm/bin/clang
That then still raises "Cannot find Symbol" though, so some LLVM flags are still being lost somewhere, maybe because ROCm 4.3.0 does not yet anticipate CMake 3.21-dev and thus the hip::device target misses some flags or so.
For now, users should not use a dev version of CMake on Spock, but just the latest stable release.
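The configure lines in this thread differ only in the C++ compiler, so the shared arguments can be kept in one place while switching between `clang++` and `hipcc`. This is a small sketch; `amrex_hip_args` is a hypothetical helper name, and the flags are copied from the commands above.

```shell
# Sketch: print the configure arguments shared by the commands in this
# thread, one per line, with only the C++ compiler varying.
# amrex_hip_args is a hypothetical helper name.
amrex_hip_args() {
    cxx_compiler="$1"
    printf '%s\n' \
        -DAMReX_FORTRAN=OFF \
        -DAMReX_GPU_BACKEND=HIP \
        -DAMReX_AMD_ARCH=gfx908 \
        -DAMReX_OMP=OFF \
        -DAMReX_MPI=OFF \
        -DAMReX_PRECISION=SINGLE \
        -DAMReX_SPACEDIM=3 \
        -DCMAKE_CXX_STANDARD=17 \
        -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_C_COMPILER=/opt/rocm-4.3.0/llvm/bin/clang \
        "-DCMAKE_CXX_COMPILER=$cxx_compiler"
}
```

Usage would be along the lines of `cmake -S . -B build/3d.gnu.float.hip $(amrex_hip_args hipcc)`. The unquoted expansion relies on none of the arguments containing whitespace.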
For the "Cannot find Symbol" issue, one can strace the application like this (Crusher example):
export proj=aphXYZ # change this to your OLCF project
alias runNode="srun -A $proj -J warpx -t 00:30:00 -p batch -N 1 -c 8 --ntasks-per-node=8"
cd build/bin
runNode strace ./warpx ../../Examples/Physics_applications/laser_acceleration/inputs_3d 2>&1 | grep -E '^open(at)?\(.*\.so'
Note latest Crusher instructions in WarpX: https://warpx.readthedocs.io/en/latest/install/hpc/crusher.html
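The strace filter above prints every matching `open`/`openat` line, including failed lookups. As a rough sketch, the output can be reduced to the unique shared-library paths that were actually opened, which may make it easier to spot a library picked up from an unexpected location. `list_opened_libs` is a hypothetical helper name, and the sketch assumes the usual strace line layout without PID prefixes.

```shell
# Sketch: reduce strace output (read from stdin) to the unique
# shared-library paths that were opened successfully.
# list_opened_libs is a hypothetical helper name.
list_opened_libs() {
    # Keep open()/openat() calls on *.so files, drop failed calls
    # (return value -1), then extract the first quoted argument: the path.
    grep -E '^open(at)?\(.*\.so' \
        | grep -v -E '= -1 ' \
        | sed -E 's/^open(at)?\([^"]*"([^"]+)".*/\2/' \
        | sort -u
}
```

It would replace the final `grep` in the pipeline, e.g. `runNode strace ./warpx <inputs> 2>&1 | list_opened_libs`.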
I am getting the "Cannot find Symbol" issue on NCSA Delta's MI100 node. Unfortunately, it doesn't have the Cray compilers installed, so I can't follow the WarpX build instructions. Is there another workaround?
gnu make
Weirdly, although it complains, it also works if I set CMAKE_CXX_COMPILER to hipcc. Is this a CMake bug?
I don't know. GNU make uses the hipcc wrapper instead of AMD's clang.
FYI: the hipcc/amdclang++ issue has been passed along to AMD's ROCm dev team.
I ran into this as well (ORNL Crusher this time). I used Cray's CC wrapper and CMake. Is there a solution other than using hipcc or GNU make? I am building for Cactus/CarpetX, which itself is a complex build system, so given that it took me a couple of days to get things working with CC, I am hoping not to have to redo everything for hipcc ;-)
From @WeiqunZhang via <unknown user> report on Spock (OLCF). On a login node:
results in