abagusetty opened this issue 4 days ago
Hi @abagusetty. Internal ticket is created to investigate your issue. Thanks!
Hi @abagusetty, unfortunately I can't reproduce this issue with ROCm 6.2.4 and either cuda_12.0.r12.0/compiler.32267302_0 or cuda_12.6.r12.6/compiler.34841621_0. Can you upgrade to the latest CUDA 12.6?
Hi @zichguan-amd, I just tried again with ROCm 6.2.4 and CUDA 12.6.1 (build cuda_12.6.r12.6/compiler.34714021_0). Here are the steps I left out earlier in this thread:
cmake ../catch -DHIP_COMPILER=nvcc -DHIP_PLATFORM=nvidia -DHIP_RUNTIME=cuda -DHIP_PATH=/soft/compilers/rocm/6.2.4/clr-install
Attached are the CMake configure and build logs: cmake.log, build.log
I see you are on a Cray machine, and I suspect this is an environment/setup issue. Can you build the failing test directly using hipcc with -v and HIPCC_VERBOSE=1, and check at which stage it fails?
cd /home/abagusetty/rocm/hip-tests/build/catch_tests/unit/graph && HIPCC_VERBOSE=1 /soft/compilers/rocm/6.2.4/clr-install//bin/hipcc -v -DKERNELS_PATH=\"/home/abagusetty/rocm/hip-tests/catch/kernels/\" -I/home/abagusetty/rocm/hip-tests/catch/external/Catch2 -I/home/abagusetty/rocm/hip-tests/catch/./include -I/home/abagusetty/rocm/hip-tests/catch/./kernels -I/soft/compilers/rocm/6.2.4/clr-install/include -I/home/abagusetty/rocm/hip-tests/catch/external/picojson --std=c++17 --extended-lambda -MD -MT catch_tests/unit/graph/CMakeFiles/GraphsTest1.dir/hipGraphAddMemcpyNode.cc.o -MF CMakeFiles/GraphsTest1.dir/hipGraphAddMemcpyNode.cc.o.d -o CMakeFiles/GraphsTest1.dir/hipGraphAddMemcpyNode.cc.o -c /home/abagusetty/rocm/hip-tests/catch/unit/graph/hipGraphAddMemcpyNode.cc
It should just invoke nvcc, like this:
/usr/local/cuda/bin/nvcc -Wno-deprecated-gpu-targets -isystem /usr/local/cuda/include -isystem "/opt/rocm-6.2.4/include" -x cu -v -DKERNELS_PATH=\"/home/rocm/hip-tests/catch/kernels/\" -I/home/rocm/hip-tests/catch/external/Catch2 -I/home/rocm/hip-tests/catch/./include -I/home/rocm/hip-tests/catch/./kernels -I/opt/rocm/include -I/home/rocm/hip-tests/catch/external/picojson --std=c++17 --extended-lambda -MD -MT catch_tests/unit/graph/CMakeFiles/GraphsTest1.dir/hipGraphAddMemcpyNode.cc.o -MF CMakeFiles/GraphsTest1.dir/hipGraphAddMemcpyNode.cc.o.d -o "CMakeFiles/GraphsTest1.dir/hipGraphAddMemcpyNode.cc.o" -c /home/rocm/hip-tests/catch/unit/graph/hipGraphAddMemcpyNode.cc
Also check whether you can compile directly with nvcc.
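Since the suspicion is an environment/setup problem, it can help to first confirm which tools the shell actually resolves. The check below is a minimal sketch assuming a POSIX shell; the function name check_hip_toolchain is hypothetical, and the default HIP_PATH is simply the clr-install prefix used in this thread. It relies only on hipconfig's --platform and --compiler options and on nvcc --version. On the NVIDIA platform, hipconfig would be expected to report nvidia and nvcc.

```shell
#!/bin/sh
# Hypothetical helper: report which HIP/CUDA tools this shell will pick up.
# Assumption: HIP_PATH defaults to the clr-install prefix from this thread;
# override it to match your own install or module setup.
check_hip_toolchain() {
  hip_path="${HIP_PATH:-/soft/compilers/rocm/6.2.4/clr-install}"

  # hipcc and hipconfig from the prefix passed to cmake via -DHIP_PATH
  for tool in hipcc hipconfig; do
    if [ -x "$hip_path/bin/$tool" ]; then
      echo "$tool: $hip_path/bin/$tool"
    else
      echo "$tool: not found under $hip_path/bin"
    fi
  done

  # nvcc is resolved from PATH; a loaded Cray module may put a different one first
  if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc: $(command -v nvcc)"
    # Last line of nvcc --version is the build string, e.g. cuda_12.6.r12.6/...
    nvcc --version | tail -n 1
  else
    echo "nvcc: not on PATH"
  fi

  # What hipcc will target; expected to be nvidia / nvcc for this build
  if [ -x "$hip_path/bin/hipconfig" ]; then
    echo "platform: $("$hip_path/bin/hipconfig" --platform)"
    echo "compiler: $("$hip_path/bin/hipconfig" --compiler)"
  fi

  echo "HIP_PLATFORM=${HIP_PLATFORM:-<unset>}"
}

check_hip_toolchain
```

If nvcc resolves to a different CUDA installation than expected, or hipconfig reports a platform other than nvidia, exporting HIP_PLATFORM=nvidia (matching the -DHIP_PLATFORM=nvidia passed to cmake) and rerunning the configure step would be a reasonable next thing to try.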
Problem Description
Build issues on the NVIDIA platform
Tried: CUDA 12.6.0 and 12.6.1 with ROCm 6.2.4 (couldn't choose this version in the drop-downs)
Operating System
SLES 15-SP5
CPU
AMD EPYC 7543 32-Core Processor
GPU
Nvidia A100
ROCm Version
ROCm 6.2.3
ROCm Component
HIP
Steps to Reproduce
Compile verbose output:
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
Output of hipconfig --full