Open yangyangv8 opened 2 months ago
@yangyangv8: Can you confirm that the test you are running, i.e. "./build/sendrecv_perf -b 8 -e 128M -f 2 -t 1 -g 2", runs fine when you rebuild rccl from source without adding the printf to the kernel?
@mangupta I have confirmed that the program runs normally without adding printf in the kernel.
@mangupta Hello, is there any update on this issue?
Problem Description
In the RCCL file prims_simple.h, I added a printf call inside a kernel function, such as:
When I run the RCCL test with the command ./build/sendrecv_perf -b 8 -e 128M -f 2 -t 1 -g 2, it reports this error:
After reading the explanation at https://rocm.docs.amd.com/en/latest/about/CHANGELOG.html#non-hostcall-hip-printf, I added the following settings to the RCCL CMakeLists.txt file:
target_compile_options(rccl PRIVATE -mprintf-kind=buffered)
makefiles/common.mk:
CXXFLAGS := -DCUDA_MAJOR=$(CUDA_MAJOR) -DCUDA_MINOR=$(CUDA_MINOR) -fPIC -fvisibility=hidden \
            -Wall -mprintf-kind=buffered -g -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla \
            -I $(CUDA_INC) \
            $(CXXFLAGS)
After compiling RCCL, this error was reported:
I have set these environment variables:
export HIP_KERNEL_PRINTF=1
export HIP_ENABLE_PRINTF=1
export HCC_ENABLE_PRINTF=1
export AMD_LOG_LEVEL=1
I am using a Linux server with two GPU cards. Without the printf, the program executes normally. How should I solve this problem?
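To help narrow this down, it may be worth checking whether device-side printf works at all on this machine outside of RCCL. The following is only a minimal sketch of such a check, not RCCL code: the names `hello_kernel` and `run_printf_check` are hypothetical, and the non-HIP branch exists only so the file still builds with a plain C++ compiler and says so.

```cpp
#include <cstdio>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>

// Hypothetical test kernel: each thread prints its block and thread index
// using device-side printf.
__global__ void hello_kernel() {
    printf("hello from block %d thread %d\n",
           static_cast<int>(blockIdx.x), static_cast<int>(threadIdx.x));
}

// Launches one block of four threads and waits for completion.
// Returns 0 on success and 1 if the launch or the synchronize reports an error.
int run_printf_check() {
    hipLaunchKernelGGL(hello_kernel, dim3(1), dim3(4), 0, 0);
    hipError_t err = hipGetLastError();
    if (err == hipSuccess) {
        err = hipDeviceSynchronize();  // device printf output is flushed here
    }
    if (err != hipSuccess) {
        std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err));
        return 1;
    }
    return 0;
}
#else
// Fallback when the file is not compiled as HIP: nothing runs on the GPU.
int run_printf_check() {
    std::fprintf(stderr, "not built with hipcc; device printf was not exercised\n");
    return -1;
}
#endif
```

To try it, add `int main() { return run_printf_check(); }` at the end of the file and build it with hipcc, once with default options and once with the same -mprintf-kind=buffered flag used above. If the standalone check prints correctly in both builds, the problem is more likely specific to how RCCL is built or launched than to device printf itself.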
Operating System
Ubuntu 22.04.1 LTS (Jammy Jellyfish)
CPU
12th Gen Intel(R) Core(TM) i7-12700
GPU
AMD Radeon RX 7900 XTX
ROCm Version
ROCm 5.7.0
ROCm Component
HIP, HIPCC, rccl
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response