Closed: yangyangv8 closed this issue 1 month ago
@yangyangv8: Can you confirm that the test you are running, i.e. "./build/sendrecv_perf -b 8 -e 128M -f 2 -t 1 -g 2", runs fine if you rebuild rccl from source without adding the printf in the kernel?
@mangupta I have confirmed that the program runs normally without adding printf in the kernel.
@mangupta Hello, is there any progress on this issue now?
Hi @yangyangv8, created an internal ticket to investigate your issue. Thanks!
Hi @yangyangv8, sorry for the delayed response.
I am closing this issue since it is a duplicate of github.com/ROCm/ROCm/issues/3001 and is being addressed there. Also, note that this is an issue directed to the rccl repo so should ideally be created there.
Problem Description
In the RCCL file prims_simple.h, I added a printf statement to a kernel function, such as:
When I run the RCCL test with the command ./build/sendrecv_perf -b 8 -e 128M -f 2 -t 1 -g 2, it reports this error:
After reading the explanation at https://rocm.docs.amd.com/en/latest/about/CHANGELOG.html#non-hostcall-hip-printf, I added the following setting to the RCCL CMakeLists.txt file:
target_compile_options(rccl PRIVATE -mprintf-kind=buffered)
makefiles/common.mk:
CXXFLAGS := -DCUDA_MAJOR=$(CUDA_MAJOR) -DCUDA_MINOR=$(CUDA_MINOR) -fPIC -fvisibility=hidden \
            -Wall -mprintf-kind=buffered -g -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla \
            -I $(CUDA_INC) \
            $(CXXFLAGS)
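For readers reproducing this setup, the CMake setting above could be wrapped in an opt-in switch so that a normal build stays unchanged. This is only a sketch: the option name RCCL_DEBUG_DEVICE_PRINTF is hypothetical and is not part of RCCL's CMakeLists.txt, and the snippet assumes it is placed after the point where the rccl target is defined.

```cmake
# Hypothetical opt-in switch; RCCL's own CMakeLists.txt does not define this option.
option(RCCL_DEBUG_DEVICE_PRINTF "Compile rccl with buffered device printf" OFF)

if(RCCL_DEBUG_DEVICE_PRINTF)
  # Same flag as quoted in the issue, applied only to the rccl target.
  # Must come after the rccl target has been created (assumed here).
  target_compile_options(rccl PRIVATE -mprintf-kind=buffered)
endif()
```

With this in place, the flag would only be passed when configuring with -DRCCL_DEBUG_DEVICE_PRINTF=ON, which makes it easier to compare builds with and without the buffered printf setting.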
After compiling RCCL, this error was reported:
I have set these environment variables:
export HIP_KERNEL_PRINTF=1
export HIP_ENABLE_PRINTF=1
export HCC_ENABLE_PRINTF=1
export AMD_LOG_LEVEL=1
I am using a Linux server with two GPUs. Without the printf, the program executes normally. How should I solve this problem?
Operating System
22.04.1 LTS (Jammy Jellyfish)
CPU
12th Gen Intel(R) Core(TM) i7-12700
GPU
AMD Radeon RX 7900 XTX
ROCm Version
ROCm 5.7.0
ROCm Component
HIP, HIPCC, rccl
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response