ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/
MIT License

Do I need --parallel-kernels option for multi-kernel on multi-device scenario? #128

Closed mabdallah89 closed 2 months ago

mabdallah89 commented 1 year ago

Hello,

I am using concurrent kernel execution on a multi-GPU system with multiple streams. Example:

for (i = 0; i < GPU_N; i++)
{
    ....
    // Make device i current before launching on its stream
    hipSetDevice(i);
    launch_addKernel(&A_d[start], &B_d[start], &C_d[start], size, 1, 1, streams[i]);
    ....
}
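
The loop above elides how `start` and `size` are chosen for each device. For anyone reproducing the setup, here is a minimal host-side sketch of one common way to split `n` elements across `GPU_N` devices. The `Chunk` struct and `chunk_for_device` helper are hypothetical names introduced for illustration; they are not taken from the original program, which may partition its data differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the slice of the vector assigned to one device.
struct Chunk {
    std::size_t start;  // index of the first element for this device
    std::size_t size;   // number of elements for this device
};

// Split n elements across `devices` GPUs as evenly as possible.
// The first (n % devices) devices each receive one extra element.
// This is an assumed partitioning scheme, not the original author's code.
Chunk chunk_for_device(std::size_t n, std::size_t devices, std::size_t i) {
    assert(devices > 0 && i < devices);
    const std::size_t base = n / devices;
    const std::size_t rem = n % devices;
    const std::size_t start = i * base + (i < rem ? i : rem);
    const std::size_t size = base + (i < rem ? 1 : 0);
    return Chunk{start, size};
}
```

With such a helper, each loop iteration would compute its chunk for device `i` before calling `hipSetDevice(i)`, so that every device works on a disjoint slice of the buffers.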

I want to collect performance statistics with rocprof while these kernels run concurrently. Do I still have to use the --parallel-kernels option to ensure concurrent kernel execution?

Command with --parallel-kernels:

rocprof --parallel-kernels -i in-prof2.txt -o ./output.csv ./vector_add

I believe that in this case the kernels will run concurrently even without --parallel-kernels, since each kernel runs on a separate device. Am I correct? My understanding is that --parallel-kernels is needed only when concurrent kernels run on a single device, to avoid serialization. In the multi-device case, concurrent kernels should be supported by default, since each kernel is launched on a separate device.

If the answer is no, and I do have to use --parallel-kernels in the multi-device case, then I get a runtime error when running on a multi-GPU MI210 system. The error is:

Memory access fault by GPU node-6 (Agent handle: 0xec22d0) on address 0x14ab000. Reason: Unknown.
/usr/bin/rocprof: line 297: 788934 Aborted                 (core dumped)

Any help, please?
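
Whether kernels on separate devices actually overlap can also be checked empirically from per-kernel timestamps, independent of which flags were used. The sketch below assumes each dispatch's GPU id and begin/end times have already been read from the profiler output (for example, the timing columns rocprof writes when timestamps are enabled) and that all timestamps share a common time base. The `KernelSpan` struct and its field names are illustrative and are not rocprof's CSV schema.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One kernel dispatch as it might appear in a profiler trace.
// Field names are hypothetical; map them from the actual output columns.
struct KernelSpan {
    int gpu_id;
    std::uint64_t begin_ns;
    std::uint64_t end_ns;
};

// Returns true if at least two kernels on different GPUs overlap in time.
// Spans are treated as half-open intervals [begin_ns, end_ns).
bool any_cross_device_overlap(std::vector<KernelSpan> spans) {
    for (const KernelSpan& s : spans) {
        assert(s.begin_ns <= s.end_ns);
    }
    // Sort by start time so only later-starting spans need to be checked.
    std::sort(spans.begin(), spans.end(),
              [](const KernelSpan& a, const KernelSpan& b) {
                  return a.begin_ns < b.begin_ns;
              });
    for (std::size_t i = 0; i < spans.size(); ++i) {
        // Every span j that starts before span i ends overlaps span i.
        for (std::size_t j = i + 1;
             j < spans.size() && spans[j].begin_ns < spans[i].end_ns; ++j) {
            if (spans[j].gpu_id != spans[i].gpu_id) {
                return true;
            }
        }
    }
    return false;
}
```

If this returns false for a run without --parallel-kernels, the dispatches were serialized under profiling; if it returns true, the kernels overlapped across devices without the flag.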

ppanchad-amd commented 3 months ago

@mabdallah89 Apologies for the lack of response. Do you still need assistance with this issue? Thanks!

ppanchad-amd commented 2 months ago

@mabdallah89 no longer requires assistance with this ticket and has agreed to close it. Thanks!