Hello,
I am running kernels concurrently on a multi-GPU system using multiple streams (see code example below). Example:
I want to collect some performance statistics with rocprof while these kernels are running concurrently. Do I still have to use the --parallel-kernels option to ensure concurrent kernel execution?
command with --parallel-kernels
I believe that in this case the kernels will run concurrently even without "--parallel-kernels", since each kernel runs on a separate device. Am I correct? My understanding is that --parallel-kernels is needed when concurrent kernels run on a single device, to avoid serialization. In the multi-device case, concurrent kernels should be supported by default, since each kernel is launched on a separate device.
If the answer is NO and I do have to use "--parallel-kernels" in the multi-device case, then I get a runtime error when I run on a multi-GPU MI210 system. The error is:
Memory access fault by GPU node-6 (Agent handle: 0xec22d0) on address 0x14ab000. Reason: Unknown.
/usr/bin/rocprof: line 297: 788934 Aborted (core dumped)
Any help, please?