Kernel execution serialization

GPUOpen-Tools / radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.

MIT License


Kernel execution serialization #11

Open yupinov opened 6 years ago

yupinov commented 6 years ago

Is there an option to make all kernels execute sequentially (especially when work is launched in multiple queues)? Coming from CUDA and nvprof, I was surprised not to find such a feature, which helps in understanding the performance of individual kernels.

chesik-amd commented 6 years ago

When collecting performance counters, the profiler will introduce serialization to try to ensure that only one kernel is executing at a time. There is no option for this, as it is the default behavior.

pszi1ard commented 5 years ago

What about measuring performance in a real-life environment, under concurrent execution?

Additionally, this seems to imply that traces in CodeXL can't be used to analyze kernel overlap?

chesik-amd commented 5 years ago

Serialization is only done when collecting performance counters, which is the mode you would use to analyze the performance of individual kernels. No additional serialization is introduced when collecting a trace, which is the mode you would use to analyze an entire application, including kernel overlap.
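To make the distinction concrete, here is a minimal sketch of the kind of overlap analysis a trace allows. It is not RCP functionality and does not parse the profiler's trace output: the `(name, start_ms, end_ms)` tuples, the kernel names, and the helper functions are all hypothetical, and the timestamps are assumed to have been extracted from a trace by some other means.

```python
from typing import List, Tuple

# Hypothetical record: (kernel name, start time in ms, end time in ms).
# This is NOT the profiler's trace format, only an illustration.
Interval = Tuple[str, float, float]


def serialized_time(kernels: List[Interval]) -> float:
    """Total time if every kernel ran alone, one after another."""
    return sum(end - start for _, start, end in kernels)


def busy_time(kernels: List[Interval]) -> float:
    """Time during which at least one kernel was running (union of intervals)."""
    total = 0.0
    cur_start = cur_end = None
    for _, start, end in sorted(kernels, key=lambda k: k[1]):
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged interval and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # This kernel overlaps the current merged interval: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def pairwise_overlaps(kernels: List[Interval]) -> List[Tuple[str, str, float]]:
    """List every pair of kernels that ran concurrently, with the overlap length."""
    overlaps = []
    for i in range(len(kernels)):
        for j in range(i + 1, len(kernels)):
            name_a, start_a, end_a = kernels[i]
            name_b, start_b, end_b = kernels[j]
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap > 0:
                overlaps.append((name_a, name_b, overlap))
    return overlaps


# Made-up timestamps for three kernels launched in two queues.
trace = [
    ("kernel_a", 0.0, 4.0),
    ("kernel_b", 2.0, 5.0),
    ("kernel_c", 6.0, 7.0),
]

print(serialized_time(trace))    # sum of the individual durations
print(busy_time(trace))          # shorter when kernels overlap
print(pairwise_overlaps(trace))  # which kernels ran concurrently
```

The gap between the summed durations and the busy time is the time saved by concurrent execution, which is exactly what per-kernel serialization in counter mode hides.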

pszi1ard commented 5 years ago

I see. I'd suggest allowing serialization to be turned on/off.

Is there a way to measure only wall time, without serialization?