GPUOpen-Tools / radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.
MIT License

Kernel and data transfer events not recorded in OpenCL applications #10

Closed syifan closed 5 years ago

syifan commented 6 years ago

I was using an earlier version of RCP that shipped with ROCm 1.6. Back then, the profiling result contained the kernel launch information. If I import the .atp file into CodeXL, the timeline looks like this.

image

Recently, when I tried the same commands again, I could only get a timeline like this.

image

The kernel launch information is no longer recorded. Am I making a mistake when I run the profiler? The command that I used is:

rocm-profiler -w . -A --hsaaqlpackettrace [my_application and its arguments]
syifan commented 6 years ago

I think I made a mistake in the earlier post. If I profile an HC++ or HIP program, I still get the kernel information. However, if I profile an OpenCL benchmark, I do not. The main reason seems to be that the API trace terminates early and some other information is missing from the trace. Maybe it is because hsa_shut_down is not called? Is there a way to properly record the whole trace in an OpenCL benchmark?

syifan commented 6 years ago

OK, I partially solved the problem myself. I added an hsa_shut_down call at the end of the program. However, I do not think this is a legitimate solution, as the whole program is purely an OpenCL program. Is there a way to avoid that?

chesik-amd commented 6 years ago

In current RCP builds, we don't have a reliable workaround for applications that don't call hsa_shut_down (which includes all OpenCL applications).

For trace profiling of an OpenCL application, you may get better results with the latest release of RCP (the 5.3 release from https://github.com/GPUOpen-Tools/RCP/releases).

With this release, you can try using OpenCL tracing as opposed to HSA tracing. Simply replace the "-A --hsaaqlpackettrace" with "--apitrace".

We are planning on officially supporting OpenCL-on-ROCm profiling in a future RCP release, but for now, you can try the above to see if it gives better results for an OpenCL application that doesn't explicitly call hsa_shut_down.

Collecting perf counters using the OpenCL profiler (i.e. using --perfcounter) still won't work for OCL-on-ROCm applications, but this is something that should work better in future releases.
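
For readers who script their profiling runs, the flag swap suggested above (replacing "-A --hsaaqlpackettrace" with "--apitrace") could be sketched roughly as follows. This is only an illustration: the helper name `to_opencl_trace`, the sample application `./my_app`, and its `--size` argument are hypothetical, and the `rocm-profiler` executable name is taken from the original command and may differ in the 5.3 release package.

```python
import shlex

# Flags from the original command in this thread that select HSA tracing.
HSA_TRACE_FLAGS = {"-A", "--hsaaqlpackettrace"}


def to_opencl_trace(profiler_args, app_args):
    """Replace HSA tracing flags with --apitrace (hypothetical helper).

    profiler_args: the profiler executable and its own options.
    app_args: the profiled application and its arguments, which are
              passed through untouched even if they look like flags.
    """
    swapped = []
    for arg in profiler_args:
        if arg in HSA_TRACE_FLAGS:
            # Insert --apitrace once, in place of the first HSA flag seen.
            if "--apitrace" not in swapped:
                swapped.append("--apitrace")
            continue
        swapped.append(arg)
    return swapped + list(app_args)


if __name__ == "__main__":
    original = ["rocm-profiler", "-w", ".", "-A", "--hsaaqlpackettrace"]
    app = ["./my_app", "--size", "1024"]  # hypothetical application
    print(shlex.join(to_opencl_trace(original, app)))
```

Keeping the application arguments in a separate list avoids accidentally rewriting an application option that happens to be spelled like a profiler flag. If no HSA tracing flag is present, the helper leaves the command unchanged rather than adding --apitrace.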

chesik-amd commented 5 years ago

This should work better in recent RCP releases, as well as in the profiler available with ROCm 2.0. You should now be able to profile OpenCL applications using --perfcounter when running on the ROCm stack.