GPUOpen-Tools / radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.
MIT License

Out of resources when collecting performance counters #12

Open JGO95 opened 6 years ago

JGO95 commented 6 years ago

I'm trying to collect performance counters on some HIP kernels that are called by a Python script. The GPU is an R9 Fury Nano. I've been using the terminal command: rocm-profiler -o "counters.csv" --counterfile counters.txt -C -w . /usr/bin/python3 <python_app>

The file counters.txt only has a single counter name in the first line, SALUBusy.
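For anyone trying to reproduce this setup, the counter file and the command line above can be generated together. The following is a minimal Python sketch, not part of RCP. It assumes rocm-profiler is on the PATH, copies the flags verbatim from the command above, and uses my_app.py as a hypothetical stand-in for the Python application. Writing one counter name per line is also an assumption that goes beyond the single-line file described here.

```python
import shlex
import tempfile
from pathlib import Path


def write_counter_file(path, counters):
    """Write the counter names to a file, one name per line (assumed layout)."""
    path = Path(path)
    path.write_text("\n".join(counters) + "\n")
    return path


def build_profiler_command(counter_file, output_csv, app_script):
    """Assemble the rocm-profiler argument list used in the post.

    The flags (-o, --counterfile, -C, -w) are copied from the original
    command; app_script is a hypothetical placeholder for the application.
    """
    return [
        "rocm-profiler",
        "-o", str(output_csv),
        "--counterfile", str(counter_file),
        "-C",
        "-w", ".",
        "/usr/bin/python3", str(app_script),
    ]


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    counter_file = write_counter_file(workdir / "counters.txt", ["SALUBusy"])
    cmd = build_profiler_command(counter_file, "counters.csv", "my_app.py")
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch usable on machines without ROCm installed. Pass the list to subprocess.run to actually launch the profiler.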

However, when I try to run it, I always get the error: ### HCC STATUS_CHECK Error: HSA_STATUS_ERROR_OUT_OF_RESOURCES (0x1008) at file:mcwamp_hsa.cpp line:1185

I tried with other programs and it seems to be consistent. I assume that means the GPU doesn't have enough storage to store all the counter data? Or is it something else?

Does this mean I can't do it this way or is there a better way to go about collecting counter values in this case? I would appreciate if somebody could point me in the right direction. Thanks in advance.

chesik-amd commented 6 years ago

Can you see if you are able to profile the vector_copy sample application (found in /opt/rocm/hsa/sample)? (just want to rule out a generic issue preventing profiling on your system)

If so, can you also try running your program without the profiler but with the following two environment variables set: HSA_EMULATE_AQL=1 HSA_TOOLS_LIB=libhsa-runtime-tools64.so

Those two environment variables will put the HSA runtime in a state similar to the one used when profiling. On some occasions we have had issues running applications in this runtime environment.

Thanks, Chris
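The check suggested above (running the application without the profiler but with the two environment variables set) can be scripted. This is a minimal Python sketch under stated assumptions: the variable names and values are taken from the comment above, while the helper names profiling_like_env and run_without_profiler are hypothetical and not part of RCP or the HSA runtime.

```python
import os
import subprocess
import sys


def profiling_like_env(base_env=None):
    """Return a copy of the environment with the two variables from the thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_EMULATE_AQL"] = "1"
    env["HSA_TOOLS_LIB"] = "libhsa-runtime-tools64.so"
    return env


def run_without_profiler(command):
    """Run command (a list of arguments) in that environment; return its exit code."""
    return subprocess.run(command, env=profiling_like_env()).returncode


if __name__ == "__main__":
    # Demo only: a child Python process prints the variable it inherited.
    # Replace this command with the real application command line.
    demo = [sys.executable, "-c",
            "import os; print(os.environ['HSA_EMULATE_AQL'])"]
    print("exit code:", run_without_profiler(demo))
```

Copying the environment rather than modifying os.environ in place keeps the change scoped to the child process, so later commands in the same session are unaffected.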

JGO95 commented 6 years ago

Thank you for the help.

To answer your questions, I have confirmed that the profiler works with the vector_copy sample and some other applications.

I also ran the program after setting those environment variables, and it ran without problems. So the problem isn't related to those.

EDIT: Is there anything else I can provide that would help you figure out why this happened? Or any other variables I should try in order to debug it? This worked once before (only once, though), but I don't know why it did on that particular day, and now I'm at a loss.

chesik-amd commented 5 years ago

If you get a chance, can you please try this with RCP v5.6, which was released last week, and let me know if this issue still reproduces?

Thanks