Open adanalis opened 2 months ago
I suspect items A and B may be related. Can you post the code where you call rocprofiler_configure_agent_profile_counting_service?
For C, there is an internal patch that resolves the issue and should be published shortly. For D, a patch is in the works and should be available soon.
In addition to the problems discussed above, I'm now getting a segfault inside rocprof-sdk code. I created a PR in the PAPI repo that enables the agent profiling mode and comes with tests. The PR is here: https://github.com/icl-utk-edu/papi/pull/249
To reproduce the segfault, please do the following:
1) Clone PAPI, go into the directory "$papi_root/src", and run ./configure --with-components=rocp_sdk
2) Run make
3) Run export RPSDK_MODE_AGENT_PROFILE=1
4) Go to $papi_root/src/components/rocp_sdk/tests
5) Run ./advanced
Here is the backtrace from my runs:
from /apps/rocm/rocm-6.3afar6/lib/llvm/bin/../../../lib/libhsa-runtime64.so.1
at papi_internal.c:1713
Problem Description
A) I only get non-zero values for the first event that I have added to the profile.
B) I start two agents for two distinct GPUs and submit my kernel on only one GPU, but I get the same measurements from both agents.
C) When I get the measurements, I have no way of distinguishing which measurement came from which agent.
D) With the watermark set to zero, the buffer callback is triggered as soon as there is one entry in the buffer, before all the entries for the sample have been written. As a result, we see the entries "out of order." We would like the data to be accessible synchronously when we take a sample, without having to go through buffers.
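To make C and D concrete, here is a minimal sketch in plain C of what a tool could do on its side if every record carried an agent identifier and a per-sample identifier. This is not rocprofiler-sdk code: `sample_record_t`, `sample_slot_t`, `slot_init`, and `slot_add` are hypothetical names made up for illustration, and the sketch assumes the tool knows how many counters each sample should contain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_COUNTERS 8

/* Hypothetical record layout: one counter value as it might be handed to a
 * buffer callback. None of these names come from rocprofiler-sdk. */
typedef struct {
    uint64_t agent_id;    /* which GPU agent produced the value (item C) */
    uint64_t sample_id;   /* groups all values read by one sample call */
    uint32_t counter_idx; /* position of the counter in the profile */
    double   value;
} sample_record_t;

/* Collects the records of one (agent, sample) pair until all have arrived,
 * so partial, out-of-order deliveries (item D) can be tolerated. */
typedef struct {
    uint64_t agent_id;
    uint64_t sample_id;
    uint32_t expected;             /* number of counters in the profile */
    uint32_t received;             /* distinct counters seen so far */
    uint8_t  seen[MAX_COUNTERS];
    double   values[MAX_COUNTERS];
} sample_slot_t;

/* Prepare a slot for one sample on one agent. */
static void slot_init(sample_slot_t *s, uint64_t agent_id,
                      uint64_t sample_id, uint32_t expected)
{
    memset(s, 0, sizeof(*s));
    s->agent_id  = agent_id;
    s->sample_id = sample_id;
    s->expected  = expected > MAX_COUNTERS ? MAX_COUNTERS : expected;
}

/* Add one record to the slot.
 * Returns  1 when every expected counter has arrived,
 *          0 when more records are still needed,
 *         -1 when the record belongs to another agent or sample,
 *            or its counter index is out of range. */
static int slot_add(sample_slot_t *s, const sample_record_t *r)
{
    if (r->agent_id != s->agent_id || r->sample_id != s->sample_id)
        return -1;
    if (r->counter_idx >= s->expected)
        return -1;
    if (!s->seen[r->counter_idx]) {
        s->seen[r->counter_idx] = 1;
        s->received++;
    }
    s->values[r->counter_idx] = r->value;
    return s->received == s->expected;
}
```

With something like this, a callback that fires after a single entry would simply keep calling `slot_add` and only report the sample once it returns 1, and records from the second agent would be rejected instead of being mixed in. It is only a workaround, though; a synchronous read would make it unnecessary.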
Operating System
Rocky Linux 9.4 (Blue Onyx)
CPU
AMD EPYC 7413 24-Core Processor
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.2.0
ROCm Component
rocprofiler
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response