ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

runtime error: aqlprofile API table load failed #29

Closed cgleggett closed 4 years ago

cgleggett commented 4 years ago

After updating ROCm from 3.3.0 to 3.5.1 and rebuilding rocprofiler and roctracer, I get the following error when profiling an executable (which uses an AMD Vega 56 GPU):

> rocprof --stats -o rocpf_stat.csv the_prog
RPL: on '200626_131451' from '/opt/rocm-3.5.1/rocprofiler/rocprofiler' in '/home/leggett/work/fcs/bk_hip'
RPL: profiling '"runTFCSSimulation"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_200626_131451_50543'
RPL: result dir '/tmp/rpl_data_200626_131451_50543/input_results_200626_131451'
ROCProfiler: input from "/tmp/rpl_data_200626_131451_50543/input.xml"
  0 metrics
aqlprofile API table load failed: HSA_STATUS_ERROR: A generic error has occurred.
( program exits )

I see a similar error when using --hsa-trace:

rocprof --hsa-trace -o rocpf_hsa.csv the_prog
RPL: on '200626_131810' from '/opt/rocm-3.5.1/rocprofiler/rocprofiler' in '/home/leggett/work/fcs/bk_hip'
RPL: profiling '"runTFCSSimulation"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_200626_131810_50607'
RPL: result dir '/tmp/rpl_data_200626_131810_50607/input_results_200626_131810'
ROCProfiler: input from "/tmp/rpl_data_200626_131810_50607/input.xml"
  0 metrics
ROCTracer (pid=50626): 
    HSA-trace()
    HSA-activity-trace()
aqlprofile API table load failed: HSA_STATUS_ERROR: A generic error has occurred.
File 'rocpf_hsa.hsa_stats.csv' is generating

File 'rocpf_hsa.json' is generating

File 'rocpf_hsa.json' is generating

This is on a CentOS 7 host.

eshcherb commented 4 years ago

Please set: $ export LD_LIBRARY_PATH=/opt/rocm/hsa-amd-aqlprofile/lib

It will be fixed in the 3.6 release.

cgleggett commented 4 years ago

There is no /opt/rocm/hsa-amd-aqlprofile/lib directory. Which package is supposed to install it?

eshcherb commented 4 years ago

The package is 'hsa-amd-aqlprofile'. Do you have /opt/rocm? You might have /opt/rocm-<rev> instead, something like /opt/rocm-3.5.1. In that case, set the path to /opt/rocm-3.5.1/hsa-amd-aqlprofile/lib
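
For anyone unsure which of the two layouts they have, here is a minimal sketch that looks for the aqlprofile lib directory under either /opt/rocm or a versioned /opt/rocm-<rev> and prepends it to LD_LIBRARY_PATH. The function name find_aqlprofile_lib is hypothetical (it is not part of ROCm), and the sketch assumes the directory layout described in this thread. If several versioned installs exist, it simply takes the first one the shell glob returns.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ROCm): print the first directory
# matching <root>/rocm/hsa-amd-aqlprofile/lib or
# <root>/rocm-<rev>/hsa-amd-aqlprofile/lib. The root defaults to /opt.
find_aqlprofile_lib() {
    root="${1:-/opt}"
    for dir in "$root"/rocm/hsa-amd-aqlprofile/lib \
               "$root"/rocm-*/hsa-amd-aqlprofile/lib; do
        # An unmatched glob stays literal, so the -d test also filters it out.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

if lib_dir="$(find_aqlprofile_lib /opt)"; then
    # Prepend, keeping any existing LD_LIBRARY_PATH entries.
    export LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "using $lib_dir"
else
    echo "hsa-amd-aqlprofile lib directory not found; is the package installed?" >&2
fi
```

If nothing is found, the package is probably missing, and reinstalling 'hsa-amd-aqlprofile' is the likely next step.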

cgleggett commented 4 years ago

ah, ok. It was installed by hsa-amd-aqlprofile-1.0.0-1.x86_64, but got wiped out when I removed the 3.3 release of rocm to upgrade to 3.5.

cgleggett commented 4 years ago

ok, looks good now.

thanks!

eshcherb commented 4 years ago

Could you close the ticket?

cgleggett commented 4 years ago

BTW, I did need to turn on object-tracking, otherwise I got a core dump.

error(4096) "QueryKernelName(), Error: V3 code object detected - code objects tracking should be enabled
"
/opt/rocm-3.5.1/bin/rocprof: line 275: 32039 Aborted                 (core dumped) "runTFCSSimulation"

not the most elegant exit scenario, but at least the error message is clear ;-)
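
For reference, a small sketch of how code object tracking might be switched on from the command line. It assumes this rocprof build accepts an `--obj-tracking <on|off>` option, which is worth confirming with `rocprof -h` on your install. The wrapper name profile_with_obj_tracking and the ROCPROF override variable are hypothetical, added only so the command line can be printed as a dry run.

```shell
#!/bin/sh
# Assumption: rocprof accepts --obj-tracking <on|off>; confirm with `rocprof -h`.
# Hypothetical wrapper: run rocprof with code object tracking enabled.
profile_with_obj_tracking() {
    # ROCPROF may override the command, e.g. "echo rocprof" for a dry run.
    # It is left unquoted on purpose so that "echo rocprof" splits into two words.
    ${ROCPROF:-rocprof} --obj-tracking on "$@"
}

# Dry run: print the command line instead of launching the profiler.
ROCPROF="echo rocprof" profile_with_obj_tracking --stats -o rocpf_stat.csv the_prog
```

Dropping the ROCPROF override runs the real profiler with the same arguments.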