ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

tblextr.py : bad kfd record / IndexError #42

Open yoann-heitz opened 3 years ago

yoann-heitz commented 3 years ago

When running the following command

./rocprofiler/bin/rocprof --hsa-trace --hip-trace --kfd-trace -d traces python3 ./test.py

I sometimes get one of the following errors:

  File "/home/yoann/rocprofiler/bin/tblextr.py", line 710, in <module>
    hip_trace_found = fill_api_db('HIP', db, indir, 'hip', HIP_PID, OPS_PID, [], {}, 1)
  File "/home/yoann/rocprofiler/bin/tblextr.py", line 441, in fill_api_db
    copy_data = list(copy_raws[copy_index])
IndexError: list index out of range
Profiling data corrupted: ' traces/rpl_data_210319_075021_860663/input_results_210319_075021/results.txt'

or

scan kfd API data 2803664:2803665
/home/yoann/rocprofiler/bin/tblextr.py: kfd bad record: ''
Profiling data corrupted: ' traces/rpl_data_210319_074742_860200/input_results_210319_074742/results.txt'

I use rocm-4.0.0, and I cloned and built rocprofiler and roctracer from the GitHub repositories. I tried both the amd-master and rocm-4.0.x branches. The IndexError occurred in both cases. The kfd bad record error only occurred with the rocm-4.0.0 branch (but the errors didn't occur on every run, so it may also occur with the amd-master branch).
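
Since the errors are intermittent, a small harness that repeats the run and tallies which error text shows up may help compare branches. The following is only a sketch: `FAILURE_MARKERS`, `run_once` and `count_failures` are hypothetical names, not part of rocprofiler, and the markers are simply the strings seen in the two outputs above.

```python
import subprocess

# Strings taken from the two error outputs quoted above (assumed to be
# stable enough to match on; adjust if the wording differs on your setup).
FAILURE_MARKERS = ("Profiling data corrupted", "kfd bad record", "IndexError")


def run_once(cmd):
    """Run cmd once and return the failure markers found in its output."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks are searched too
        text=True,
        errors="replace",
    )
    return [marker for marker in FAILURE_MARKERS if marker in proc.stdout]


def count_failures(cmd, runs):
    """Run cmd `runs` times and count how many runs showed each marker."""
    tally = {}
    for _ in range(runs):
        for marker in run_once(cmd):
            tally[marker] = tally.get(marker, 0) + 1
    return tally


# Example call with the command from this report (not executed here):
# count_failures(
#     ["./rocprofiler/bin/rocprof", "--hsa-trace", "--hip-trace",
#      "--kfd-trace", "-d", "traces", "python3", "./test.py"],
#     runs=20,
# )
```

The returned dictionary maps each marker to the number of runs in which it appeared, so an empty dictionary means no run showed any of the known error strings.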

index_error.txt bad_record_error.txt test.py.zip

arfio commented 3 years ago

I get the same error using the latest 4.1 ROCm release.

eshcherb commented 3 years ago

Could you check with '--sys-trace', which should enable HIP + HSA tracing?

arfio commented 3 years ago

Tracing with '--sys-trace' alone works correctly.

arfio commented 2 years ago

I am not able to reproduce this issue with the latest ROCm (5.0.0).