I noticed that profiling applications on ROCm 5.2.x crashes Omniperf. A peek at the verbose debug logs shows that we crash when checking for `arch_vgpr` and `accum_vgpr` (two counters added in ROCm 5.3):
```
ROCPRofiler: 167 contexts collected, output directory /tmp/rpl_data_230608_132620_2929822/input_results_230608_132620
File '/home/colramos/GitHub/omniperf-pub/workloads/mix_all/mi200/timestamps.csv' is generating
Successfully joined gpu in pmc_perf.csv
Successfully joined grd in pmc_perf.csv
Successfully joined wgr in pmc_perf.csv
Successfully joined lds in pmc_perf.csv
Successfully joined scr in pmc_perf.csv
Traceback (most recent call last):
  File "./src/omniperf", line 917, in <module>
    main()
  File "./src/omniperf", line 812, in main
    omniperf_profile(args, VER)
  File "./src/omniperf", line 698, in omniperf_profile
    join_prof(workload_dir, args.join_type, log, args.verbose)
  File "/home/colramos/GitHub/omniperf-pub/src/utils/perfagg.py", line 136, in join_prof
    if not test_df_column_equality(_df):
  File "/home/colramos/GitHub/omniperf-pub/src/utils/perfagg.py", line 92, in test_df_column_equality
    return df.eq(df.iloc[:, 0], axis=0).all(1).all()
  File "/home/colramos/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 961, in __getitem__
    return self._getitem_tuple(key)
  File "/home/colramos/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1458, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "/home/colramos/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 769, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "/home/colramos/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1361, in _validate_key
    self._validate_integer(key, axis)
  File "/home/colramos/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1452, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
```
It's not the concern I expressed in the original ticket (https://github.com/AMDResearch/omniperf/issues/117#issuecomment-1548683699), but it should be an easy fix:
https://github.com/AMDResearch/omniperf/blob/a346db7646b0a935f4cac51d131b4a585f065c05/src/utils/perfagg.py#L123-L133