andikleen / pmu-tools

Intel PMU profiling tools
GNU General Public License v2.0

toplev reports zero info metrics #466

Closed: aayasin closed this 11 months ago

aayasin commented 1 year ago

The latest toplev (with TMA 4.6 support) apparently broke the Info metrics.

In this reproducer, it reports zero Time and Instructions for BC2s on ICX:

$ ./pmu-tools/toplev.py --quiet -vl1 --nodes +Time,+Instructions ./pmu-tools/workloads/BC2s
# 4.6-full-perf on Genuine Intel(R) CPU $0000%@ [icx/icelake]
FE             Frontend_Bound   % Slots                      17.3
BAD            Bad_Speculation  % Slots                      28.3   <==
BE             Backend_Bound    % Slots                      13.2  <
RET            Retiring         % Slots                      41.2  <
Info.Inst_Mix  Instructions       Count                       0
Info.System    Time               Seconds                     0.00

This was working well on the previous 4.5 version:

$ ../perf-tools.git/pmu-tools/toplev.py --quiet -vl1 --nodes +Time,+Instructions ./pmu-tools/workloads/BC2s
# 4.5-full-perf on Genuine Intel(R) CPU $0000%@ [icx/icelake]
FE             Frontend_Bound   % Slots                      16.9
BAD            Bad_Speculation  % Slots                      28.7   <==
BE             Backend_Bound    % Slots                      13.2  <
RET            Retiring         % Slots                      41.2  <
Info.Inst_Mix  Instructions       Count          24,262,383,124
Info.System    Time               Seconds                     3.74
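A small check like the following could flag this kind of regression automatically. It is a minimal sketch, not part of pmu-tools: `zero_info_metrics` is a hypothetical helper that assumes the whitespace-separated text layout shown in the two outputs above (an optional CPU column, then area, node, unit and value, with `<`/`<==` bottleneck markers only trailing the value).

```python
def zero_info_metrics(output: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (area, node) pairs for Info.* lines whose value is zero."""
    zeros = []
    for line in output.splitlines():
        fields = line.split()
        # The area column starts with "Info."; an optional CPU column (e.g. C18-T0) may precede it.
        area_idx = next((i for i, f in enumerate(fields) if f.startswith("Info.")), None)
        if area_idx is None or len(fields) < area_idx + 3:
            continue
        # Drop trailing bottleneck markers such as "<" or "<==" if present.
        while fields and fields[-1].startswith("<"):
            fields.pop()
        try:
            # Values may contain thousands separators, e.g. 24,262,383,124.
            value = float(fields[-1].replace(",", ""))
        except ValueError:
            continue
        if value == 0:
            zeros.append((fields[area_idx], fields[area_idx + 1]))
    return zeros


# Output copied from the 4.6 and 4.5 runs in this report.
BROKEN = """\
# 4.6-full-perf on Genuine Intel(R) CPU $0000%@ [icx/icelake]
FE             Frontend_Bound   % Slots                      17.3
BAD            Bad_Speculation  % Slots                      28.3   <==
BE             Backend_Bound    % Slots                      13.2  <
RET            Retiring         % Slots                      41.2  <
Info.Inst_Mix  Instructions       Count                       0
Info.System    Time               Seconds                     0.00
"""

WORKING = """\
# 4.5-full-perf on Genuine Intel(R) CPU $0000%@ [icx/icelake]
FE             Frontend_Bound   % Slots                      16.9
BAD            Bad_Speculation  % Slots                      28.7   <==
BE             Backend_Bound    % Slots                      13.2  <
RET            Retiring         % Slots                      41.2  <
Info.Inst_Mix  Instructions       Count          24,262,383,124
Info.System    Time               Seconds                     3.74
"""

print(zero_info_metrics(BROKEN))   # [('Info.Inst_Mix', 'Instructions'), ('Info.System', 'Time')]
print(zero_info_metrics(WORKING))  # []
```

Splitting on whitespace rather than relying on fixed column offsets keeps the check working for both the plain layout and the per-CPU layout (with a leading `C18-T0`-style column).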
andikleen commented 1 year ago

The test case even fails completely for me. I will take a look.

Traceback (most recent call last):
  File "/home/ak/pmu/pmu-tools/./toplev.py", line 4282, in <module>
    ret, count = measure_and_sample(runner_list, 0 if args.drilldown else None)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ak/pmu/pmu-tools/./toplev.py", line 4210, in measure_and_sample
    ret = execute(runner_list, out, rrest, summary)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ak/pmu/pmu-tools/./toplev.py", line 2000, in execute
    for ret, res, rev, interval, valstats, env in do_execute(
  File "/home/ak/pmu/pmu-tools/./toplev.py", line 2227, in do_execute
    runner, skip, event = check_event(rlist, event, len(res[title]),
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ak/pmu/pmu-tools/./toplev.py", line 2096, in check_event
    expected_ev = remove_qual(revnum[off])

andikleen commented 12 months ago

The failure I'm seeing is caused by newer perf versions printing lots of copies of duration_time.
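The traceback shows `check_event` matching perf's output against the expected event list by offset (`revnum[off]`). The sketch below illustrates, under assumptions, how extra `duration_time` rows could throw off such positional bookkeeping, and one way to collapse them before matching. Both functions (`dedup_duration_time`, `match_events`) are hypothetical and do not mirror toplev's actual code; the counter values are dummy numbers.

```python
def dedup_duration_time(rows):
    """Hypothetical: keep only the first duration_time row, pass all other rows through."""
    seen = False
    out = []
    for name, value in rows:
        if name == "duration_time":
            if seen:
                continue  # drop the extra copies a newer perf may print
            seen = True
        out.append((name, value))
    return out


def match_events(expected, rows):
    """Hypothetical strict checker: pair perf rows with expected event names by position."""
    if len(rows) != len(expected):
        raise ValueError(f"expected {len(expected)} events, got {len(rows)}")
    for want, (got, _) in zip(expected, rows):
        if want != got:
            raise ValueError(f"event mismatch: expected {want}, got {got}")
    return dict(rows)


expected = ["instructions", "duration_time"]
# Dummy values; the repeated duration_time rows mimic the duplicated output.
rows = [("instructions", 1000), ("duration_time", 5),
        ("duration_time", 5), ("duration_time", 5)]

try:
    match_events(expected, rows)
except ValueError as err:
    print("strict match fails:", err)

print(match_events(expected, dedup_duration_time(rows)))
```

Deduplicating before matching keeps the strict positional check intact instead of loosening it for every event; whether toplev should instead tolerate the duplicates is a separate design decision.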

aayasin commented 12 months ago

The failure I am seeing is with perf version 5.15.111, which can easily be built with perf-tools/build-perf.sh.

aayasin commented 12 months ago

@andikleen did you check perf version 5.15.111?

andikleen commented 12 months ago

Still works here:

% perf515 --version
perf version 5.15.g8bb7eca972ad
% PERF=perf515 ./toplev --quiet -vl1 --nodes +Time,+Instructions ./workloads/BC1s
# 4.6-full-perf on Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz [skx/skylake]
C18     FE             Frontend_Bound   % Slots      17.4
C18     BAD            Bad_Speculation  % Slots      28.6   <==
C18     BE             Backend_Bound    % Slots       6.6  <
C18     RET            Retiring         % Slots      47.4  <
C18-T0  Info.Inst_Mix  Instructions     Count         0
C18-T0  Info.System    Time             Seconds       0.00
C18-T1  Info.Inst_Mix  Instructions     Count         0
C18-T1  Info.System    Time             Seconds       0.00

aayasin commented 12 months ago

I don't know. How can we explain that, prior to the TMA 4.6 push, toplev works well with perf version 5.15.111? The reported issue appears only after that push (with the same perf tool), as I illustrated in the reproducer at the start of this issue.

andikleen commented 11 months ago

On Sat, Aug 05, 2023 at 05:23:12AM -0700, Ahmad Yasin wrote:

> I don't know. How can we explain that, prior to the TMA 4.6 push, toplev works well with perf version 5.15.111? The reported issue appears only after that push (with the same perf tool), as I illustrated in the reproducer at the start of this issue.

Could it be some local change in your repository? Can you reproduce it with a fresh checkout?