andikleen / pmu-tools

Intel PMU profiling tools
GNU General Public License v2.0

toplev should not sum up Time metric when --global is used #453

Status: Open. aayasin opened this issue 1 year ago.


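Here is the output of a 5-second multicore run for which toplev reports 80 seconds. With --global, Info.System Time shows 79.94 seconds, while perf stat measures 4.912 seconds elapsed (15.745 CPUs utilized). The per-CPU times therefore appear to be summed instead of being reported as wall-clock time.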

ayasin@icl-adl-01:/iusers/ayasin/perf-tools$ ./do.py profile --perf /iusers/ayasin/perf --toplev-args ' --global' --tune :sample:3 ":perf-pebs:'-b -e cpu_core/event=0xc5,umask=0x11,name=BR_MISP_RETIRED.COND/ppp -c 20003'" :perf-pebs-top:-1 -a './adobe-HQ.sh pgo1 t73' -pm 12 -v1
INFO: App: ./adobe-HQ.sh pgo1 t73 .

per-app counting 3 runs ..
/iusers/ayasin/perf stat -r3 --log-fd=1 --td-level=2 -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,cycles:k,{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/},cpu_core/event=0xc4,umask=0x40,name=System-entries/u,r2424,r0160,r0262" -- ./adobe-HQ.sh pgo1 t73 | tee adobe-HQ-pgo1-t73.perf_stat-r3.log | egrep 'seconds [st]|CPUs|GHz|insn|topdown|Work|System|all branches' | uniq
         77,336.34 msec cpu-clock                 #   15.745 CPUs utilized            ( +-  2.65% )
   419,131,098,203      cpu_core/topdown-retiring/ #    5.556 G/sec                    ( +-  1.58% )
   195,044,038,033      cpu_core/topdown-bad-spec/ #    2.586 G/sec                    ( +-  0.98% )
    92,150,614,839      cpu_core/topdown-fe-bound/ #    1.222 G/sec                    ( +-  3.74% )
   110,879,392,005      cpu_core/topdown-be-bound/ #    1.470 G/sec                    ( +- 12.46% )
    16,877,205,681      cpu_core/topdown-heavy-ops/ #  223.734 M/sec                    ( +-  2.74% )
   193,658,937,149      cpu_core/topdown-br-mispredict/ #    2.567 G/sec                    ( +-  1.00% )
    59,771,314,506      cpu_core/topdown-fetch-lat/ #  792.364 M/sec                    ( +-  3.72% )
    54,177,283,436      cpu_core/topdown-mem-bound/ #  718.206 M/sec                    ( +- 20.09% )
         4,747,438      System-entries            #   62.935 K/sec                    ( +-  8.44% )
             4.912 +- 0.158 seconds time elapsed  ( +-  3.23% )

topdown full tree + All Bottlenecks ..
PERF=/iusers/ayasin/perf /usr/bin/python /iusers/ayasin/perf-tools/pmu-tools/toplev.py --cputype=core --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Memory_Bandwidth,+Memory_Latency,+Memory_Data_TLBs,+Core_Bound_Likely' -V adobe-HQ-pgo1-t73.toplev-vl6-perf.csv --global --tune 'DEDUP_NODE = "Other_Light_Ops,Lock_Latency,Contested_Accesses,Data_Sharing,FP_Arith"' -- ./adobe-HQ.sh pgo1 t73 2>&1 | tee adobe-HQ-pgo1-t73.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort
BAD            Bad_Speculation.Branch_Mispredicts                                                              % Slots                           24.4    [ 4.0%]<==
Info.Botlnk.L0 Core_Bound_Likely                                                                                 Metric                           0.00
Info.Botlnk.L2 DSB_Misses                                                                                        Scaled_Slots                     2.51   [ 2.1%]
Info.Bottleneck Big_Code                                                                                         Scaled_Slots                     1.01   [ 3.8%]
Info.Bottleneck Branching_Overhead                                                                               Scaled_Slots                     7.93   [ 3.8%]
Info.Bottleneck Instruction_Fetch_BW                                                                             Scaled_Slots                     5.33   [ 3.8%]
Info.Bottleneck Memory_Bandwidth                                                                                 Scaled_Slots                     1.75   [ 3.8%]
Info.Bottleneck Memory_Data_TLBs                                                                                 Scaled_Slots                     0.90   [ 3.8%]
Info.Bottleneck Memory_Latency                                                                                   Scaled_Slots                     0.14   [ 3.8%]
Info.Bottleneck Mispredictions                                                                                   Scaled_Slots                    27.60   [ 3.8%]
Info.System    Time                                                                                              Seconds                         79.94
MUX                                                                                                            %                                  2.00
warning: 5 nodes had zero counts: Contested_Accesses Core_Bound_Likely DSB_Switches Data_Sharing UopPI
ERROR: Too many metrics with zero counts ( Contested_Accesses Core_Bound_Likely DSB_Switches Data_Sharing UopPI). Run longer or use: --toplev-args ' --no-multiplex' !
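A minimal sketch of the behavior the issue asks for, assuming Time is first collected as one wall-clock value per CPU and then merged into a single row for --global output. The function `aggregate_time` and its `mode` parameter are hypothetical and are not part of toplev; the per-CPU input values are illustrative only. Summing sixteen per-CPU values of 5 s reproduces the roughly 80 s figure above, while taking the maximum yields the elapsed time.

```python
# Hypothetical sketch, NOT toplev's code: how a per-CPU "Time" metric
# could be merged when --global collapses all CPUs into one row.

def aggregate_time(per_cpu_seconds, mode="wall"):
    """Combine per-CPU Time values into one global value.

    mode="sum"  -> total across CPUs (what the issue observes, ~N_cpus x elapsed)
    mode="wall" -> elapsed wall-clock time (what the issue asks for)
    """
    if not per_cpu_seconds:
        raise ValueError("no per-CPU values")
    if mode == "sum":
        return sum(per_cpu_seconds)
    if mode == "wall":
        # Every CPU observes the same wall-clock interval, so the global
        # value is the longest per-CPU interval, not the total.
        return max(per_cpu_seconds)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Illustrative input (assumption): 16 busy CPUs, each measuring 5 s.
    per_cpu = [5.0] * 16
    print(aggregate_time(per_cpu, "sum"))   # prints 80.0
    print(aggregate_time(per_cpu, "wall"))  # prints 5.0
```

Summing is the right reduction for CPU time (as with perf's cpu-clock, 77,336.34 msec above), but not for an elapsed-time metric, which is why a maximum is used in the "wall" branch.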