andikleen / pmu-tools

Intel PMU profiling tools
GNU General Public License v2.0

TMA/toplev misses flagging a tree node that crosses its threshold #444

Closed: aayasin closed this issue 1 year ago

aayasin commented 1 year ago

The issue shows up when tagging/sibling is in use.

Here is a reproducer using perf-tools with the latest toplev (TMA 4.4) on SPR, running the MOVLG1700 kernel. The <== marker next to the MITE node is missing.

Filing this issue to track the fix (hopefully in TMA 4.5).

[ayasin@icl-spr-01 perf-tools]$ N=1700; ./do.py build prof-no-mux -g "-i MOVLG -n $N" -a MOVLG$N -ki 75e6 -v1
building kernel: MOVLG1700 ..
/usr/bin/python ./kernels/gen-kernel.py -i MOVLG -n 1700 > ./kernels/MOVLG1700.c
gcc -O2 -ffast-math -g -o ./kernels/MOVLG1700 ./kernels/MOVLG1700.c 2>&1
INFO: App: taskset 0x4 ./kernels/MOVLG1700 75000000 .
topdown-vl6 no multiplexing ..

/usr/bin/python ./pmu-tools/toplev.py --no-desc -vl6 --no-multiplex --nodes '+CoreIPC,+Instructions,+CORE_CLKS,+Time,-CPU_Utilization,+Load_Miss_Real_Latency,+L2MPKI,+ILP,+IpTB,+IpMispredict,+Memory_Bound*/3,+Mispredictions,+IpTB,+BpTkBranch,+IpCall,+IpLoad,+ILP,+UPI' -V MOVLG1700-75e6.toplev-vl6-nomux-perf.csv --metric-group +Summary --single-thread -- taskset 0x4 ./kernels/MOVLG1700 75000000 2>&1 | tee MOVLG1700-75e6.toplev-vl6-nomux.log | egrep -iv '^((FE|BE|BAD|RET).*[ \-][10]\.. |Info.* 0\.0[01]? |RUN|Add|warning:)|not (found|referenced|supported)| < [\[\+]|<$'

# 4.4-full-perf on Intel(R) Xeon(R) Platinum 8480+ [spr/sapphire_rapids]
FE             Frontend_Bound                                                                                  % Slots                         46.7
Info.Core      CoreIPC                                                                                           Core_Metric                    3.20
Info.Inst_Mix  Instructions                                                                                      Count            127,854,243,713
Info.Inst_Mix  IpCall                                                                                            Inst_Metric               65,971.8
Info.Inst_Mix  IpTB                                                                                              Inst_Metric                1,559.3
Info.Inst_Mix  BpTkBranch                                                                                        Metric                         1.05
FE             Frontend_Bound.Fetch_Bandwidth                                                                  % Slots                         46.3
Info.Bad_Spec  IpMispredict                                                                                      Inst_Metric            4,218,776.6
FE             Frontend_Bound.Fetch_Bandwidth.MITE                                                             % Slots_est                     49.4
Info.Memory    Load_Miss_Real_Latency                                                                            Clocks_Latency               132.31
Info.Core      ILP                                                                                               Core_Metric                    3.21
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization                 % Core_Execution                63.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_0          % Core_Clocks                   59.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_1          % Core_Clocks                   60.0
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_6          % Core_Clocks                   69.9
Info.Thread    UPI                                                                                               Metric                         1.00
Info.Thread    IPC                                                                                               Metric                         3.20
Info.System    Time                                                                                              Seconds                       10.54
Info.Core      CORE_CLKS                                                                                         Count             39,946,651,913
Info.Inst_Mix  IpLoad                                                                                            Inst_Metric                7,447.8
MUX                                                                                                            %                              100.00
andikleen commented 1 year ago

I think this is fixed by commit c251350be69165.

aayasin commented 1 year ago

Right; I verified that it is fixed in version 4.6:

N=1700; ./do.py build prof-no-mux -g "-i MOVLG -n $N" -a MOVLG$N -ki 75e6 -v1
building kernel: MOVLG1700 ..
/usr/bin/python ./kernels/gen-kernel.py -i MOVLG -n 1700 > ./kernels/MOVLG1700.c
gcc -O2 -ffast-math -g -o ./kernels/MOVLG1700 ./kernels/MOVLG1700.c 2>&1
INFO: App: taskset 0x4 ./kernels/MOVLG1700 75000000 .
topdown-vl6 no multiplexing ..
# 4.6-full-perf on Intel(R) Xeon(R) Platinum 8454H [spr/sapphire_rapids]
FE             Frontend_Bound                                                                                  % Slots                           46.7
Info.Core      CoreIPC                                                                                           Core_Metric                      3.20
Info.Inst_Mix  Instructions                                                                                      Count              127,739,311,378
Info.Inst_Mix  IpCall                                                                                            Inst_Metric                330,355.7
Info.Inst_Mix  IpTB                                                                                              Inst_Metric                  1,673.5
Info.Inst_Mix  BpTkBranch                                                                                        Metric                           1.01
FE             Frontend_Bound.Fetch_Bandwidth                                                                  % Slots                           46.3
Info.Bottleneck Mispredictions                                                                                   Scaled_Slots                    -0.02
Info.Bad_Spec  IpMispredict                                                                                      Inst_Metric              6,357,087.3
FE             Frontend_Bound.Fetch_Bandwidth.MITE                                                             % Slots_est                       49.4   <==
Info.Memory    Load_Miss_Real_Latency                                                                            Clocks_Latency                 111.84
Info.Core      ILP                                                                                               Core_Metric                      3.21
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization                 % Core_Execution                  64.0
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_0          % Core_Clocks                     59.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_1          % Core_Clocks                     60.1
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_6          % Core_Clocks                     69.7
Info.Thread    UopPI                                                                                             Metric                           1.00
Info.Pipeline  IpAssist                                                                                          Inst_Metric          1,120,520,275.2
Info.Thread    IPC                                                                                               Metric                           3.20
Info.System    CPUs_Utilized                                                                                     Metric                         128.00
Info.System    Time                                                                                              Seconds                         11.77
Info.Core      CORE_CLKS                                                                                         Count               39,915,266,939
Info.Inst_Mix  IpLoad                                                                                            Inst_Metric                 34,579.4
MUX                                                                                                            %                                100.00
WARNING: file is missing: MOVLG1700-75e6.perf_stat-r3.log
wrote: MOVLG1700-75e6.SPR.stat
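The difference this issue is about is the <== marker on the MITE row: absent in the 4.4 output, present in the 4.6 output. A regression check could scan a saved toplev log (for example the file written by tee above) for rows carrying that marker. The sketch below is a hypothetical helper, not part of toplev or perf-tools. It assumes the whitespace-separated layout shown in the logs above: an area column, then the node name, then unit and value, with an optional trailing <==.

```python
def flagged_nodes(log_text: str) -> list[str]:
    """Hypothetical helper: return node names of toplev rows marked with '<=='.

    Assumes each row is whitespace-separated as in the logs above:
    area column (e.g. "FE", "BE/Core"), node name, unit, value, optional "<==".
    """
    flagged = []
    for line in log_text.splitlines():
        fields = line.split()
        # Only rows whose last field is the bottleneck marker count as flagged.
        if len(fields) >= 2 and fields[-1] == "<==":
            flagged.append(fields[1])
    return flagged


if __name__ == "__main__":
    # MITE rows copied from the 4.4 and 4.6 outputs in this thread.
    row_44 = ("FE             Frontend_Bound.Fetch_Bandwidth.MITE"
              "          % Slots_est          49.4")
    row_46 = row_44 + "   <=="
    print(flagged_nodes(row_44))  # the 4.4 row is not flagged
    print(flagged_nodes(row_46))  # the 4.6 row is flagged
```

Splitting on whitespace rather than on fixed column offsets keeps the check independent of the column widths, which differ slightly between the 4.4 and 4.6 outputs.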