andikleen / pmu-tools

Intel PMU profiling tools
GNU General Public License v2.0

toplev: drilldown misclassified Level6 of Pause kernel #376

Open aayasin opened 3 years ago


For some reason, with the latest pmu-tools (TMA version 4.2), toplev's auto-drilldown does not detect Pause at level 6 on ICX. The drilldown reaches Serializing_Operation (level 5) and then prints this error: `Mismeasured (out of bound values): Slow_Pause`. However, without multiplexing, toplev gets it right. Below is a reproducer using perf-tools, with the system details and the output of both toplev runs; look for '**'.

## ./do.py build profile -g "-i PAUSE -n 3" -a pause3x -ki 24e6 --tune :sample:0 --profile-mask 0xc1
building kernel: pause3x ..
icelake
logging setup ..

**topdown auto-drilldown ..** 
# 4.2-full-perf on Genuine Intel(R) CPU $0000%@ [icl/icelake]
BE             Backend_Bound   % Slots                   87.7  <==
Info.Thread    IPC               Metric                   0.0 
Info.System    Time              Seconds                  1.0 
Rerunning workload
BE             Backend_Bound             % Slots                   87.7 
Info.Core      CoreIPC                     CoreMetric               0.1 
BE/Core        Backend_Bound.Core_Bound  % Slots                   87.2  <==
Info.Thread    IPC                         Metric                   0.0 
Info.System    Time                        Seconds                  1.0 
MUX                                      %                         49.1 
Rerunning workload
BE             Backend_Bound                               % Slots                   87.4 
Info.Core      CoreIPC                                       CoreMetric               0.1 
BE/Core        Backend_Bound.Core_Bound                    % Slots                   86.9 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization  % Clocks                  98.4  <==
Info.Thread    IPC                                           Metric                   0.0 
Info.System    Time                                          Seconds                  1.0 
MUX                                                        %                         29.8 
Rerunning workload
BE             Backend_Bound                                                % Slots                   87.4 
Info.Core      CoreIPC                                                        CoreMetric               0.1 
BE/Core        Backend_Bound.Core_Bound                                     % Slots                   86.9 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                   % Clocks                  98.6 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0  % Clocks                  93.4  <==
Info.Thread    IPC                                                            Metric                   0.0 
Info.System    Time                                                           Seconds                  1.0 
MUX                                                                         %                         29.4 
Rerunning workload
BE             Backend_Bound                                                                      % Slots                   87.7 
Info.Core      CoreIPC                                                                              CoreMetric               0.1 
BE/Core        Backend_Bound.Core_Bound                                                           % Slots                   87.2 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                         % Clocks                  98.3 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                        % Clocks                  93.1 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation  % Clocks                  90.2  <==
Info.Thread    IPC                                                                                  Metric                   0.0 
Info.System    Time                                                                                 Seconds                  1.0 
MUX                                                                                               %                         19.6 
Rerunning workload
BE             Backend_Bound                                                                      % Slots                   87.3 
Info.Core      CoreIPC                                                                              CoreMetric               0.1 
BE/Core        Backend_Bound.Core_Bound                                                           % Slots                   86.9 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                         % Clocks                  98.3 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                        % Clocks                  93.1 
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation  % Clocks                  89.5  <==
Info.Thread    IPC                                                                                  Metric                   0.0 
Info.System    Time                                                                                 Seconds                  1.0 
MUX                                                                                               %                         19.5 
Mismeasured (out of bound values): Slow_Pause
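The `MUX` rows in the drilldown above fall from 49.1% to about 19.5% as the drilldown goes deeper, so most events were time-multiplexed. As a rough illustration, assuming `MUX` reports the share of the run during which the counters were actually scheduled, the sketch below shows how `perf stat` extrapolates a multiplexed count from the kernel's `time_enabled` and `time_running` values. `scale_count` is a hypothetical helper and the counts are made up; this is not toplev code.

```python
def scale_count(raw: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Extrapolate a multiplexed counter the way perf stat does:
    raw * time_enabled / time_running."""
    if time_running_ns == 0:
        # The event was never scheduled on the PMU; nothing to extrapolate.
        return 0.0
    return raw * time_enabled_ns / time_running_ns


# Hypothetical 1-second run in which the event was on the PMU for 19.5%
# of the time, similar to the last MUX rows above.
enabled_ns = 1_000_000_000
running_ns = 195_000_000
raw_count = 1_000_000

estimate = scale_count(raw_count, enabled_ns, running_ns)
print(f"raw={raw_count:,} scaled~{estimate:,.0f} (x{enabled_ns / running_ns:.2f})")
```

Events in different groups are counted during different slices of the run, so a metric built from several independently scaled estimates carries that extrapolation error into its ratio. The setup log below shows `perf_event_mux_interval_ms` set to 100, so in a run of about one second each group is scheduled for only a few rotation intervals.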

**topdown full no multiplexing ..**
Using level 6.
# 4.2-full-perf on Genuine Intel(R) CPU $0000%@ [icl/icelake]
FE             Frontend_Bound                                                                                % Slots                       3.2 <
BE             Backend_Bound                                                                                 % Slots                      87.8  
RET            Retiring                                                                                      % Slots                      11.6 <
Info.Inst_Mix  Instructions                                                                                    Count             147,791,337.0  
FE             Frontend_Bound.Fetch_Latency                                                                  % Slots                       3.7 <
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                      87.0  
RET            Retiring.Light_Operations                                                                     % Slots                       5.8 <
RET            Retiring.Heavy_Operations                                                                     % Slots                       5.8 <
FE             Frontend_Bound.Fetch_Latency.MS_Switches                                                      % Clocks                     14.2  
FE             Frontend_Bound.Fetch_Bandwidth.DSB                                                            % Slots_est                   3.2 <
Info.Thread    IpTB                                                                                            Metric                      6.1  
Info.Memory    L2MPKI                                                                                          Metric                      0.2  
BE/Mem         Backend_Bound.Memory_Bound.DRAM_Bound.MEM_Latency                                             % Clocks                     33.1 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                     98.3  
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                     93.3  
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_1                                   % Clocks                      5.2 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation             % Clocks                     90.3  
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks                    329.4   <==
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization               % Core_Execution              3.2 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_5        % Core_Clocks                 4.8 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_6        % Core_Clocks                 6.9 <
Info.System    CPU_Utilization                                                                                 Metric                      1.0  
RET            Retiring.Light_Operations.Other_Light_Ops                                                     % Uops                      100.0 <
RET            Retiring.Heavy_Operations.Microcode_Sequencer                                                 % Slots                       5.8 <
Info.Thread    UPI                                                                                             Metric                      5.7  
Info.System    Time                                                                                            Seconds                     1.0  
Info.Core      CORE_CLKS                                                                                       Count           1,522,860,504.0  
Info.Inst_Mix  IpLoad                                                                                          Metric                    175.8  
Info.Inst_Mix  IpCall                                                                                          Metric                  2,428.2  
Info.Inst_Mix  BpTkBranch                                                                                      Metric                      1.0  
MUX                                                                                                          %                           100.0  
3 results not referenced:  12 13 14
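In the no-multiplexing run, Slow_Pause is reported at 329.4 on a % Clocks scale, which is outside the 0 to 100 range a percentage of clocks can take, while its parent nodes on the same path stay within range. A minimal sketch of such a range check follows; `out_of_bound` is a hypothetical helper, the node values are copied from the run above, and toplev's actual check (and any tolerance it applies) may differ.

```python
from typing import Dict, List


def out_of_bound(nodes: Dict[str, float], lo: float = 0.0, hi: float = 100.0) -> List[str]:
    """Hypothetical helper: names of percentage nodes outside [lo, hi], bounds inclusive."""
    return [name for name, value in nodes.items() if not lo <= value <= hi]


# Values from the no-multiplexing run above (% Clocks, except Other_Light_Ops in % Uops).
nodes = {
    "Ports_Utilization": 98.3,
    "Ports_Utilized_0": 93.3,
    "Serializing_Operation": 90.3,
    "Slow_Pause": 329.4,
    "Other_Light_Ops": 100.0,  # exactly at the upper bound: not flagged
}

# Message formatted to resemble toplev's, for illustration only.
print("Mismeasured (out of bound values):", ", ".join(out_of_bound(nodes)))
```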

**## cat setup-system.log** 
Linux ha01wvaw0960-target-icx 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Module                  Size  Used by
kvm_intel             282624  0
kvm                   663552  1 kvm_intel

perf version 5.4.73
/proc/sys/kernel/nmi_watchdog : 0 
/proc/sys/kernel/soft_watchdog : 0 
/proc/sys/kernel/kptr_restrict : 0 
/proc/sys/kernel/perf_event_paranoid : -1 
/proc/sys/kernel/perf_event_mlock_kb : 60000 
/sys/devices/cpu/perf_event_mux_interval_ms : 100 

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 0 size: 31872 MB
node 0 free: 5365 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 1 size: 32223 MB
node 1 free: 3923 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10