andikleen / pmu-tools

Intel PMU profiling tools
GNU General Public License v2.0

No cmask in perf output. #511

Open jestrang opened 2 months ago

jestrang commented 2 months ago

The issue is that when a cmask is used, the event name in the output is not appended with the mask and mask value. Users who measure the same event with multiple masks won't know which event goes with which mask. The request is to add the mask to the event name so that users can tell which data values belong to which masks.

For example: perf stat -r3 --log-fd=1 -e 'cpu-clock,{cpu/slots,name=topdown_slots/,instructions,cycles,ref-cycles,cpu/topdown-retiring,name=perf_metrics_retiring/,cpu/topdown-bad-spec,name=perf_metrics_bad_speculation/,cpu/topdown-fe-bound,name=perf_metrics_frontend_bound/,cpu/topdown-be-bound,name=perf_metrics_backend_bound/},cpu/event=0xc4,umask=0x40,name=system-entries/u,r2424,cpu/event=0x79,umask=0x8,cmask=1,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=2,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=3,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=4,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=5,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=6,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=7,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=8,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=9,name=idq_dsb_uops/,cpu/event=0x79,umask=0x8,cmask=10,name=idq_dsb_uops/,context-switches,cpu-migrations,page-faults,branches,branch-misses,cycles:k' -- openssl speed -seconds 5 rsa2048

This command, which uses 10 cmasks for the same event idq_dsb_uops, won't show in the output which mask goes with which event value. See the output below.

 Performance counter stats for 'openssl speed -seconds 5 rsa2048' (3 runs):

         10,004.10 msec cpu-clock                 #    1.000 CPUs utilized            ( +-  0.00% )
   114,832,895,766      topdown_slots             #   11.478 G/sec                    ( +-  0.00% )  (49.89%)
    68,529,997,310      instructions              #    2.98  insn per cycle           ( +-  0.02% )  (49.89%)
    22,966,579,153      cycles                    #    2.296 GHz                      ( +-  0.00% )  (49.89%)
    22,966,579,061      ref-cycles                #    2.296 G/sec                    ( +-  0.00% )  (49.89%)
    77,455,914,006      perf_metrics_retiring     #    7.742 G/sec                    ( +-  0.77% )  (49.89%)
     5,403,900,976      perf_metrics_bad_speculation #  540.155 M/sec                    ( +- 12.73% )  (49.89%)
     8,556,176,546      perf_metrics_frontend_bound #  855.245 M/sec                    ( +-  0.00% )  (49.89%)
    23,829,732,392      perf_metrics_backend_bound #    2.382 G/sec                    ( +-  0.27% )  (49.89%)
             6,245      system-entries            #  624.228 /sec                     ( +-  0.36% )  (56.17%)
            35,843      r2424                     #    3.583 K/sec                    ( +-  1.52% )  (56.21%)
    14,779,255,751      idq_dsb_uops:c1           #    1.477 G/sec                    ( +-  0.05% )  (56.25%)
    14,684,356,923      idq_dsb_uops              #    1.468 G/sec                    ( +-  0.05% )  (56.30%)
    14,164,818,067      idq_dsb_uops              #    1.416 G/sec                    ( +-  0.05% )  (56.34%)
    12,857,163,213      idq_dsb_uops              #    1.285 G/sec                    ( +-  0.06% )  (25.07%)
    11,839,945,040      idq_dsb_uops              #    1.183 G/sec                    ( +-  0.06% )  (25.03%)
     8,373,036,971      idq_dsb_uops              #  836.939 M/sec                    ( +-  0.06% )  (24.99%)
                 0      idq_dsb_uops              #    0.000 /sec                     (24.96%)
                 0      idq_dsb_uops              #    0.000 /sec                     (24.95%)
                 0      idq_dsb_uops              #    0.000 /sec                     (24.95%)
                 0      idq_dsb_uops              #    0.000 /sec                     (24.95%)
                10      context-switches          #    1.000 /sec                     ( +- 12.02% )
                 0      cpu-migrations            #    0.000 /sec
               260      page-faults               #   25.989 /sec
     3,113,248,792      branches                  #  311.189 M/sec                    ( +-  0.05% )  (31.18%)
         3,646,519      branch-misses             #    0.12% of all branches          ( +-  9.03% )  (37.42%)
        11,827,363      cycles:k                  #    0.001 GHz                      ( +-  0.70% )  (43.65%)

         10.004829 +- 0.000164 seconds time elapsed  ( +-  0.00% )
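
Until the mask is shown in the output, one possible workaround is to encode the cmask in each event's `name=` field when building the command line, so that every row of the output is distinguishable. The following is only a minimal sketch under stated assumptions: the helper `cmask_events` and the `_c<N>` name suffix are hypothetical and are not part of perf or pmu-tools.

```python
def cmask_events(event, umask, base_name, cmasks):
    """Build one perf event spec per cmask, each with a distinct name.

    The "_c<N>" suffix is an illustrative naming convention (an assumption),
    chosen only so that rows in the perf stat output can be told apart.
    """
    return [
        f"cpu/event={event:#x},umask={umask:#x},cmask={c},name={base_name}_c{c}/"
        for c in cmasks
    ]


# Same event and umask as in the command above, with cmask 1 through 10.
events = cmask_events(0x79, 0x8, "idq_dsb_uops", range(1, 11))

# Comma-separated list that could be passed to `perf stat -e`.
event_arg = ",".join(events)
print(event_arg)
```

With names built this way, each cmask variant would be reported under its own name (for example `idq_dsb_uops_c3`) instead of ten identical `idq_dsb_uops` rows.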