CS-FreeStyle / 10000-How-To-Do-in-CS


how to use perf #80

liuty10 opened 4 years ago

liuty10 commented 4 years ago

A simple tutorial: http://brendangregg.com/perf.html

More events that you can use: https://relate.cs.illinois.edu/course/cs598apk-f18/f/demos/upload/perf/Using%20Performance%20Counters.html

Example events (these are Intel-specific; run perf list to see which events your CPU supports):
  resource_stalls.any
       [Resource-related stall cycles]
  resource_stalls.rob
       [Cycles stalled due to re-order buffer full]
  resource_stalls.rs
       [Cycles stalled due to no eligible RS entry available]
  resource_stalls.sb
       [Cycles stalled due to no store buffers available (not including draining from sync)]
  cycle_activity.cycles_l1d_miss
       [Cycles while L1 cache miss demand load is outstanding]
  cycle_activity.cycles_l1d_pending
       [Cycles while L1 cache miss demand load is outstanding]
  cycle_activity.cycles_l2_miss
       [Cycles while L2 cache miss demand load is outstanding]
  cycle_activity.cycles_l2_pending
       [Cycles while L2 cache miss demand load is outstanding]
  cycle_activity.cycles_ldm_pending
       [Cycles while memory subsystem has an outstanding load]
  cycle_activity.cycles_mem_any
       [Cycles while memory subsystem has an outstanding load]
  cycle_activity.cycles_no_execute
       [Total execution stalls]
  cycle_activity.stalls_l1d_miss
       [Execution stalls while L1 cache miss demand load is outstanding]
  cycle_activity.stalls_l1d_pending
       [Execution stalls while L1 cache miss demand load is outstanding]
  cycle_activity.stalls_l2_miss
       [Execution stalls while L2 cache miss demand load is outstanding]
  cycle_activity.stalls_l2_pending
       [Execution stalls while L2 cache miss demand load is outstanding]
  cycle_activity.stalls_ldm_pending
       [Execution stalls while memory subsystem has an outstanding load]
  cycle_activity.stalls_mem_any
       [Execution stalls while memory subsystem has an outstanding load]
  cycle_activity.stalls_total
       [Total execution stalls]
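These events are raw cycle counts, so they are easiest to read as a fraction of total cycles. Below is a minimal sketch, assuming you collect with perf stat's CSV mode (-x,), whose lines start with value,unit,event, and that your CPU supports the events you request. The function name stall_breakdown is made up for illustration.

```shell
# Hypothetical helper: read `perf stat -x,` CSV output on stdin and print
# every event other than "cycles" as a percentage of the cycles count.
stall_breakdown() {
    awk -F, '
        # Only lines whose first field is a plain integer count are used;
        # "<not supported>" / "<not counted>" lines are skipped.
        $1 ~ /^[0-9]+$/ {
            name = $3
            sub(/:.*/, "", name)            # drop modifiers such as ":u"
            if (name == "cycles") { cycles = $1; next }
            n++
            names[n] = name
            counts[n] = $1
        }
        END {
            if (cycles + 0 == 0) { print "no cycles count in input"; exit 1 }
            for (i = 1; i <= n; i++)
                printf "%-32s %6.2f%% of cycles\n", names[i], 100 * counts[i] / cycles
        }'
}
```

For example (./your_program is a placeholder): perf stat -x, -e cycles,cycle_activity.stalls_total,resource_stalls.any ./your_program 2>&1 >/dev/null | stall_breakdown. perf stat prints its counts on stderr, which is why stderr is sent into the pipe while the program's own stdout is discarded.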

liuty10 commented 3 years ago

Allow unprivileged users to use (almost) all perf events (the setting is reset on reboot):
sudo sh -c 'echo -1 >/proc/sys/kernel/perf_event_paranoid'
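The paranoid levels are easy to forget. This is a small sketch based on the Linux kernel's sysctl documentation that prints what a level restricts for users without CAP_SYS_ADMIN (CAP_PERFMON on newer kernels). The function name paranoid_meaning is hypothetical, and the message for levels above 2 is a hedge: mainline kernels treat them like 2, while some distribution kernels use 3 to block unprivileged perf entirely.

```shell
# Hypothetical helper: describe a kernel.perf_event_paranoid level.
# With no argument, the current value is read from /proc.
paranoid_meaning() {
    level=${1:-$(cat /proc/sys/kernel/perf_event_paranoid)}
    if [ "$level" -le -1 ]; then
        echo "allow (almost) all events for all users"
    elif [ "$level" -eq 0 ]; then
        echo "disallow raw tracepoint access"
    elif [ "$level" -eq 1 ]; then
        echo "also disallow CPU-wide events (e.g. perf stat -a)"
    elif [ "$level" -eq 2 ]; then
        echo "also disallow kernel profiling"
    else
        # Not a mainline level; behavior depends on the distribution kernel.
        echo "distribution-specific: unprivileged perf may be blocked entirely"
    fi
}
```

Run it without arguments (paranoid_meaning) to check the current setting before and after the echo above.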

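Count events for a command (here the command is sleep 5 itself):
perf stat -e instructions,cycles,LLC-load-misses,LLC-loads sleep 5
Count events for an existing thread (TID 5212) for 5 seconds; sleep 5 only sets the duration:
perf stat -e instructions,cycles,LLC-load-misses,LLC-loads -t 5212 sleep 5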
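The four events above are usually turned into two ratios: instructions per cycle (IPC = instructions / cycles) and the LLC load miss ratio (LLC-load-misses / LLC-loads). A minimal sketch, assuming the same events are collected with perf stat's CSV mode (-x,), where each line starts with value,unit,event; summarize_counts is a made-up name:

```shell
# Hypothetical helper: read `perf stat -x,` CSV output on stdin and print
# IPC and the LLC load miss ratio.
summarize_counts() {
    awk -F, '
        # Field 1 is the counter value, field 3 the event name.
        $1 ~ /^[0-9]+$/ {
            name = $3
            sub(/:.*/, "", name)            # drop modifiers such as ":u"
            count[name] = $1
        }
        END {
            if (count["cycles"] > 0)
                printf "IPC %.2f\n", count["instructions"] / count["cycles"]
            if (count["LLC-loads"] > 0)
                printf "LLC load miss ratio %.2f%%\n", 100 * count["LLC-load-misses"] / count["LLC-loads"]
        }'
}
```

For example: perf stat -x, -e instructions,cycles,LLC-load-misses,LLC-loads -t 5212 sleep 5 2>&1 >/dev/null | summarize_counts. With 2000 instructions in 1000 cycles and 50 misses out of 200 LLC loads, it reports IPC 2.00 and a 25.00% miss ratio.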

perf stat options:
  -p xxx            stat events on existing process ID(s) (comma-separated list)
  -t xxx            stat events on existing thread ID(s) (comma-separated list)
  -a                system-wide collection from all CPUs
  -C, --cpu=xxx     count only on the listed CPUs. Multiple CPUs are given as a comma-separated list with no spaces (0,1); ranges use - (0-2). In per-thread mode this option is ignored. The -a option is still necessary to activate system-wide monitoring. The default is to count on all CPUs.
  -I msecs, --interval-print msecs
                    print count deltas every N milliseconds (minimum: 10 ms). The overhead can be high in some cases, for instance with small, sub-100 ms intervals, so use with caution. Example: perf stat -I 1000 -e cycles -a sleep 5
  --per-socket, --per-core, --per-thread
                    aggregate counts per processor socket or per physical core (system-wide mode), or per monitored thread (with -t or -p)
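Because -I prints deltas, dividing each value by the interval length gives a per-second rate. A minimal sketch, assuming CSV mode (-x,) with the default aggregation, where each interval line is timestamp,value,unit,event,... (options such as --per-core add extra columns and would shift these fields). The function name interval_rates is hypothetical.

```shell
# Hypothetical helper: turn `perf stat -I <ms> -x,` output into per-second
# rates. Usage: ... | interval_rates <ms>
interval_rates() {
    awk -F, -v ms="$1" '
        # Field 1 is the timestamp, field 2 the delta, field 4 the event.
        $2 ~ /^[0-9]+$/ {
            # Each value covers one interval of `ms` milliseconds.
            printf "%.3f %s %.0f/s\n", $1, $4, $2 * 1000 / ms
        }'
}
```

For example: perf stat -I 1000 -x, -e cycles -a sleep 5 2>&1 >/dev/null | interval_rates 1000. Pass the same millisecond value to interval_rates that you gave to -I.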