brendangregg / FlameGraph

Stack trace visualizer
http://www.brendangregg.com/flamegraphs.html
17.38k stars 1.97k forks source link

CPU flamegraphs using perf show wrong sample counts #326

Open jg-linetco opened 10 months ago

jg-linetco commented 10 months ago

Issue

When using the instruction from https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html the number of samples according to perf and the number of samples according to the flame graph differ greatly. Example numbers:

perf record -e cpu-clock -ag -F 199 --call-graph dwarf -- sleep 10
...
[ perf record: Captured and wrote 1.090 MB perf.data (3980 samples) ]

The flame graph reports billions of samples: image

Notes

The example flamegraph on https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html does report correct numbers.

First Analysis

This change has been introduced with https://github.com/brendangregg/FlameGraph/issues/165 in PR https://github.com/brendangregg/FlameGraph/pull/250

From there on, samples are weighted by a factor but still reported as "samples", making the output of perf and flamegraph contradictory.

jg-linetco commented 10 months ago

IMHO we should bring back the old behavior and move the "weighting" behavior behind a command line option.

  1. The scripts should not silently add a weight to the samples and still report them as "samples". This makes the output confusing.
  2. The output of the script now does not match the documentation on the web site.
  3. The weighting is done using the number of cycles elapsed the last sample. I do not think this makes sense in all use cases, especially when dealing with samples that happen irregularly.
  4. The change hides the fact that we are still dealing with a bunch of samples and that we don't know what happens between the samples. Instead it tries to account for cycles, which only makes sense when the time between samples approaches 0.
  5. Other PRs are already trying to get the old behavior back: #297
  6. With older versions, some simple statistic checks were easy: "Is it only one sample? Then it has no significance, let's ignore it". Now we can't discern heavily weighted outliers from multiple, lower weighted samples that carry significant information.

I understand that there's demand to have the samples weighted by the amount the cycle counter increased between the last sample and this sample.

So my proposal would is: Let's bring back the old behavior (sample count) as default behavior and add new command line option to enable the period-weighting of samples.

--periods "Weight samples according to the reported event period"

What do you think? I'll be happy to create the PR. @kgibm already did most of the work in #297 ;-)