Closed mdboom closed 3 months ago
It looks like you can get the "raw" period time from perf report
using --show-total-period
and then we can calculate the percentage ourselves to whatever precision we want.
This solution seems to work really well. Here are the results (just for the hexiom benchmark -- not overall, which still takes ~12 hours to compute):
See this PR: https://github.com/faster-cpython/bench_runner/pull/148
Awesome! I wonder if they're just parsing /proc/self/maps
to find our JIT pages.
Very nice.
Perf just assumes mmaped executable pages are "JIT"? That's handy if it does.
After a first run of using Linux perf to profile pyperformance on a JIT-enabled build of CPython, it seems that JIT frames show up as distinct from everything else, which is great. We get a bunch of entries that look like this:
(I assume that's some sort of unique id, and a pointer to the code page??)
Normal C function frames, for example, look like this:
So we are actually able to "see" when JIT code is the inner most frame when perf collects a sample.
However, I think it's not quite working right. Whenever using sampling based profiling, there is always a certain percentage of the overall time that is "unaccounted for", because there is a long tail of call sites that are reported as 0.00%, but nonetheless take some non-zero time. As far as I can gather
linux perf
has a fixed precision of 100ths of a percent. Traditionally in our data the unaccounted for was 1-2% of the time. However, with a JIT run, 1.5% of the time is in these[JIT]
frames, but the unaccounted for time jumps to 12%. So I think what is happening is we have this now extremely long tail of individual[JIT]
functions that aren't actually getting accounted for, and the time spent in the JIT is actually much higher than 1.5%.I'm not sure how to solve this. Maybe there's some way to "coerce" all of the
[JIT]
entries into one. If not, I would hate to have to fork theperf report
to get better precision out and/or write my own perf data parser. (Though maybe the perf data format is simple enough, I have no idea).Cc @brandtbucher