Closed: njsmith closed this issue 5 years ago
Thanks for the links! This shouldn't be all that hard to add (though we are doing more than just sleeping for 10ms between samples to get a 100 samples/second - we also correct for the time it takes to generate the sample, and report warnings if we're not keeping up).
In the meantime if anyone is running into this problem in practice, I'd suggest just raising the sampling rate: if you know that you have some work happening at a certain rate - you should be sampling at least 2x as fast to avoid aliasing effects iirc.
we are doing more than just sleeping for 10ms between samples to get a 100 samples/second - we also correct for the time it takes to generate the sample, and report warnings if we're not keeping up
I think that actually makes this issue worse, because it means you're removing the random scheduling jitter that might let your sampling drift out of sync with other clock ticks and get more accurate results :-).
The "keeping up" issue is an interesting one though. With the exponentially-distributed inter-sample intervals, yeah, in theory you should compute each exponential step from the previous one, which means that there's some chance that it will tell you you should take two samples 0.0001 ms apart, which is obviously not going to work. And as the target sampling rate increases, this happens more often – but it can happen at any rate, so the mere fact of it happening wouldn't mean that the user did anything wrong.
If you use the naive loop I wrote, then I guess you'll get a kind of close cousin of a Poisson process, where after each sample there's a short "dead zone" where it's impossible to get a second sample, and then after this we go back to Poisson sampling. But intuitively, the dead zones are also distributed randomly, so they probably don't hurt anything, and just mean we get fewer samples than we were hoping for? You might want to correct for this by somehow upping the sampling rate slightly. What makes this tricky is that as the sampling rate increases, the dead zones end up taking up a larger and larger proportion of the time, which means you want to increase the sampling rate even more to compensate... and obviously it totally breaks down when the target rate gets so high that you're basically just sampling in a tight loop with no sleeps.
Of course, well before it reaches that point you're totally destroying the target program's performance, and thus probably destroying the data you were trying to collect. (Even if you're using --nonblocking, I guess cache effects will still probably cause lots of interference.)
Maybe the best solution is to use the naive loop, plus warn whenever py-spy is spending more than X% of its time sampling instead of sleeping, for some X.
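For what it's worth, something like this rough Python sketch of that combination (the 50% threshold, take_sample(), and the bookkeeping are placeholders for illustration, not py-spy's actual code):

```python
import random
import time

RATE = 100            # target average samples/second
OVERHEAD_LIMIT = 0.5  # the "X%": warn if half our wall time goes to sampling

def take_sample():
    pass  # placeholder for actually grabbing a stack trace

sampling_time = 0.0
start = time.monotonic()
warned = False

while True:
    # naive loop: exponential sleep, then sample; two samples can never land
    # closer together than one sampling duration (the "dead zone")
    time.sleep(random.expovariate(RATE))

    t0 = time.monotonic()
    take_sample()
    sampling_time += time.monotonic() - t0

    elapsed = time.monotonic() - start
    if not warned and elapsed > 1.0 and sampling_time / elapsed > OVERHEAD_LIMIT:
        print("warning: spending most of the time sampling instead of "
              "sleeping; results may be distorted")
        warned = True
```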
you should be sampling at least 2x as fast to avoid aliasing effects iirc.
You're thinking of the Nyquist theorem, which is about reconstructing smooth analog signals from point samples. In this case, increasing the sample rate alone won't necessarily help – in our example where the program does something for 1 ms every 10 ms, sampling every 5 ms will either tell you that this operation takes 50% of the time (if you sample at 0, 5, 10, ...), or 0% of the time (if you sample at 2, 7, 12, ...). This is also why this problem can be sneaky: rhythmic behavior in the program-under-test doesn't have to exactly match py-spy's sampling rate to cause distorted results; I think it can happen any time the two rates share any common factors (or approximately share common factors).
2 things:
first, this tool fills my heart with gladness. second, I want to highlight something @njsmith recommended:
A very simple thing that would help is to change the default sampling rate to a prime number.
Changing the default from 100 to e.g. 109, like he recommended, would likely go a long way to avoid the sampling bias. This is the same reason most examples using perf use -F 99; you don't want to accidentally be locked in time to events that are happening on the system.
I'm not sure whether the default would be to always use an exponential distribution or not, but if that becomes a flag, it would be nice to change the 100 Hz to something else.
Thanks for all the information @njsmith! It was great reading your explanation, and I appreciate you taking the time to raise this.
I have an initial implementation here: #108
For the 'keeping up' problem, I'm using the exponential distribution to define an ideal schedule of when we should sample, and then sleeping the appropriate amount of time to try to keep to that schedule as closely as possible. If a sample should have happened in the past, we don't sleep at all on that loop, and the amount of sleep time on the next sample will be reduced until we are caught up.
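Not the actual code in #108, but a Python sketch of that scheduling idea (take_sample() is a placeholder):

```python
import random
import time

RATE = 100  # target average samples/second

def take_sample():
    pass  # placeholder for the real sampling step

# The ideal schedule is built incrementally: each target time is the previous
# one plus an exponentially-distributed delta, so the schedule itself is a
# Poisson process no matter how long each sample actually takes to collect.
next_sample_at = time.monotonic()

while True:
    next_sample_at += random.expovariate(RATE)
    delay = next_sample_at - time.monotonic()
    if delay > 0:
        time.sleep(delay)
    # if delay <= 0, we're behind schedule: skip the sleep entirely, and the
    # following sleeps come out shorter until we've caught back up
    take_sample()
```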
in v0.1.11
Suppose we have a program that has an internal time tick: 100 times per second, it does some operation that takes 1 millisecond. Altogether, this operation takes up 10% of runtime.
Suppose we use py-spy to profile our program, with py-spy's default sampling rate of 100 samples/second. There are two possibilities. If, by chance, py-spy and our program's internal ticks line up, then every time py-spy takes a sample our program will be performing our operation, and py-spy will report that the operation uses 100% of runtime. Alternatively, the ticks won't line up, so e.g. our operation happens at times 0 ms, 10 ms, 20 ms, etc., while py-spy samples at 5 ms, 15 ms, 25 ms, etc. Then py-spy will report that the operation uses 0% of runtime. The one thing that definitely won't happen is for py-spy to tell us "10%" like we want.
This kind of aliasing can happen whenever the program being profiled has regular activity on intervals that share some common factor with py-spy's sampling rate.
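A quick back-of-the-envelope simulation (just made-up numbers, not py-spy) shows how strongly the answer depends on the phase between the two clocks, and what happens when the rates share a factor:

```python
# Toy program: busy for the first 1 ms of every 10 ms tick (i.e. 10% of runtime).
def busy(t_ms):
    return (t_ms % 10.0) < 1.0

# Sample every `period_ms` starting at `phase_ms`, report the measured "busy %".
def profile(period_ms, phase_ms, n=10000):
    hits = sum(busy(phase_ms + i * period_ms) for i in range(n))
    return 100.0 * hits / n

print(profile(10.0, phase_ms=0.5))    # ticks line up:       reports 100%
print(profile(10.0, phase_ms=5.0))    # ticks out of phase:  reports 0%
print(profile(5.0, phase_ms=0.0))     # shared factor of 5:  reports 50%
print(profile(9.1743, phase_ms=0.0))  # ~109 Hz:             reports ~10%
```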
A very simple thing that would help is to change the default sampling rate to a prime number. Probably lots of programs have internal ticks that are some multiple or divisor of 10 ms. Not as many programs have internal ticks that are a multiple or divisor of 9.1743 ms (= 109 samples/second).
This does still have two downsides: (a) well, what if a program does have 9.1 ms internal ticks? (possibly by accident, e.g. because it does some loop where each iteration just happens to need 9.1 ms of CPU), (b) if some program has ticks that are approximately the same as py-spy's (e.g. 10 ms vs 9.1 ms), then the two sample rates will gradually drift in and out of sync (think turn signals), and you'll need to keep recording samples for a long time for this to average out. I think these are unavoidable in any fixed-sampling-rate strategy.
Fortunately, there is a really slick alternative based on the Poisson process.
Ideally, we want our profiler to have an equal chance of sampling at any time. Also, we don't want to take so many samples per second that we overwhelm the computer. So you can imagine one sort of naive approach we could take. We could split up a second into a bunch of tiny, equal-sized pieces (say, 1 ms each), and every 1 ms we flip a biased coin. 10% of the time we take a sample, and the other 90% of the time, we skip it. This seems like it's on the right track... clearly each 1 ms period has the same chance of being sampled. And we're still only doing about 1 sample every 10 ms on average (though the exact interval will vary). But there's an awkward problem: we want to split up time into really tiny pieces to make sure we have a chance of seeing everything, but waking up over and over just to flip a coin and then go back to sleep is really wasteful.
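In code, that naive version would look something like this (a sketch, with take_sample() standing in for the real sampling step):

```python
import random
import time

SLICE = 0.001    # split time into 1 ms pieces
P_SAMPLE = 0.1   # 10% chance per piece -> ~100 samples/second on average

def take_sample():
    pass  # placeholder

while True:
    time.sleep(SLICE)               # wake up every millisecond...
    if random.random() < P_SAMPLE:
        take_sample()               # ...but only actually sample 10% of the time
```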
Time for some :sparkles:probability theory:sparkles:. Let's imagine we can split up time into smaller and smaller pieces. (And we keep adjusting the coin so that each piece is less and less likely to be sampled, so that on average we still have ~1 sample every 10 ms.) If we keep going until the pieces are infinitely small, then this is called a "Poisson process". And [... several pages of calculus omitted ...] it turns out that in a Poisson process that has N samples/second, the time between events follows an exponential distribution with lambda = N.
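(For the curious, the key step in that omitted calculus is short: if each tiny slice of length dt is sampled independently with probability lambda*dt, where lambda = N, then

```latex
P(\text{no sample in } [0, t])
  = \lim_{\Delta t \to 0} \left(1 - \lambda\,\Delta t\right)^{t/\Delta t}
  = e^{-\lambda t}
```

so the waiting time T until the next sample satisfies P(T > t) = e^(-lambda*t), which is exactly the exponential distribution with rate lambda = N, i.e. a mean of 1/N seconds between samples.)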
So, tl;dr: if you want to average 100 samples per second, but scatter them so that they have an equal chance of happening at any time, then all you have to do is write a loop like:
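(a minimal Python sketch; take_sample() is just a placeholder for whatever actually grabs the stack)

```python
import random
import time

RATE = 100  # target average samples/second

def take_sample():
    pass  # placeholder for the real sampling step

while True:
    # exponentially-distributed gap between samples, with lambda = RATE
    time.sleep(random.expovariate(RATE))
    take_sample()
```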
Thanks for visiting my blog, please like and subscribe.