Open synapticarbors opened 4 years ago
It's a been a while since I've posted this, so I just wanted to see if anyone had any thoughts about this issue?
Thanks for your patience on this -
It seems like your test program is almost always idle (as shown by running the --idle
option). This means that any false positives in detecting idle status will show up pretty prominently, even if as a percentage of overall samples it's pretty rare. I
The idle detection isn't perfect here - and I've documented some of the issues here https://github.com/benfred/py-spy#how-do-you-detect-if-a-thread-is-idle-or-not for a discussion of the causes. In particular I think you are seeing this issue
Finally, on some Linux calls the ptrace attach that we are using may cause idle threads to wake up momentarily, causing false positives when reading from procfs.
I think a simpler test program would probably just be
import time
time.sleep(1000)
and you'd see roughly similar results. The majority of samples will be discarded - but not all of them on linux, as the code we add to pause the program sends a signal which wakes up the 'sleep' call.
I think this is exacerbated by the irregular sampling feature we added, which I'm considered turning off by default.
I noticed when running py-spy (v0.3.3) against some production code on a Linux machine that around 60% of the time spent was attributed to about half a dozen or so lines in the code that were
time.sleep()
calls. I wrote the following minimal test case to reproduce the issue and discovered a discrepancy between what py-spy produces on Mac vs Linux:(
test.py
)That starts a bunch of threads and at an interval
delta_t
runs a bit of work, where the time required to complete the work is much less thandelta_t
.I then run:
On my mac, I get largely what I expect, that is the flamegraph is dominated by the calls to numpy in the
do_work
function. But running the same code on linux, almost 87% of the time spent, according to the flamegraph, is spent on the calltime.sleep(diff)
(line 29 indo_work
). On the Mac, it attributes just under 6% of the time spent to the same line. I see basically the same if py-spy outputs svg directly instead of raw -> flamegraph -> svg.If I use the
--idle
flag, then py-spy captures almost 100% of the time being spent in thetime.sleep()
on both systems.I've posted a gist with the test code and the svg files at: https://gist.github.com/synapticarbors/38e7fda1d5c23fc80b8562d41b7b672d
I'm aware of some of the previous issues dealing with this (e.g. https://github.com/benfred/py-spy/issues/92), but it seems like this may still be an issue.