Closed olejorgenb closed 1 year ago
I wouldn't assume the problem is py-spy running in a container as such. If the container is CPU-limited, that limit is probably the cause, but I doubt anything else about py-spy being in a container matters.
You can try:
--gil
). We haven't upstreamed that fix yet, though. See the fix here if you're interested.
Thanks, then I guess it's a combination of the many threads and the CPU quota. Reducing the number of threads is not an option, but I'll see if I can run it outside Docker. That's not ideal, as we like to keep the host machines as clean as possible.
Would an option to limit which threads to profile (by a name pattern) be interesting? I was looking for that before running into this, simply to avoid having to scroll through all the irrelevant threads in speedscope. (Or is it possible to avoid most of the work for idle threads as well as non-GIL threads?)
Yes, this is definitely a viable feature request for py-spy. It already does thread-based filtering (GIL, active state) and it reads the thread name, so yup - it could filter by name and skip the majority of work for threads filtered out.
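Until something like that exists in py-spy itself, the scrolling part can be worked around after the fact. Here is a rough sketch, assuming the recording was saved in speedscope's JSON format with one entry per thread under `profiles`, each carrying a `name`. The function name `filter_profiles` is made up for illustration and is not part of py-spy or speedscope. Note that it does nothing for sampling overhead, since py-spy still walks every thread while recording.

```python
import re


def filter_profiles(doc, pattern):
    """Return a copy of a speedscope document with only the profiles whose name matches."""
    regex = re.compile(pattern)
    kept = [p for p in doc.get("profiles", []) if regex.search(p.get("name", ""))]
    out = dict(doc)  # shallow copy; the shared frame table is reused unchanged
    out["profiles"] = kept
    # If the file stores an index into the old profile list, drop it so it
    # cannot point at the wrong (or a missing) entry after filtering.
    out.pop("activeProfileIndex", None)
    return out
```

Load the recording with `json.load`, pass it through `filter_profiles(doc, r"worker")` (or whatever pattern fits your thread names), and write the result back with `json.dump` before opening it in speedscope.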
I haven't managed to investigate properly, but when running py-spy locally I can sample at a rate of 200 samples/s using a little over half a core (on an old Intel i5).
On my server (a newer AMD), running py-spy inside Docker, it lags behind at around 30-40 samples/s. The container is restricted to one core, though. Running top inside the container shows py-spy consuming almost all of the CPU when sampling 2 processes at 20 samples/s. (I use the --subprocesses flag to profile a gunicorn app.) There are quite a few threads in total, but only two should have significant activity.
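To double-check what CPU limit the container actually sees, something like the following could be run inside it. This is a sketch under assumptions: it expects cgroup v2, where `cpu.max` holds `<quota> <period>` (or `max <period>` when unlimited), and it assumes the file sits at `/sys/fs/cgroup/cpu.max`. On cgroup v1 hosts the path and format differ, and the helper just returns `None`.

```python
from pathlib import Path


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content into a core count, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None  # no quota configured
    return int(quota) / int(period)


def container_cpu_limit(path="/sys/fs/cgroup/cpu.max"):
    """Read the CPU limit from the assumed cgroup v2 path; None if unlimited or unreadable."""
    try:
        return parse_cpu_max(Path(path).read_text())
    except (OSError, ValueError):
        return None
```

A container restricted to one core would typically report `1.0` here, which means py-spy and everything else in that container share a single core's worth of CPU time.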
I stumbled over https://github.com/joerick/pyinstrument/issues/83 (slow gettimeofday) and was wondering whether py-spy could be affected as well. But running tons of gettimeofday calls inside the container gives about the same performance as on my work computer. And it would be strange for that to be the limiting factor at <100 samples/s, even if the call were slow.
Is there anything I can do to speed up sampling?
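For anyone wanting to repeat the clock-call comparison mentioned above, here is a minimal sketch. It is not the exact test from this thread: it times Python's `time.time()`, which on Linux may be backed by `clock_gettime` rather than `gettimeofday`, and the measured figure includes the overhead of the Python loop itself.

```python
import time


def clock_call_cost(n=200_000):
    """Return the average wall-clock cost in seconds of one time.time() call over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        time.time()
    return (time.perf_counter() - start) / n
```

Run it inside the container and on the host and compare the two numbers. At 100 samples/s there are 10 ms available per sample, so a clock call would have to be very slow indeed to be the bottleneck.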