benfred / py-spy

Sampling profiler for Python programs

Slow sampling (inside docker)? #559

Closed. olejorgenb closed this issue 1 year ago.

olejorgenb commented 1 year ago

I haven't managed to investigate properly, but when running py-spy locally I can sample at a rate of 200 samples/s using a little over half a core (on an old Intel i5).

On my server (a newer AMD), running py-spy inside Docker, it lags behind at around 30-40 samples/s. The container is restricted to one core, though. Running top inside the container, py-spy consumes almost all of the CPU when sampling 2 processes at 20 samples/s (I use the --subprocesses flag to profile a gunicorn app). There are quite a few threads in total, but only two should have significant activity.

I stumbled over https://github.com/joerick/pyinstrument/issues/83 (slow gettimeofday) and was wondering if py-spy could be affected as well. But a test making lots of gettimeofday calls inside the container gives about the same performance as on my work computer. And it seems strange that it would be the limiting factor at <100 samples/s even if the call were slow.
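The check I ran was roughly along these lines (not the exact commands, just a sketch of the idea: compare per-call clock overhead inside the container and on the host):

```
# Rough microbenchmark of clock-call overhead; run both inside the
# container and on the work machine and compare the reported times.
python3 -m timeit -s "import time" "time.time()"       # backed by gettimeofday/clock_gettime
python3 -m timeit -s "import time" "time.monotonic()"  # clock_gettime(CLOCK_MONOTONIC)
```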

Is there anything I can do to speed up sampling?

Jongy commented 1 year ago

I wouldn't assume the problem is py-spy running in the container as such - if the container is CPU-limited, that limit is probably the cause, but nothing else about running inside a container should slow py-spy down.

You can try:

  1. Run py-spy on the host, where it has no CPU limits. It can profile an app inside a container when run from outside; see the sketch after this list.
  2. py-spy is affected by the number of threads in the app, whether or not they do actual work, because it does some work per thread. I don't have a solution for this off the top of my head short of changing the code, but lowering the number of threads in your app can help. FWIW, in the fork of py-spy that we use in gProfiler, we have improved this slightly by not doing the heavy per-thread work for ALL threads, but only for the threads we actually want stack traces from (we run with --gil). We haven't upstreamed that fix yet, though. See the fix here if you're interested.
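For (1), a sketch of what that could look like; the flag values are illustrative and <PID> stands for the host-side PID of the gunicorn master:

```
# Attach py-spy from the host to the containerized gunicorn master.
# --subprocesses follows the workers, --gil only samples threads
# currently holding the GIL.
sudo py-spy record --pid <PID> --subprocesses --gil \
    --rate 200 --format speedscope -o profile.speedscope.json
```

Attaching to another user's process from the host usually needs root (or CAP_SYS_PTRACE), hence the sudo.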
olejorgenb commented 1 year ago

Thanks, then I guess it's a combination of the many threads and the CPU quota. Reducing the number of threads is not an option, but I'll see if I can run it outside Docker. Not ideal, as we like to keep the host machines as clean as possible.

Would an option to limit which threads to profile (by a name pattern) be interesting? I was looking for that before running into this issue - simply to avoid having to scroll through all the irrelevant threads in speedscope. (Or is it possible to avoid most of the work for idle threads as well as non-GIL threads?)

Jongy commented 1 year ago

> Would an option to limit which threads to profile (by a name pattern) be interesting? I was looking for that before running into this issue - simply to avoid having to scroll through all the irrelevant threads in speedscope. (Or is it possible to avoid most of the work for idle threads as well as non-GIL threads?)

Yes, this is definitely a viable feature request for py-spy. It already does thread-based filtering (GIL state, active state) and it already reads the thread name, so yup - it could filter by name and skip the majority of the work for filtered-out threads.
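For context, the existing thread-based filters are CLI flags along these lines (illustrative; check py-spy record --help for the exact set in your version, and note there is no name-based filter yet):

```
py-spy record --pid <PID> --gil  -o profile.svg   # only sample threads holding the GIL
py-spy record --pid <PID> --idle -o profile.svg   # also include idle threads
```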