benfred / py-spy

Sampling profiler for Python programs
MIT License
12.91k stars 431 forks source link

gevent: loop.run() uses most of the flame graph's space #179

Open ojrac opened 5 years ago

ojrac commented 5 years ago

Profiling our web app running gevent gunicorn workers gives us an SVG with a lot of dead space for gevent's event loop:

image

Idle detection properly filters out explicit waits, but a raw 500-sample trace of a low-utilization web server looks like this:

_handle_and_close_when_done (gevent/baseserver.py:26);handle (gunicorn/workers/ggevent.py:169);handle (gunicorn/workers/base_async.py:56);[...more user code...] 1
run (gevent/hub.py:582) 499
_worker (gevent/threadpool.py:270);get (gevent/_threading.py:166);wait (gevent/_threading.py:84) 500

I'm not sure there's a clear fix for this inside py-spy (gevent is explicitly messing with the threading model), but I wanted to run it by you just in case a solution stands out. Ideally, I think the "run" stack frame would be at the top of all the samples.

If that's not doable, this is probably a lot to ask, but it would be helpful to have command-line access to the flamegraph generator in py-spy; we can always set up the original flamegraph .pl scripts, but being able to just pip install one package to generate raw profiles and convert them to SVGs after some filtering would be a huge convenience.

benfred commented 5 years ago

It does seem like we are getting tripped up with gevent's use of greenlets - and that is potentially causing the idle detection code to go awry =(.

For your second point about having command line access to the flamegraph generator - that is possible right now. We have a 'raw' option for the format that will dump out the output that you can pass to flamgraph.pl (usage is something like py-spy record -f raw -o profile.txt --pid $PID and then you can take the resulting profile.txt file and pass to flamegraph.pl after filtering).

It might also be worth looking at the --gil option that will filter our any samples that aren't holding the GIL.

ojrac commented 5 years ago

Thanks for the suggestion; recording with --gil does eliminate the loop.run() samples.

I phrased the flamegraph generator question poorly -- raw output is great, but because py-spy is so easy to set up, it makes me wish I could use py-spy to read the raw output instead of setting up flamegraph.pl. If I was more rust-literate, I'd probably be trying to build that myself.

I'm leaving this open in case you want to track the gevent/greenlet hitch, but --gil solves all our practical problems. Feel free to close, and thanks for the detailed reply.