benfred / py-spy

Sampling profiler for Python programs
MIT License
12.53k stars 414 forks source link

py-spy greenlet stacks on Python 3.9 #499

Open SyamGadde opened 2 years ago

SyamGadde commented 2 years ago

Troublemaker back again! py-spy seems to be the only Python statistical profiler that supports profiling Cython extensions and for that I am very grateful. I am again attempting to profile Cython extensions using the --native option and generally doing fine with the latest versions of everything on Python 3.6:

Cython             0.29.30
gevent             21.12.0
greenlet           1.1.2
py-spy             0.3.12

However, when I try the same on Python 3.9, a really annoying thing happens in greenlets. A side-by-side diff is shown below between two stacks as printed by gdb:

...entries deleted...
0x00007f6f3da8902b in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1   0x00007f6f3da8902b in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1
0x00007f6f3d9f408e in PyVectorcall_Call () from /lib64/libpython3.9.so.1.0      0x00007f6f3d9f408e in PyVectorcall_Call () from /lib64/libpython3.9.so.1.0
0x00007f6f2028f2fd in __Pyx_PyObject_Call (kw=,                                 0x00007f6f2028f2fd in __Pyx_PyObject_Call (kw=, 
    arg=(<gevent._gevent_cqueue.Queue at remote 0x7f6f1c8a2dc0>, <gevent._gev       arg=(<gevent._gevent_cqueue.Queue at remote 0x7f6f1c8a2dc0>, <gevent._gev
__pyx_pf_6gevent_17_gevent_cgreenlet_8Greenlet_42run (__pyx_v_self=0x7f6f1d1d   __pyx_pf_6gevent_17_gevent_cgreenlet_8Greenlet_42run (__pyx_v_self=0x7f6f1d1d
    at src/gevent/greenlet.c:14437                                                  at src/gevent/greenlet.c:14437
__pyx_pw_6gevent_17_gevent_cgreenlet_8Greenlet_43run (                          __pyx_pw_6gevent_17_gevent_cgreenlet_8Greenlet_43run (
    __pyx_v_self=<gevent._gevent_cgreenlet.Greenlet at remote 0x7f6f1d1df370>       __pyx_v_self=<gevent._gevent_cgreenlet.Greenlet at remote 0x7f6f1d1df370>
    unused=<optimized out>) at src/gevent/greenlet.c:14338                          unused=<optimized out>) at src/gevent/greenlet.c:14338
0x00007f6f3da89902 in method_vectorcall () from /lib64/libpython3.9.so.1.0      0x00007f6f3da89902 in method_vectorcall () from /lib64/libpython3.9.so.1.0
0x00007f6f3da32b3b in PyObject_Call () from /lib64/libpython3.9.so.1.0          0x00007f6f3da32b3b in PyObject_Call () from /lib64/libpython3.9.so.1.0
0x00007f6f3e3fcbc4 in g_initialstub (mark=<optimized out>) at src/greenlet/gr   0x00007f6f3e3fcbc4 in g_initialstub (mark=<optimized out>) at src/greenlet/gr
0x00007f6f3e3fce6f in g_switch (kwargs=<optimized out>, args=<optimized out>,   0x00007f6f3e3fce6f in g_switch (kwargs=<optimized out>, args=<optimized out>,
    target=0x7f6f1d1df370) at src/greenlet/greenlet.c:692                           target=0x7f6f1d1df370) at src/greenlet/greenlet.c:692
g_switch (target=<optimized out>, args=(), kwargs=0x0) at src/greenlet/greenl   g_switch (target=<optimized out>, args=(), kwargs=0x0) at src/greenlet/greenl
0x00007f6f0f0bf900 in ?? ()                                                   | 0x00007f6f0f062900 in ?? ()
0x00007f6f3e51e040 in ?? ()                                                     0x00007f6f3e51e040 in ?? ()
0x00005578990e19d0 in ?? ()                                                     0x00005578990e19d0 in ?? ()
0x00007f6f3da89410 in ?? () from /lib64/libpython3.9.so.1.0                     0x00007f6f3da89410 in ?? () from /lib64/libpython3.9.so.1.0
0x0000000000000000 in ?? ()                                                     0x0000000000000000 in ?? ()

As you can see above, most of the entries are the same and would be merged together in a flamegraph output from py-spy... EXCEPT for the fact that the stack frame just below the g_switch changes addresses every so often, which results in essentially N independent flame graphs rather than one hierarchical flame graph. Does anyone have guidance for me on why this might happen (only on Python 3.9) and/or how to get py-spy to work with this?

I guess one hacky option would be to have it somehow ignore the line below the last g_switch, or stop at the first frame that has a ?? for the function name, which I can explore, but any other solutions are welcome!

Jongy commented 2 years ago

I'm not a greenlet expert, but as I understand it, g_switch is the "root" of the native stack for greenlets. I don't know how gdb decides "here's where the stack ends and I should stop unwinding". I assume that the way greenlet sets up the stack is different from how standard threads look like, and therefore, gdb doesn't know when to stop. So what you're seeing below g_switch is basically garbage. Not 100% confident in this analysis though.

I think that the best solution here is your first suggestion - cut everything after g_switch.