benfred / py-spy

Sampling profiler for Python programs
MIT License

Support native extensions on ARM #327

Open huonw opened 3 years ago

huonw commented 3 years ago

I'm running py-spy on ARM (a Raspberry Pi 4B, specifically) quite a bit at the moment, and it works really well, especially when loading the output into https://speedscope.app. Thanks for making such a great tool.

It'd be even better if profiling on ARM supported native extensions. I'm sure this is difficult, relying on heavily platform-specific APIs. Some prior art that you may be aware of, and may be relevant is:

benfred commented 3 years ago

I think this is a great idea! Not only will this let us profile native extensions, this will also let us figure out reliably if the thread is idle or not on ARM.

The challenges I see are:

1) We need to get the native stack trace. This should be possible using libunwind with relatively few changes to the x86_64 code we already have in place.
2) We need to match the pthread id we have from the Python stack trace with the OS thread id we will have from the native stack trace. With x86_64 this requires a bit of a hack to peek at the RBX register on the top-level frame of the stack: https://github.com/benfred/py-spy/blob/fedd53bd799efdac33d5a41ad56b1bb8047684b4/src/python_spy.rs#L412-L419
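To make the second challenge concrete, here is a minimal Python sketch (not py-spy's code) that prints the two identifiers side by side from inside a process. It assumes Python 3.8+ for `threading.get_native_id()`. On Linux CPython, `threading.get_ident()` is derived from `pthread_self()`, while `get_native_id()` is the kernel thread id that OS-level tools report. The helper name `collect_thread_ids` is hypothetical.

```python
import threading


def collect_thread_ids(n_threads=3):
    """Return (name, ident, native_id) for n_threads simultaneously live threads.

    ident     -> threading.get_ident(), the interpreter-level thread id
                 (pthread-based on Linux CPython)
    native_id -> threading.get_native_id(), the OS/kernel thread id
    """
    results = []
    lock = threading.Lock()
    # The barrier keeps every worker alive until all have recorded their ids,
    # so no identifier can be recycled by a thread that already exited.
    barrier = threading.Barrier(n_threads)

    def worker():
        entry = (
            threading.current_thread().name,
            threading.get_ident(),
            threading.get_native_id(),
        )
        with lock:
            results.append(entry)
        barrier.wait()

    threads = [
        threading.Thread(target=worker, name=f"worker-{i}")
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    for name, ident, native_id in collect_thread_ids():
        print(f"{name}: ident={ident} native_id={native_id}")
```

A profiler attached from outside the process has no such API to call. It sees the pthread-based id in the interpreter's thread state and the OS thread id from the native unwinder, and has to work out the mapping itself, which is what the register hack linked above does on x86_64.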

One issue for me is that I don't have any ARM hardware to test on =(. It might be possible to emulate this by doing something like https://florianmuller.com/raspberry-development-environment-on-macosx-with-qemu though.

One other issue with properly supporting ARM is distribution. I've created https://github.com/benfred/py-spy/issues/328 to track getting binary wheels for ARM uploaded to PyPI, since I believe this is now supported.

huonw commented 3 years ago

> We need to get the native stack trace. This should be possible using libunwind with relatively little changes to the x86_64 code we already have in place

Ah, yeah, I see from https://github.com/libunwind/libunwind that it should have pretty reasonable ARM support.

On a somewhat related note, have you seen https://github.com/sfackler/rstack ? In addition to the rstack crate itself, which looks similar to remoteprocess, it includes the unwind crate as general-purpose bindings to libunwind. Do you have any thoughts on whether it's worth sharing more effort/code there, rather than keeping some bindings inline in remoteprocess?

> We need to match the pthread id we have from the python stack trace with the os thread id we will have from the native stack trace. With x86_64 this requires a bit of a hack to peek at the RBX register on the top level frame of the stack:

Do you have any hints for how you found this? Experimentation? Reading source code?

huonw commented 3 years ago

(I've opened https://github.com/benfred/remoteprocess/pull/5 as a work-in-progress not-yet-working PR towards ARM unwinding support.)

huonw commented 3 years ago

I've now got benfred/remoteprocess#5 working, and used it to implement py-spy --native on ARM in https://github.com/benfred/py-spy/pull/330.
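For anyone wanting to try this out, a small workload that mixes interpreter-level and C-level work makes the difference visible. The following is only an illustrative sketch: the file name `workload.py`, the function names, and the loop sizes are all made up for the example, and whether `--native` works on ARM depends on running a py-spy build that includes the changes above.

```python
import zlib


def pure_python_work(n):
    """Sum of squares in a plain Python loop (time is spent in the interpreter)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def native_work(payload, rounds):
    """Repeatedly compress payload; most time is spent inside the C zlib code."""
    size = 0
    for _ in range(rounds):
        size += len(zlib.compress(payload, 9))
    return size


def main(iterations=200):
    payload = bytes(range(256)) * 4096  # 1 MiB of repetitive data
    for _ in range(iterations):
        pure_python_work(200_000)
        native_work(payload, 5)


if __name__ == "__main__":
    # Assumed invocation, with this file saved as the hypothetical workload.py:
    #   py-spy record --native --format speedscope -o profile.json -- python workload.py
    # The resulting profile.json can then be loaded into https://speedscope.app
    main()
```

Without `--native`, the time under `native_work` should be attributed to the Python line calling `zlib.compress`; with it, native frames from the extension are expected to appear beneath that call.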