Open SimonSapin opened 3 years ago
Not right now =( We merge the native stack traces into Python frames - but not vice versa. You'll have to use other native profiling tools like perf etc. to profile the native thread.
That’s unfortunate. Can you say more about this merging? Does it need to happen?
Indeed, it would be very helpful to have py-spy handle native threads in its reporting, to understand the performance of CPU-intensive Python programs that use data science libraries like numpy, which rely on multi-threaded native linear algebra libraries such as OpenBLAS, MKL and co.
Same for machine learning libraries like scikit-learn, lightgbm and xgboost that use OpenMP threads in the CPU-intensive sections of their code written in Cython or C++.
At the moment, profiling with py-spy record --native --threads --format speedscope and loading the results into the speedscope visualizer makes no sense to me...
We're using libunwind-ptrace in PyPerf and we just place native frames on top of the Python frames (stopping at the first native frame that is the PyEval_EvalFrame* which belongs to the topmost Python function). For a truly native thread with no Python frames, we will just have its native stack.
IIRC py-spy uses libunwind-ptrace as well? So this rather simple scheme could work.
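To make the scheme above concrete, here is a minimal sketch of the merge step in Python. It is not PyPerf's or py-spy's actual code: the function name `merge_stacks`, the leaf-first list representation, and the `EVAL_PREFIXES` tuple are all assumptions made for illustration.

```python
from typing import List

# Assumption: symbols with these prefixes mark the interpreter's eval loop.
EVAL_PREFIXES = ("PyEval_EvalFrame", "_PyEval_EvalFrame")


def merge_stacks(native: List[str], python: List[str]) -> List[str]:
    """Merge one thread's native and Python stacks.

    Both lists are ordered leaf-first (innermost frame at index 0).
    """
    # A truly native thread has no Python frames: keep its native stack as-is.
    if not python:
        return list(native)

    # Collect native frames from the leaf until the first eval-loop frame,
    # which belongs to the topmost Python function.
    top: List[str] = []
    for symbol in native:
        if symbol.startswith(EVAL_PREFIXES):
            break
        top.append(symbol)

    # Native frames go on top of the Python frames. If no eval-loop frame
    # was found, every native frame ends up on top.
    return top + list(python)


if __name__ == "__main__":
    mixed = merge_stacks(
        ["dgemm_kernel", "cblas_dgemm", "_PyEval_EvalFrameDefault", "main"],
        ["matmul (app.py:10)", "<module> (app.py:20)"],
    )
    print(mixed)
    print(merge_stacks(["worker_loop", "start_thread"], []))
```

The frame names in the usage part are made up. The point is only that a thread without Python frames falls through the first branch unchanged, so it would still show up in the report.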
Not right now =( We merge the native stack traces into Python frames - but not vice versa. You'll have to use other native profiling tools like perf etc. to profile the native thread.
@benfred It would be great to have native threads in py-spy: in my case, some of those native threads are managed by OpenMP via Cython prange loops: in this case they can call Cython functions and py-spy Cython support would be very handy.
Furthermore, if speedscope ever supports multitrack views with time-aligned traces, it would be very helpful to understand when those native threads come into play and interact with the calling Python code.
Would @Jongy's suggested solution above work?
Does py-spy record ignore threads that don’t contain any Python stack frame by default?
I have a Python program with a native extension (that happens to be written in Rust). That extension starts a thread (with Rust’s std::thread::spawn) to do some CPU-intensive work in parallel with other work. The child thread never runs a Python interpreter. The SVG output of the profiler is missing everything in the second thread.
--native does show Rust stack frames, but only in the parent thread. Adding --threads adds the ID of the parent thread to the output but nothing else. Adding --idle doesn’t seem to change anything for this program.
When using py-spy dump --pid (at the right time) however, the stack of both threads is printed correctly.
Can I use py-spy to profile both threads?
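Since py-spy dump does print both threads, one crude stopgap is to call it repeatedly and keep the snapshots. The sketch below is only that: the helper names `build_dump_command` and `sample_dumps` are hypothetical, it assumes py-spy is on the PATH, and it stores the raw text of each dump rather than parsing it. Spawning a process per sample is slow, so this is no substitute for py-spy record.

```python
import subprocess
import time
from typing import List


def build_dump_command(pid: int, native: bool = True) -> List[str]:
    """Build the argument list for one `py-spy dump` call (hypothetical helper)."""
    cmd = ["py-spy", "dump", "--pid", str(pid)]
    if native:
        # Also ask for native (C/C++/Rust/Cython) frames.
        cmd.append("--native")
    return cmd


def sample_dumps(pid: int, count: int = 10, interval: float = 0.1) -> List[str]:
    """Take `count` snapshots of every thread's stack, `interval` seconds apart."""
    snapshots: List[str] = []
    for _ in range(count):
        result = subprocess.run(
            build_dump_command(pid),
            capture_output=True,
            text=True,
            check=False,  # keep sampling even if one dump fails
        )
        snapshots.append(result.stdout)
        time.sleep(interval)
    return snapshots
```

Calling `sample_dumps(pid)` with the target process ID returns a list of raw dump texts that can be inspected by hand or grepped for the native thread's frames.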