benfred / py-spy

Sampling profiler for Python programs
MIT License

Pure question: about py-spy sampling method #506

Open xwjahahahaha opened 2 years ago

xwjahahahaha commented 2 years ago

I am currently developing a CPU profiler that supports multiple languages, and for the Python part I plan to use py-spy. While integrating it, I found that py-spy's profile results depend strongly on the sampling frequency. For example, with the rate set to 100, the total call-stack count over a 10-second sampling period is close to 1000. Calling the underlying Rust interface directly (pyspy_snapshot) gives almost the same result.

My understanding is that py-spy works by reading the Python process's memory to get the stack, unlike other samplers that are built on the cpu-clock perf_event (BPF: bcc profiler, Java: async-profiler). So when a container runs several processes written in different languages, and I sample them all at the same time with these different samplers and merge the results into one flame graph, py-spy's results don't seem very accurate. With the other samplers, some call stacks may appear at most 10 times over the same period, while py-spy records around 1000 samples at a high frequency, so its stacks are unusually wide. I suspect this is because py-spy is memory-based rather than perf-event-based, so it samples regardless of whether the Python process is actually using the CPU.

This is just my personal understanding. Is it correct, and how should I deal with this problem?
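One way to make the counts from samplers running at different rates comparable is to convert every sample count into estimated seconds (count / rate) before merging. The sketch below assumes each profiler can emit collapsed-stack text ("frame1;frame2 COUNT" per line, as produced by py-spy's raw format or by flame graph collapse scripts). The function names are hypothetical, and this only aligns units: it does not remove idle samples if a sampler records them.

```python
from collections import Counter


def parse_collapsed(text: str) -> Counter:
    """Parse collapsed-stack lines of the form 'f1;f2;f3 COUNT'."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST space, since frame names may themselves contain spaces.
        stack, _, n = line.rpartition(" ")
        counts[stack] += int(n)
    return counts


def to_seconds(counts: Counter, rate_hz: float) -> dict:
    """Each sample stands for roughly 1 / rate_hz seconds of observed time."""
    return {stack: n / rate_hz for stack, n in counts.items()}


def merge(*profiles: dict) -> dict:
    """Sum per-stack seconds from several already-normalized profiles."""
    merged = {}
    for prof in profiles:
        for stack, secs in prof.items():
            merged[stack] = merged.get(stack, 0.0) + secs
    return merged


# Hypothetical inputs: py-spy at 100 Hz and a perf-based sampler at 99 Hz.
py_profile = to_seconds(parse_collapsed("main;work 1000\n"), 100)
native_profile = to_seconds(parse_collapsed("native_main;compute 10\n"), 99)
combined = merge(py_profile, native_profile)
```

With this normalization, 1000 py-spy samples at 100 Hz become 10 s of observed time, and the width of a frame in the merged flame graph no longer depends on which sampler's rate was higher. Whether those 10 s were really on-CPU is a separate question about how each sampler filters idle threads.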

Jongy commented 2 years ago

py-spy by default attempts to record only on-CPU threads (unless --idle is given). In our experience this detection doesn't work very well in non-blocking mode (see https://github.com/benfred/py-spy/issues/480). You can also try --gil, which uses GIL-based activity detection.
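As a rough illustration of the options mentioned above, the sketch below builds a py-spy record command line that leaves --idle off (so idle threads are excluded), optionally adds --gil, and writes collapsed stacks via --format raw. The helper name and defaults are assumptions for illustration; the flag names follow py-spy's record help, but check them against `py-spy record --help` for the installed version.

```python
import shlex
from typing import List


def build_pyspy_cmd(pid: int, rate: int = 100, duration: int = 10,
                    output: str = "profile.txt", idle: bool = False,
                    gil: bool = False, nonblocking: bool = False) -> List[str]:
    """Hypothetical helper: assemble argv for `py-spy record`."""
    cmd = ["py-spy", "record", "--pid", str(pid),
           "--rate", str(rate), "--duration", str(duration),
           "--format", "raw", "--output", output]
    if idle:
        cmd.append("--idle")         # also include stacks of idle threads
    if gil:
        cmd.append("--gil")          # only keep samples from the GIL-holding thread
    if nonblocking:
        cmd.append("--nonblocking")  # don't pause the target while sampling
    return cmd


cmd = build_pyspy_cmd(1234, gil=True)
print(shlex.join(cmd))
# To actually run it (needs py-spy installed and permission to attach):
# import subprocess; subprocess.run(cmd, check=True)
```

Keeping idle detection on while avoiding --nonblocking is the combination most likely to give counts comparable to a perf-based on-CPU sampler, given the issue linked above.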