benfred / py-spy

Sampling profiler for Python programs
MIT License
12.13k stars, 401 forks

Add support for UCS-2 strings #648

Closed: nariman closed this 4 months ago

nariman commented 5 months ago

This PR tries to fix an issue where py-spy fails to parse a stack trace that contains functions with Cyrillic names.

https://github.com/benfred/py-spy/blob/4a11c44849aa2146634ceef8ee8eb43bc1e969ad/src/python_data_access.rs#L26-L31

It seems that UCS-2 strings are used by some (maybe all?) pre-built Python binaries. I tested this in the mcr.microsoft.com/devcontainers/rust:1-1-bullseye Dev Container, in our company's CentOS environment, and in my local environment with Python 3.7 and 3.11 installed using pyenv.
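To illustrate why Cyrillic names end up as UCS-2, here is a minimal Python sketch. It assumes the PEP 393 compact string layout (CPython 3.3+), where each string is stored with 1, 2, or 4 bytes per character depending on its highest code point. The helper names `kind_for` and `decode_by_kind` are hypothetical and are not part of py-spy or CPython; the sketch only mimics what a tool reading raw string memory would have to do.

```python
import sys

# Per-character widths used by CPython's compact strings (PEP 393).
KIND_1BYTE, KIND_2BYTE, KIND_4BYTE = 1, 2, 4

# Strings live in memory in native byte order, so pick the matching codec.
_SUFFIX = "le" if sys.byteorder == "little" else "be"
_CODECS = {
    KIND_1BYTE: "latin-1",
    KIND_2BYTE: "utf-16-" + _SUFFIX,
    KIND_4BYTE: "utf-32-" + _SUFFIX,
}


def kind_for(text: str) -> int:
    """Hypothetical helper: width CPython would choose for `text`."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return KIND_1BYTE
    if highest < 0x10000:
        return KIND_2BYTE
    return KIND_4BYTE


def decode_by_kind(raw: bytes, kind: int) -> str:
    """Hypothetical helper: decode a raw character buffer by its kind."""
    return raw.decode(_CODECS[kind])


if __name__ == "__main__":
    for name in ("function1", "кириллица"):
        kind = kind_for(name)
        raw = name.encode(_CODECS[kind])  # stand-in for bytes read from memory
        print(name, kind, len(raw), decode_by_kind(raw, kind))
```

Under this assumption the width is chosen per string rather than per build, which would be consistent with seeing the same behaviour on every binary tested: any identifier with a code point above U+00FF but below U+10000, such as a Cyrillic one, needs the 2-byte path.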

If a function with a UCS-2 encoded name appears in the first recorded stack trace, py-spy simply fails.

Example:

```py
import time


def кириллица(seconds):
    time.sleep(seconds)


if __name__ == "__main__":
    кириллица(10)
```

Outputs (`profile.svg` will not be created in this case):

```sh
$ py-spy record -o profile.svg -- python3 tests/scripts/cyrillic.py
Error: Failed to find a python interpreter in the .data section
```

If py-spy does start the recorder successfully, the flamegraph is built without the stack traces that contain UCS-2 strings, so the final data and graph can look strange or even misleading.

Example:

```py
import time


def function1(seconds):
    time.sleep(seconds)


def кириллица(seconds):
    time.sleep(seconds)


if __name__ == "__main__":
    function1(10)
    кириллица(10)
```

Outputs (`profile.svg` will be created):

```sh
$ py-spy record -o profile.svg -- python3 tests/scripts/cyrillic.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.
py-spy> Stopped sampling because process exited
py-spy> Wrote flamegraph data to 'profile.svg'. Samples: 12 Errors: 1025
```