benfred / py-spy

Sampling profiler for Python programs
MIT License

Fail to find interpreter with Standalone Python and shapely.geos #709

Open jack-zhang-ai opened 1 month ago

jack-zhang-ai commented 1 month ago

We are hitting the following issue when trying to run py-spy against a process that uses a Python distribution from python-build-standalone and imports shapely.geos from shapely==1.8.5.

$ sudo env "PATH=$PATH" RUST_LOG=info py-spy dump --pid 148674
[2024-10-18T23:25:09.297598685Z INFO  py_spy::config] Command line args: ArgMatches { args: {}, subcommand: Some(SubCommand { id: [hash: B8461C91A07ADDC8], name: "dump", matches: ArgMatches { args: {[hash: CD5160AB4406C427]: MatchedArg { occurs: 1, source: Some(CommandLine), indices: [2], type_id: Some(TypeId { t: 15469221632486072992 }), vals: [[AnyValue { inner: TypeId { t: 15469221632486072992 } }]], raw_vals: [["148674"]], ignore_case: false }}, subcommand: None } }) }
[2024-10-18T23:25:09.297957181Z INFO  py_spy::python_spy] Got virtual memory maps from pid 148674:
[2024-10-18T23:25:09.298431475Z INFO  py_spy::python_spy] Found libpython binary @ /python_standalone_testing/python/lib/libpython3.9.so.1.0
[2024-10-18T23:25:09.311940673Z INFO  py_spy::python_spy] got symbol Py_GetVersion.version (0x00007f23d9445860) from libpython binary
[2024-10-18T23:25:09.311949009Z INFO  py_spy::python_spy] Getting version from symbol address
[2024-10-18T23:25:09.312385612Z INFO  py_spy::python_spy] Getting version from python binary BSS
[2024-10-18T23:25:09.312401592Z INFO  py_spy::python_spy] Failed to get version from BSS section: failed to find version string
[2024-10-18T23:25:09.312406531Z INFO  py_spy::python_spy] Getting version from libpython BSS
[2024-10-18T23:25:09.312561654Z INFO  py_spy::python_spy] Failed to get version from libpython BSS section: failed to find version string
[2024-10-18T23:25:09.312578105Z INFO  py_spy::python_spy] Trying to get version from path: /python_standalone_testing/python/bin/python3.9
[2024-10-18T23:25:09.312583014Z INFO  py_spy::python_spy] python version 3.9.0 detected
[2024-10-18T23:25:09.312587993Z INFO  py_spy::python_spy] got symbol _PyRuntime (0x00007f23d944c358) from libpython binary
[2024-10-18T23:25:09.312631836Z WARN  py_spy::python_spy] Interpreter address from _PyRuntime symbol is invalid 000000240000080e
[2024-10-18T23:25:09.312636394Z INFO  py_spy::python_spy] Failed to get interp_head from symbols, scanning BSS section from main binary
[2024-10-18T23:25:09.312645752Z INFO  py_spy::python_spy] Failed to get interpreter from binary BSS, scanning libpython BSS
Error: Failed to find a python interpreter in the .data section

A bit of digging makes me suspect that it's somehow related to shapely <2.0 dynamically linking to GEOS. Notably, shapely >2 does work (we're working on upgrading shapely to avoid this as well).

Other details:

import os
import time

import shapely.geos

print(os.getpid())
while True:
    time.sleep(1)


* Notably, running with python from apt does work (i.e. from `apt install python3` from `APT-Sources: http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Package`).
* [debug_dump.log](https://github.com/user-attachments/files/17441832/debug_dump.log) generated from `RUST_LOG=debug py-spy dump --pid 151614 &> debug_dump.log`

Any ideas if this is something we could easily fix in py-spy?

benfred commented 3 weeks ago

Wow, that's really interesting. I just tested this out, and can replicate this on my dev machine. It does seem like we can profile the python-build-standalone interpreter without an issue, but after running `import shapely.geos` we can't.

Taking a quick look at this, it seems like `import shapely.geos` adds a new entry to the /proc/PID/maps virtual memory maps, pointing to a new executable section of libpython.so. Before importing shapely.geos we have only:

[2024-11-01T21:33:43.828444634Z DEBUG py_spy::python_spy] map: 00007f83dc641000-00007f83dd31e000 r-x /home/ben/code/python-standalone/python/lib/libpython3.9.so.1.0

but after importing we have

[2024-11-01T21:34:09.723721219Z DEBUG py_spy::python_spy] map: 00007f83dc4c3000-00007f83dc4c4000 r-x /home/ben/code/python-standalone/python/lib/libpython3.9.so.1.0
[2024-11-01T21:34:09.723760971Z DEBUG py_spy::python_spy] map: 00007f83dc641000-00007f83dd31e000 r-x /home/ben/code/python-standalone/python/lib/libpython3.9.so.1.0
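
To check for the extra mapping without py-spy, the kernel's /proc/PID/maps file can be read directly. The snippet below is only a rough sketch: `executable_maps` and `read_maps` are hypothetical helper names, and it assumes the usual Linux maps layout (address range, permissions, offset, device, inode, path).

```python
from typing import List, Tuple


def executable_maps(maps_text: str, needle: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, path) for executable mappings whose path contains `needle`.

    `maps_text` is expected to hold the contents of /proc/PID/maps.
    """
    found = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, path (path may be absent).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        addr_range, perms, _offset, _dev, _inode, path = fields
        if "x" not in perms or needle not in path:
            continue
        start, end = (int(part, 16) for part in addr_range.split("-"))
        found.append((start, end, path.strip()))
    return found


def read_maps(pid: int) -> str:
    """Read the raw maps file for a process (Linux only, may need root)."""
    with open(f"/proc/{pid}/maps") as handle:
        return handle.read()
```

Calling `executable_maps(read_maps(pid), "libpython")` before and after the import should show whether a second executable entry has appeared for the same library.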

The issue seems to be that py-spy ends up trying to load data from the first, smaller section that appears after importing shapely.geos, when it should be using the second one.
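
To illustrate the difference between taking the first match and taking the right one, here is a small Python sketch that parses the `map:` debug lines above and picks the largest executable libpython mapping. This is not py-spy's actual code or a proposed fix. `Mapping`, `parse_debug_maps` and `pick_libpython` are hypothetical names, and "largest executable mapping wins" is just an assumed heuristic that happens to work for these two entries.

```python
import re
from typing import List, NamedTuple, Optional

# Matches the "map: START-END PERMS PATH" tail of py-spy's debug log lines.
MAP_LINE = re.compile(r"map: ([0-9a-f]+)-([0-9a-f]+) (\S+) (.+)$")


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str

    @property
    def size(self) -> int:
        return self.end - self.start


def parse_debug_maps(log_text: str) -> List[Mapping]:
    """Extract mappings from py-spy debug output (format taken from the logs above)."""
    mappings = []
    for line in log_text.splitlines():
        match = MAP_LINE.search(line)
        if match:
            start, end, perms, path = match.groups()
            mappings.append(Mapping(int(start, 16), int(end, 16), perms, path))
    return mappings


def pick_libpython(mappings: List[Mapping]) -> Optional[Mapping]:
    """Assumed heuristic: prefer the largest executable libpython mapping."""
    candidates = [m for m in mappings if "x" in m.perms and "libpython" in m.path]
    return max(candidates, key=lambda m: m.size, default=None)


LOG = """\
[2024-11-01T21:34:09.723721219Z DEBUG py_spy::python_spy] map: 00007f83dc4c3000-00007f83dc4c4000 r-x /home/ben/code/python-standalone/python/lib/libpython3.9.so.1.0
[2024-11-01T21:34:09.723760971Z DEBUG py_spy::python_spy] map: 00007f83dc641000-00007f83dd31e000 r-x /home/ben/code/python-standalone/python/lib/libpython3.9.so.1.0
"""

first = parse_debug_maps(LOG)[0]          # the small 0x1000-byte entry
chosen = pick_libpython(parse_debug_maps(LOG))  # the entry starting at 0x7f83dc641000
```

With the two entries from the log, a "first match" strategy lands on the 4 KiB mapping, while the size-based choice lands on the main text segment.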