benfred / py-spy

Sampling profiler for Python programs
MIT License

Re-fetch thread ID in native profiling for Python 3.11 #636

Closed (krfricke closed this 7 months ago)

krfricke commented 7 months ago

Python 3.11 exposes the native thread ID in the thread state.

However, it seems this can be out of sync/stale when a process is forked in a native extension. This led to errors such as:

Process 53014: ...
Python v3.11.5 (/root/.pyenv/versions/3.11.5/bin/python3.11)

Error: UNW_EBADREG: bad register number
Reason: UNW_EBADREG: bad register number

Upon investigation, the native thread ID (which in this case is just the PID) was still pointing to the parent process PID.

The easiest fix here is to just use the existing logic in py-spy to retrieve the thread ID from the OS. This leads to the desired result:

Process 53014: ...
Python v3.11.5 (...)

Thread 53014 (idle): "MainThread"
...

I'm not sure if this is a bug on the Python side. I can see why they wouldn't update/poll a new thread ID after a fork in a native extension: afaik there is no way for Python to tell it's been forked, and the ID is likely set on init. I haven't checked the CPython source for this though. I might investigate further, but since this fix resolves my problem, I won't dive too deep into it.

benfred commented 7 months ago

It seems like you're hitting a bug in CPython: https://github.com/python/cpython/issues/100649. Does the fix included there resolve this issue for you?

I'd rather not go back to the previous methods of detecting OS thread IDs. Being able to get the OS thread ID directly from the Python interpreter is definitely the way to go here (the previous method doesn't work on ARM, for instance).

krfricke commented 7 months ago

The fix is included in Python 3.11.2 (https://docs.python.org/release/3.11.2/whatsnew/changelog.html) but I still run into it in 3.11.5 (as per this issue).

I can try to come up with an easily reproducible example.

Is there a way to force the python interpreter to refresh its thread ID? I could potentially do that after a fork in native code.

benfred commented 7 months ago

Are you calling PyOS_AfterFork_Child in your C extension? I don't think the fix takes effect without that. See https://docs.python.org/3/c-api/init.html#cautions-about-fork

Also according to https://docs.python.org/3/library/os.html#os.register_at_fork :

Note that fork() calls made by third-party C code may not call those functions, unless it explicitly calls PyOS_BeforeFork(), PyOS_AfterFork_Parent() and PyOS_AfterFork_Child().

krfricke commented 7 months ago

Apologies for the delay, last week was a bit busy.

It looks like calling PyOS_AfterFork_Child solves this problem - thanks for the help, let's discard this PR then.
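
For anyone landing here with the same error, the behavior discussed above can be sketched roughly as follows. This is only an illustration, not code from py-spy or from the extension in question: it uses ctypes to stand in for a C extension that calls fork() directly (POSIX only), and the names `_mark_child` and `native_fork_runs_hooks` are made up for the example. It does not read the thread state that py-spy inspects. It only shows that the interpreter's after-fork handling is skipped unless the child calls PyOS_AfterFork_Child.

```python
import ctypes
import os

# Hypothetical flag: set by the at-fork hook so the child can report it ran.
_hook_ran = False


def _mark_child():
    global _hook_ran
    _hook_ran = True


# Runs in the child only when the interpreter is told about the fork.
os.register_at_fork(after_in_child=_mark_child)

# PyOS_AfterFork_Child returns void.
ctypes.pythonapi.PyOS_AfterFork_Child.restype = None


def native_fork_runs_hooks(notify_interpreter: bool) -> bool:
    """Fork through libc (bypassing os.fork) and report whether the
    child-side hook ran."""
    global _hook_ran
    _hook_ran = False
    # PyDLL keeps the GIL held during the call, as a C extension calling
    # fork() from Python code would. Assumes a POSIX system.
    libc = ctypes.PyDLL(None)
    read_fd, write_fd = os.pipe()
    pid = libc.fork()
    if pid == 0:
        # Child process: optionally tell the interpreter it has been forked.
        try:
            if notify_interpreter:
                ctypes.pythonapi.PyOS_AfterFork_Child()
            os.write(write_fd, b"1" if _hook_ran else b"0")
        finally:
            os._exit(0)
    # Parent process: collect the child's answer.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"
```

In a real C extension the equivalent would be calling PyOS_BeforeFork() before fork(), then PyOS_AfterFork_Parent() in the parent and PyOS_AfterFork_Child() in the child, as the documentation quoted above describes.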