Closed krfricke closed 7 months ago
It seems like you're hitting a CPython bug (https://github.com/python/cpython/issues/100649). Does the fix included there resolve this issue for you?
I'd rather not go back to the previous methods of detecting OS thread IDs. Being able to get the OS thread ID directly from the Python interpreter is definitely the way to go here (the previous method doesn't work on ARM, for instance).
The fix is included in Python 3.11.2 (https://docs.python.org/release/3.11.2/whatsnew/changelog.html) but I still run into it in 3.11.5 (as per this issue).
I can try to come up with an easily reproducible example.
Is there a way to force the Python interpreter to refresh its thread ID? I could potentially do that after a fork in native code.
Are you calling PyOS_AfterFork_Child in your C extension? I don't think the fix takes effect without that. https://docs.python.org/3/c-api/init.html#cautions-about-fork
Also according to https://docs.python.org/3/library/os.html#os.register_at_fork :
Note that fork() calls made by third-party C code may not call those functions, unless it explicitly calls PyOS_BeforeFork(), PyOS_AfterFork_Parent() and PyOS_AfterFork_Child().
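To illustrate the quoted note, here is a minimal sketch of the case that does work: a hook registered with os.register_at_fork runs in the child when the fork goes through os.fork(). The helper name hook_ran_in_child is hypothetical, and the sketch assumes a POSIX platform where os.fork() is available.

```python
import os


def hook_ran_in_child() -> bool:
    """Fork with os.fork() and report whether the after_in_child hook ran."""
    flag = {"ran": False}
    # Hooks cannot be unregistered; this one stays installed for the
    # lifetime of the process.
    os.register_at_fork(after_in_child=lambda: flag.update(ran=True))

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # CPython runs the registered hooks around this call
    if pid == 0:
        # Child: report the flag through the pipe, then exit without
        # running any interpreter cleanup.
        os.close(read_fd)
        os.write(write_fd, b"1" if flag["ran"] else b"0")
        os._exit(0)

    # Parent: read the child's report and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1"
```

A fork() issued directly by third-party C code bypasses this machinery, which is why the docs ask such code to call the PyOS_* functions itself.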
Apologies for the delay, last week was a bit busy.
It looks like calling PyOS_AfterFork_Child solves this problem. Thanks for the help; let's discard this PR then.
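For readers who want to reproduce the effect without writing a C extension, the following is a rough in-process simulation using ctypes. It is only a sketch under several assumptions: a POSIX system, a single-threaded process, and libc's fork() being reachable through ctypes.PyDLL(None). The helper native_fork and the _state dictionary are hypothetical names. It wraps the raw fork() in the same three calls the documentation lists (PyOS_BeforeFork, PyOS_AfterFork_Parent, PyOS_AfterFork_Child) and uses an os.register_at_fork hook as a visible sign that the interpreter was told about the fork.

```python
import ctypes
import os

# PyDLL keeps the GIL held during the call, roughly like a C extension
# calling fork() directly. Assumption: libc's fork is visible this way.
libc = ctypes.PyDLL(None)
libc.fork.restype = ctypes.c_int

api = ctypes.pythonapi
for _name in ("PyOS_BeforeFork", "PyOS_AfterFork_Parent", "PyOS_AfterFork_Child"):
    getattr(api, _name).restype = None
    getattr(api, _name).argtypes = []

_state = {"hook_ran": False}
os.register_at_fork(after_in_child=lambda: _state.update(hook_ran=True))


def native_fork(notify_python: bool) -> bool:
    """Fork via libc; return True if the child saw the Python hook run."""
    read_fd, write_fd = os.pipe()
    if notify_python:
        api.PyOS_BeforeFork()
    pid = libc.fork()  # raw fork(): CPython is not informed on its own
    if pid == 0:
        if notify_python:
            api.PyOS_AfterFork_Child()  # lets CPython reset per-process state
        os.write(write_fd, b"1" if _state["hook_ran"] else b"0")
        os._exit(0)
    if notify_python:
        api.PyOS_AfterFork_Parent()
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1"
```

Calling native_fork(False) models the situation from this PR, where the interpreter never learns about the fork, while native_fork(True) models the extension after adding the PyOS_* calls. In a real extension these calls would be made in C around the fork() call.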
Python 3.11 exposes the native thread ID in the thread state.
However, it seems this can be out of sync/stale when a process is forked in a native extension. This led to errors such as:
Upon investigation, the native thread ID (which in this case is just the PID) was still pointing to the parent process PID.
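As a rough way to cross-check a thread ID against what the OS reports, the sketch below lists the thread IDs of a process from /proc/<pid>/task. This is a Linux-only illustration and not py-spy's actual lookup logic; the helper name os_thread_ids is hypothetical.

```python
import os
import threading
from typing import List


def os_thread_ids(pid: int) -> List[int]:
    """Return the OS thread IDs of `pid`, or [] when procfs is unavailable."""
    task_dir = f"/proc/{pid}/task"
    if not os.path.isdir(task_dir):
        return []
    # Each entry in the task directory is named after one thread ID.
    return sorted(int(name) for name in os.listdir(task_dir))


if __name__ == "__main__":
    ids = os_thread_ids(os.getpid())
    # threading.get_native_id() asks the OS directly, so on Linux it should
    # appear in the list; for the main thread it matches the PID.
    print(os.getpid(), threading.get_native_id(), ids)
```

A stale ID copied from the parent would not appear in the child's task list, which is the kind of mismatch described above.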
The easiest fix here is to just use the existing logic in py-spy to retrieve the thread ID from the OS. This leads to the desired result:
I'm not sure whether this is a bug on the Python side. I can see why Python wouldn't update or poll for a new thread ID after a fork in a native extension: as far as I know, there is no way for Python to tell it has been forked, and the ID is likely set on init. I haven't checked the CPython source for this, though. I might investigate further, but since this fix resolves my problem, I won't dive too deep into it.