KeshaviPhone / plcrashreporter

Automatically exported from code.google.com/p/plcrashreporter
Other
0 stars 0 forks source link

[PATCH] Add support for thread names #65

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Add support for saving thread names on the client and then formatting them as 
per Xcode 4.6 and iOS 6.*

Original issue reported on code.google.com by ve6yeq on 3 May 2013 at 5:11

Attachments:

GoogleCodeExporter commented 9 years ago
Unfortunately, pthread_getname_np() isn't async-safe; it acquires the global 
_pthread_list_lock within libSystem.

Additionally -- and this is much less of a concern than async-safety -- the 
pthread API can only work in-process, whereas the Mac OS X reporter is moving 
to an out-of-process model. Ideally I'd like to avoid functionality that relies 
on in-process execution.

There are a few options here, none exactly pretty.

For dispatch queue names, there is a supported API for fetching the dispatch 
queue address for a given thread via thread_info(THREAD_IDENTIFIER_INFO, ...). 
From there, one needs to derive the offset to the actual queue label within the 
structure. The "supported" way to do that (or at least, the way Apple's 
debuggers do it) is by searching for the private 'dispatch_queue_offsets' 
symbol in libSystem and libdispatch. This references a versioned 
dispatch_queue_offsets structure that's intended to remain binary compatibility 
across releases, but is still private API, and may not.  The 
dispatch_queue_offsets structure provides the offset and size of the label 
field within the queue structure.

This is pretty far into unsupported territory. It's possible to do the above 
safely through the use of the safe memory dereference/mapping APIs, if 'safely' 
is defined as not crashing. It's still possible we could fetch bogus data, or 
trigger private API rejections from Apple (especially in referencing the 
'dispatch_queue_offsets' symbol.

For thread names, it's a bit less tricky. There's a private __proc_info syscall 
which *is* async-safe, but is not documented to be, and is not public. The 
libproc proc_pidinfo() API is implemented directly on top of that syscall. The 
libproc API is public, and is also async-safe, but it's not actually 
*documented* as being async-safe, and there's nothing inherent in its design 
that would require it to be. This might be fodder for a radar. The supported 
way to fetch the thread name from the kernel is via the PROC_PIDTHREADINFO 
proc_pidinfo variant, passing in the thread handle as returned by 
thread_info(THREAD_IDENTIFIER_INFO, ...).

The async-safety/private api/etc 'gotchas' here are the only reason I've 
hesitated to support thread/queue names, though I'm not necessarily opposed. 
The only non-starter would be async-safety issues, as that can result in a 
straight deadlock, rather than just having partially correct data in the report 
(eg, in the case fields/APIs change). Perhaps we need to provide a 'safe' mode 
and a 'featureful' mode to allow integrators to decide what risks they're 
comfortable taking; some things might be worth doing for development builds, 
but you may not want to ship those to the app store.

Original comment by landon.j.fuller@gmail.com on 3 May 2013 at 3:31

GoogleCodeExporter commented 9 years ago
Thanks for the feedback on the use of pthread_getname_np(), let me see what I 
can do with proc_pidinfo(..) and friends and come up with an alternative 
implementation for the thread name part. It sounds like unless there is a way 
of doing a "try lock" or similar, dispatch queue names are out of reach at 
present.

Original comment by ve6yeq on 4 May 2013 at 4:49

GoogleCodeExporter commented 9 years ago
Hello. Just curious, when we have dispatch_queue address why cannot we just use 
dispatch_queue_get_label(dispatch_queue) function to get queue name?

Original comment by dude.fro...@gmail.com on 15 Jan 2014 at 5:30

GoogleCodeExporter commented 9 years ago
dispatch_queue_get_label() provides no async-safety guarantees. It happens to 
be the case that it is async-safe, but there's no requirement that it remain 
that way.

Supporting thread / queue names is still on the roadmap, but have remained a 
low priority relative to ARM64 support, eh_frame/compact_unwind, etc. They'd be 
nice to have for completeness, but given that their utility is *comparatively* 
low and any implementation that fetched them would have to rely on private 
API/data structures, or on non-async-safe public API, their addition hasn't 
been prioritized.

We have a lot more async-safe tooling to work with these days, so it may be the 
case that we revisit this question later, possibly in light of plans to break 
the crash reporting process into atomic chunks that allow us to place more 
risky operations later in the process.

Original comment by landon.j.fuller@gmail.com on 15 Jan 2014 at 5:44

GoogleCodeExporter commented 9 years ago
Thanks for answer, one more question.
As I can see from thread_info sources - it takes at least 2 internal locks - 
one thread_mtx_lock(thread);
directly in thread_info and other one in thread_info_internal
thread_lock(thread);
So, is it safe to use this function in a crash-reporter? What going to happen 
if app crashed while holding a lock for this thread?

Original comment by dude.fro...@gmail.com on 17 Jan 2014 at 6:38

GoogleCodeExporter commented 9 years ago
In the case of thread_info(), the locks are taken kernel-side; if a thread is 
executing within a interrupt/trap (eg, such as a syscall), the kernel registers 
a special AST (asynchronous system trap) return handler that suspends the 
thread prior to returning from the kernel. If you'd like to poke at the 
implementation, take a look at 
http://fxr.watson.org/fxr/source/osfmk/kern/ast.c?v=xnu-2050.18.24#L172 and the 
thread_sleep() code path.

The end result is that kernel locks are not an issue -- short of kernel bugs, 
which do exist, especially in the areas of mach exception handling, where it 
can be surprisingly easy to deadlock the kernel.

Most of the Mach APIs /must/ be userspace async-safe for them to meet their 
defined invariants; APIs such as thread_suspend/thread_resume must operate even 
in the face of existing suspended threads, and once you've introduced the idea 
of thread suspension as a supported operation, all other APIs at that level 
implicitly must be able to operate in an environment where a thread has been 
suspended, unless documented otherwise.

There does exist the possibility of regressions in this space, or a less 
rigorous (some might say less preferential) interpretation by Apple kernel 
team: While it would *always* be a kernel bug to be able to deadlock the kernel 
from userspace, there exists the possibility that userspace wrappers on the 
syscalls could themselves include local userspace locking. I'd argue that this 
is outside the bounds of the invariants of the Mach APIs themselves, and it 
could significantly complicate life for debuggers, stop-the-world garbage 
collectors, etc. Given the implicit invariants and the decades of consistency, 
I don't expect to see this change, outside of an intentional effort by Apple to 
deprecate external use of the Mach APIs, as they've occasionally threatened to 
do (without actually providing suitable replacements) since the release of Mac 
OS X 10.0.

Lastly, we've migrated bug tracking over to https://opensource.plausible.coop; 
this bug can be found at https://opensource.plausible.coop/bugs/browse/PLCR-462.

Unfortunately, since Google eliminated their Google Code APIs, we were unable 
to automatically update all these bugs with references, and there's no way to 
issue a redirect for the direct bug URLs.

Original comment by landon.j.fuller@gmail.com on 17 Jan 2014 at 4:01

GoogleCodeExporter commented 9 years ago
Is this query related to your issue here 
https://github.com/kstenerud/KSCrash/pull/51?  While I'd love to help with a 
patch that implements this feature correctly, we don't have the resources to 
provide technical advice on implementation issues for an unrelated project, and 
this isn't really the forum for general technical queries.

Original comment by landon.j.fuller@gmail.com on 17 Jan 2014 at 4:17