Open GoogleCodeExporter opened 9 years ago
Unfortunately, pthread_getname_np() isn't async-safe; it acquires the global
_pthread_list_lock within libSystem.
Additionally -- and this is much less of a concern than async-safety -- the
pthread API can only work in-process, whereas the Mac OS X reporter is moving
to an out-of-process model. Ideally I'd like to avoid functionality that relies
on in-process execution.
There are a few options here, none exactly pretty.
For dispatch queue names, there is a supported API for fetching the dispatch
queue address for a given thread via thread_info(THREAD_IDENTIFIER_INFO, ...).
From there, one needs to derive the offset to the actual queue label within the
structure. The "supported" way to do that (or at least, the way Apple's
debuggers do it) is by searching for the private 'dispatch_queue_offsets'
symbol in libSystem and libdispatch. This references a versioned
dispatch_queue_offsets structure that's intended to remain binary compatibility
across releases, but is still private API, and may not. The
dispatch_queue_offsets structure provides the offset and size of the label
field within the queue structure.
This is pretty far into unsupported territory. It's possible to do the above
safely through the use of the safe memory dereference/mapping APIs, if 'safely'
is defined as not crashing. It's still possible we could fetch bogus data, or
trigger private API rejections from Apple (especially in referencing the
'dispatch_queue_offsets' symbol.
For thread names, it's a bit less tricky. There's a private __proc_info syscall
which *is* async-safe, but is not documented to be, and is not public. The
libproc proc_pidinfo() API is implemented directly on top of that syscall. The
libproc API is public, and is also async-safe, but it's not actually
*documented* as being async-safe, and there's nothing inherent in its design
that would require it to be. This might be fodder for a radar. The supported
way to fetch the thread name from the kernel is via the PROC_PIDTHREADINFO
proc_pidinfo variant, passing in the thread handle as returned by
thread_info(THREAD_IDENTIFIER_INFO, ...).
The async-safety/private api/etc 'gotchas' here are the only reason I've
hesitated to support thread/queue names, though I'm not necessarily opposed.
The only non-starter would be async-safety issues, as that can result in a
straight deadlock, rather than just having partially correct data in the report
(eg, in the case fields/APIs change). Perhaps we need to provide a 'safe' mode
and a 'featureful' mode to allow integrators to decide what risks they're
comfortable taking; some things might be worth doing for development builds,
but you may not want to ship those to the app store.
Original comment by landon.j.fuller@gmail.com
on 3 May 2013 at 3:31
Thanks for the feedback on the use of pthread_getname_np(), let me see what I
can do with proc_pidinfo(..) and friends and come up with an alternative
implementation for the thread name part. It sounds like unless there is a way
of doing a "try lock" or similar, dispatch queue names are out of reach at
present.
Original comment by ve6yeq
on 4 May 2013 at 4:49
Hello. Just curious, when we have dispatch_queue address why cannot we just use
dispatch_queue_get_label(dispatch_queue) function to get queue name?
Original comment by dude.fro...@gmail.com
on 15 Jan 2014 at 5:30
dispatch_queue_get_label() provides no async-safety guarantees. It happens to
be the case that it is async-safe, but there's no requirement that it remain
that way.
Supporting thread / queue names is still on the roadmap, but have remained a
low priority relative to ARM64 support, eh_frame/compact_unwind, etc. They'd be
nice to have for completeness, but given that their utility is *comparatively*
low and any implementation that fetched them would have to rely on private
API/data structures, or on non-async-safe public API, their addition hasn't
been prioritized.
We have a lot more async-safe tooling to work with these days, so it may be the
case that we revisit this question later, possibly in light of plans to break
the crash reporting process into atomic chunks that allow us to place more
risky operations later in the process.
Original comment by landon.j.fuller@gmail.com
on 15 Jan 2014 at 5:44
Thanks for answer, one more question.
As I can see from thread_info sources - it takes at least 2 internal locks -
one thread_mtx_lock(thread);
directly in thread_info and other one in thread_info_internal
thread_lock(thread);
So, is it safe to use this function in a crash-reporter? What going to happen
if app crashed while holding a lock for this thread?
Original comment by dude.fro...@gmail.com
on 17 Jan 2014 at 6:38
In the case of thread_info(), the locks are taken kernel-side; if a thread is
executing within a interrupt/trap (eg, such as a syscall), the kernel registers
a special AST (asynchronous system trap) return handler that suspends the
thread prior to returning from the kernel. If you'd like to poke at the
implementation, take a look at
http://fxr.watson.org/fxr/source/osfmk/kern/ast.c?v=xnu-2050.18.24#L172 and the
thread_sleep() code path.
The end result is that kernel locks are not an issue -- short of kernel bugs,
which do exist, especially in the areas of mach exception handling, where it
can be surprisingly easy to deadlock the kernel.
Most of the Mach APIs /must/ be userspace async-safe for them to meet their
defined invariants; APIs such as thread_suspend/thread_resume must operate even
in the face of existing suspended threads, and once you've introduced the idea
of thread suspension as a supported operation, all other APIs at that level
implicitly must be able to operate in an environment where a thread has been
suspended, unless documented otherwise.
There does exist the possibility of regressions in this space, or a less
rigorous (some might say less preferential) interpretation by Apple kernel
team: While it would *always* be a kernel bug to be able to deadlock the kernel
from userspace, there exists the possibility that userspace wrappers on the
syscalls could themselves include local userspace locking. I'd argue that this
is outside the bounds of the invariants of the Mach APIs themselves, and it
could significantly complicate life for debuggers, stop-the-world garbage
collectors, etc. Given the implicit invariants and the decades of consistency,
I don't expect to see this change, outside of an intentional effort by Apple to
deprecate external use of the Mach APIs, as they've occasionally threatened to
do (without actually providing suitable replacements) since the release of Mac
OS X 10.0.
Lastly, we've migrated bug tracking over to https://opensource.plausible.coop;
this bug can be found at https://opensource.plausible.coop/bugs/browse/PLCR-462.
Unfortunately, since Google eliminated their Google Code APIs, we were unable
to automatically update all these bugs with references, and there's no way to
issue a redirect for the direct bug URLs.
Original comment by landon.j.fuller@gmail.com
on 17 Jan 2014 at 4:01
Is this query related to your issue here
https://github.com/kstenerud/KSCrash/pull/51? While I'd love to help with a
patch that implements this feature correctly, we don't have the resources to
provide technical advice on implementation issues for an unrelated project, and
this isn't really the forum for general technical queries.
Original comment by landon.j.fuller@gmail.com
on 17 Jan 2014 at 4:17
Original issue reported on code.google.com by
ve6yeq
on 3 May 2013 at 5:11Attachments: