Support stack traces of all threads

pmalhaire commented 2 years ago

Hello,

Your tool is the best we could find for getting the stack trace of one thread. Is there a way, even a partial one, to get the stack traces of all threads?

For us, this would be the feature that makes this project the best tool of all. I am willing to help with the Linux part. It's lightweight, easy to implement, and the code is readable. Thank you for this awesome project.

bkietz commented 1 year ago

@bombela I wrote https://gist.github.com/bkietz/9e72ff72d58c6f9d977845c39fd63a21 as an example of how one could accomplish this under pthreads. I'm not sure whether there's a more simultaneous, less hacky way to get all threads to stop; gdb seems to send signals to the non-segfaulted threads in a similar fashion. Is this an implementation you'd like a PR for? If so, do you have any guidance on how you'd like it structured in backward::?
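The following is not the gist's code; it is a minimal sketch of the same general idea, and every name in it is hypothetical. Participating threads record their pthread_t in a fixed table, the dumping thread sends each one SIGUSR1 (an assumption; any otherwise unused signal works) with pthread_kill, and each thread's handler records its own stack with glibc's backtrace(). Neither backtrace() nor pthread_self() is on the POSIX async-signal-safe list; the sketch calls backtrace() once outside the handler so glibc loads its unwinder early, which works in practice on glibc but is not guaranteed.

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>
#include <execinfo.h>
#include <pthread.h>
#include <time.h>

// Assumptions: at most kMaxThreads threads opt in, threads never unregister
// (so they must stay alive), and SIGUSR1 is unused elsewhere.
constexpr std::size_t kMaxThreads = 64;
constexpr int kMaxFrames = 64;
constexpr int kDumpSignal = SIGUSR1;

struct Slot {
  pthread_t tid{};
  std::atomic<bool> ready{false};  // tid is valid once this is true
  std::atomic<int> depth{-1};      // captured frame count; -1 = pending
  void* frames[kMaxFrames];
};

Slot g_slots[kMaxThreads];
std::atomic<std::size_t> g_count{0};

std::size_t slot_limit() {
  std::size_t n = g_count.load();
  return n < kMaxThreads ? n : kMaxThreads;
}

// Each participating thread calls this once, early in its life.
bool register_current_thread() {
  std::size_t i = g_count.fetch_add(1);
  if (i >= kMaxThreads) return false;
  g_slots[i].tid = pthread_self();
  g_slots[i].ready.store(true, std::memory_order_release);
  return true;
}

// Runs on the signaled thread and records that thread's own stack.
extern "C" void capture_handler(int) {
  pthread_t self = pthread_self();
  for (std::size_t i = 0; i < slot_limit(); ++i) {
    Slot& s = g_slots[i];
    if (s.ready.load(std::memory_order_acquire) && pthread_equal(s.tid, self)) {
      s.depth.store(backtrace(s.frames, kMaxFrames), std::memory_order_release);
      return;
    }
  }
}

void install_capture_handler() {
  void* warmup[1];
  backtrace(warmup, 1);  // let glibc load its unwinder outside signal context
  struct sigaction sa{};
  sa.sa_handler = capture_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction(kDumpSignal, &sa, nullptr);
}

// Signal every registered thread, then poll up to timeout_ms for the traces.
// Returns how many threads (including the caller, if registered) were captured.
int dump_all_threads(int timeout_ms) {
  const std::size_t n = slot_limit();
  const pthread_t self = pthread_self();
  bool expected[kMaxThreads] = {};
  for (std::size_t i = 0; i < n; ++i) {
    Slot& s = g_slots[i];
    if (!s.ready.load(std::memory_order_acquire)) continue;
    expected[i] = true;
    s.depth.store(-1);
    if (pthread_equal(s.tid, self))
      s.depth.store(backtrace(s.frames, kMaxFrames));
    else if (pthread_kill(s.tid, kDumpSignal) != 0)
      expected[i] = false;  // delivery failed; don't wait for this one
  }
  int captured = 0;
  for (int waited = 0; waited <= timeout_ms; ++waited) {
    captured = 0;
    int pending = 0;
    for (std::size_t i = 0; i < n; ++i) {
      if (!expected[i]) continue;
      if (g_slots[i].depth.load(std::memory_order_acquire) >= 0) ++captured;
      else ++pending;
    }
    if (pending == 0) break;
    timespec ts{0, 1000000};  // poll every 1 ms
    nanosleep(&ts, nullptr);
  }
  return captured;
}
```

Because nothing is ever unregistered, calling pthread_kill on a thread that has already exited would be undefined behavior; a real version needs the deregistration discussed later in the thread.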

bkietz commented 1 year ago

OTOH, having written that, I think the correct solution to the all-threads-trace problem is probably to let the process dump core and then read the stacks out of the core file. This has two advantages over in-process tracing:

1. When a signal handler exists, the non-signaled threads keep executing until they receive signals of their own. If a signal is known to be fatal, however, the OS can shut threads down more aggressively, so the traces from the threads that didn't segfault can be less out of date than is possible with inter-thread signals.
2. We'd probably read the core dump with gdb or another debugger, giving us access to the process's full memory, so we could print not just snippets of the source files but also the values of local variables.
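A minimal sketch of the core-dump route, assuming Linux and glibc: raise the RLIMIT_CORE soft limit, and have the crash handler restore the default action and re-raise the signal, so the kernel still writes a core after any in-process report. The function names are hypothetical, and whether and where a core file actually appears also depends on system settings such as /proc/sys/kernel/core_pattern (for example, systemd-coredump may collect it).

```cpp
#include <csignal>
#include <initializer_list>
#include <sys/resource.h>

// Raise the soft core-file size limit as far as the hard limit allows.
bool enable_core_dumps() {
  rlimit lim{};
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
  lim.rlim_cur = lim.rlim_max;
  return setrlimit(RLIMIT_CORE, &lim) == 0;
}

// Hypothetical crash handler: after any optional in-process reporting,
// restore the default action and re-raise so the kernel writes a core file.
extern "C" void report_then_dump_core(int sig) {
  // (an in-process trace of the faulting thread could be printed here)
  std::signal(sig, SIG_DFL);
  // With glibc's signal(), sig is blocked while this handler runs, so the
  // re-raised signal stays pending and is delivered (default action) on return.
  std::raise(sig);
}

void install_core_on_crash() {
  for (int sig : {SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT})
    std::signal(sig, report_then_dump_core);
}
```

The resulting core can then be inspected offline, for example with `gdb -batch -ex "thread apply all bt" ./prog core`, which prints a backtrace for every thread.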

So maybe this should be closed as out of scope for backward.

pmalhaire commented 1 year ago

@bkietz Sure, it's close to being an out-of-scope feature, but if it could be done in a clean manner, perhaps even in a complementary repo, it would be a killer feature.

bombela commented 1 year ago

I don't mind the addition of a "pthread.hpp" file or something like that. Over time it could even morph into a cross-platform "threads.hpp".

You would include this extra file in your project only if you want it.

bombela commented 1 year ago

As for the proposed implementation, @bkietz, it must be possible to enumerate the threads via the OS. After all, ps and top can do it!

bkietz commented 1 year ago

It's definitely possible; GDB does it by reading procfs. I intentionally avoided this because doing so adds more syscall delay between the first signal and signalling the other threads, which degrades the quality of the traces from other threads. A manual table of pthread_t is more work but lets you get straight to signaling. If the boilerplate is intolerable we could read procfs instead (unless you know of a faster way to enumerate threads?).
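For comparison, a minimal sketch of the procfs route on Linux: list /proc/self/task, then signal each thread with the raw tgkill syscall. The function names are hypothetical. opendir/readdir may allocate, so this enumeration belongs outside signal-handler context (or would need rewriting over the raw getdents64 syscall), and the scan is not atomic: threads can start or exit while it runs.

```cpp
#include <cstdlib>
#include <dirent.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Kernel thread IDs (TIDs) of this process, read from /proc/self/task.
// Not atomic: threads may start or exit while the directory is being read.
std::vector<pid_t> list_thread_ids() {
  std::vector<pid_t> tids;
  DIR* dir = opendir("/proc/self/task");
  if (dir == nullptr) return tids;
  while (dirent* entry = readdir(dir)) {
    if (entry->d_name[0] == '.') continue;  // skip "." and ".."
    tids.push_back(static_cast<pid_t>(std::atoi(entry->d_name)));
  }
  closedir(dir);
  return tids;
}

pid_t current_tid() { return static_cast<pid_t>(syscall(SYS_gettid)); }

// Send `sig` to every other thread via tgkill; returns how many sends succeeded.
int signal_other_threads(int sig) {
  const pid_t self = current_tid();
  int sent = 0;
  for (pid_t tid : list_thread_ids()) {
    if (tid == self) continue;
    if (syscall(SYS_tgkill, getpid(), tid, sig) == 0) ++sent;
  }
  return sent;
}
```

Each directory read is extra syscall time spent between the first and the last signal, which is exactly the trace-quality cost weighed against a manual table here.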

bombela commented 1 year ago

The manual table also limits you to the threads that you control directly. So threads created by a library won't be visible unless the library happens to also use the same registry (including the same ABI).

Short-lived threads can't simply be registered once and forgotten: after a thread terminates, its thread ID may be reused, as the pthread documentation describes.

So threads that terminate should be unregistered.

I don't think there is any way other than procfs to list them all, and that wouldn't be atomic and would be too slow, as you said.

So... a registry, with some way to register threads at startup and deregister them on termination, seems to be the way to go.
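A minimal sketch of such a registry, assuming pthreads (std::thread on Linux works too): each thread calls a hypothetical register_this_thread() at the top of its body, and a thread_local RAII guard removes the entry when the thread exits, before its ID can be reused. Threads created by libraries that never make this call stay invisible, which is the limitation pointed out above.

```cpp
#include <algorithm>
#include <mutex>
#include <pthread.h>
#include <vector>

// Hypothetical process-wide registry of live pthread_t handles.
class ThreadRegistry {
 public:
  static ThreadRegistry& instance() {
    static ThreadRegistry registry;
    return registry;
  }
  void add(pthread_t t) {
    std::lock_guard<std::mutex> lock(mu_);
    threads_.push_back(t);
  }
  void remove(pthread_t t) {
    std::lock_guard<std::mutex> lock(mu_);
    threads_.erase(std::remove_if(threads_.begin(), threads_.end(),
                                  [t](pthread_t x) { return pthread_equal(x, t) != 0; }),
                   threads_.end());
  }
  std::vector<pthread_t> snapshot() {
    std::lock_guard<std::mutex> lock(mu_);
    return threads_;
  }

 private:
  std::mutex mu_;
  std::vector<pthread_t> threads_;
};

// Registers on construction; the destructor runs at thread exit and deregisters.
struct ThreadRegistration {
  ThreadRegistration() : self_(pthread_self()) { ThreadRegistry::instance().add(self_); }
  ~ThreadRegistration() { ThreadRegistry::instance().remove(self_); }
  pthread_t self_;
};

// Call at the top of each thread body; repeated calls are harmless.
void register_this_thread() {
  thread_local ThreadRegistration registration;
  (void)registration;
}
```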

I guess it's always possible to override pthread_create and wrap the thread's start function in a pthread_cleanup_push/pthread_cleanup_pop pair that deregisters the thread on termination. See https://linux.die.net/man/3/pthread_cleanup_push
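A minimal sketch of the wrapping idea. Rather than interposing the real pthread_create symbol (possible with dlsym(RTLD_NEXT, "pthread_create"), but not shown here), it uses a hypothetical registered_pthread_create() that starts a trampoline. The trampoline registers the kernel thread ID (the form tgkill expects) and uses pthread_cleanup_push/pthread_cleanup_pop so the entry is removed whether the start function returns, calls pthread_exit, or is cancelled.

```cpp
#include <cerrno>
#include <mutex>
#include <new>
#include <pthread.h>
#include <set>
#include <sys/syscall.h>
#include <unistd.h>

std::mutex g_mu;
std::set<pid_t> g_tids;  // kernel TIDs of threads started via the wrapper

pid_t current_tid() { return static_cast<pid_t>(syscall(SYS_gettid)); }

void deregister_tid(void* arg) {
  std::lock_guard<std::mutex> lock(g_mu);
  g_tids.erase(*static_cast<pid_t*>(arg));
}

struct StartArgs {
  void* (*fn)(void*);
  void* arg;
};

void* registered_trampoline(void* p) {
  StartArgs args = *static_cast<StartArgs*>(p);
  delete static_cast<StartArgs*>(p);
  pid_t tid = current_tid();
  {
    std::lock_guard<std::mutex> lock(g_mu);
    g_tids.insert(tid);
  }
  void* result = nullptr;
  pthread_cleanup_push(deregister_tid, &tid);  // also runs on pthread_exit/cancel
  result = args.fn(args.arg);
  pthread_cleanup_pop(1);  // 1 = run the cleanup on normal return too
  return result;
}

// Hypothetical drop-in replacement for pthread_create.
int registered_pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                              void* (*fn)(void*), void* arg) {
  StartArgs* p = new (std::nothrow) StartArgs{fn, arg};
  if (p == nullptr) return EAGAIN;
  int rc = pthread_create(thread, attr, registered_trampoline, p);
  if (rc != 0) delete p;
  return rc;
}
```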

bkietz commented 1 year ago

Yet another option would be to look even more like gdb: provide a helper to be executed (very) early in main() that calls vfork and ptrace. The tracing process watches for new and exiting threads (potentially child processes too; this needs more thought) and maintains the list of threads that need a tgkill. This introduces some overhead from context switching into the tracing process when the initial signal is received, and it couldn't be used at the same time as a debugger (since only one process may ptrace another). It would also require some documentation for use in containers, since some runtimes (Docker, at least) forbid ptrace calls by default. Still, it seemed worth mentioning on the strength of being a one-line addition for consumers.

bombela / backward-cpp, issue #244