Open wendajiang opened 1 year ago
What do you mean by "cannot output"? Please expand. What output did you expect, and what did you end up seeing instead?
Could it be that oneapi is setting its own signal handler?
https://github.com/bombela/backward-cpp/issues/244#issuecomment-1209318978 I think this is likely the same problem. When one thread receives the signal from the kernel, the other TBB threads keep executing and also receive a signal (they crash on the same null pointer access), so the program exits directly.
I expected the normal stack trace output. However, with the code above, the console shows nothing.
Maybe I should write a custom signal handler class adapted to the multi-threaded scenario: at the beginning of sig_handler, stop all other threads in the process, and only then handle the signal.
https://www.man7.org/linux/man-pages/man7/signal.7.html
Signal mask and pending signals
Each thread in a process has an independent signal mask, which indicates the set of signals that the thread is currently blocking. A thread can manipulate its signal mask using pthread_sigmask(3). In a traditional single-threaded application, sigprocmask(2) can be used to manipulate the signal mask.
A thread-directed signal is one that is targeted at a specific
thread. A signal may be thread-directed because it was generated
as a consequence of executing a specific machine-language
instruction that triggered a hardware exception (e.g., SIGSEGV
for an invalid memory access, or SIGFPE for a math error), or
because it was targeted at a specific thread using interfaces
such as [tgkill(2)](https://www.man7.org/linux/man-pages/man2/tgkill.2.html) or [pthread_kill(3)](https://www.man7.org/linux/man-pages/man3/pthread_kill.3.html).
A thread can obtain the set of signals that it currently has
pending using [sigpending(2)](https://www.man7.org/linux/man-pages/man2/sigpending.2.html). This set will consist of the union
of the set of pending process-directed signals and the set of
signals pending for the calling thread.
By default threads accept all signals. The library is most likely setting the signal mask per thread.
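To make the per-thread mask concrete, here is a minimal sketch of blocking a signal in the calling thread and reading the mask back with pthread_sigmask(3). The helper names `block_in_this_thread` and `is_blocked_in_this_thread` are made up for illustration; this is not code taken from oneAPI/TBB or backward-cpp.

```cpp
#include <cassert>
#include <signal.h>
#include <pthread.h>

// Hypothetical helper: block `signo` in the calling thread only.
// Other threads keep their own masks. Returns true on success.
bool block_in_this_thread(int signo) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    return pthread_sigmask(SIG_BLOCK, &set, nullptr) == 0;
}

// Hypothetical helper: read the calling thread's mask without changing it.
// Passing a null `set` makes pthread_sigmask only report the current mask.
bool is_blocked_in_this_thread(int signo) {
    sigset_t current;
    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, signo) == 1;
}
```

A library that wanted to keep a signal away from its worker threads could call something like this at thread start; whether the library in question actually does so is only a guess here.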
> By default threads accept all signals. The library is most likely setting the signal mask per thread.
I understand this. I tried replacing the sig_handler with the plain system APIs backtrace and backtrace_symbols, and it works. I also ran the code above under gdb, single-stepping through it, and it works.
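For reference, a handler built only on the glibc backtrace API might look roughly like the sketch below. It is not the code that was actually tried here. The names `dump_stack`, `crash_handler` and `install_crash_handler` are hypothetical, and it swaps in backtrace_symbols_fd for backtrace_symbols, because the latter allocates its result with malloc while the former writes straight to a file descriptor.

```cpp
#include <cassert>
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper: write the current call stack to `fd`.
// Returns the number of frames captured.
int dump_stack(int fd) {
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);  // no malloc, unlike backtrace_symbols
    return n;
}

// Hypothetical handler: print the stack, then let the default action run.
extern "C" void crash_handler(int signo) {
    dump_stack(STDERR_FILENO);
    signal(signo, SIG_DFL);  // restore the default disposition
    raise(signo);            // delivered once the handler returns
}

// Hypothetical installer for SIGSEGV only.
void install_crash_handler() {
    struct sigaction sa {};
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, nullptr);
}
```

The output is raw addresses and mangled symbols, much less detailed than what backward-cpp prints, but it involves far fewer library calls inside the handler.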
So I think the problem is as described in my earlier comment: when a multi-threaded program receives a signal, the kernel arbitrarily selects one thread to deliver it to and that thread triggers sig_handler, but the other threads continue to execute and crash again, which is not the expected behavior.
Finally, I found that removing the SA_RESETHAND flag and adding a recursive mutex in the sig_handler function fixes the problem. Please review the code and let me know if there is a better solution for this case (multiple threads of one program crashing at nearly the same time).
Thanks for the code. If I understand correctly, it serializes the execution of the signal handler. In other words, the signal handler can never be executed concurrently on multiple threads, but instead, one by one. In your case, since it aborts after a SIGSEGV, only one will ever execute.
> So I think the problem is as described in my earlier comment: when a multi-threaded program receives a signal, the kernel arbitrarily selects one thread to deliver it to and that thread triggers sig_handler, but the other threads continue to execute and crash again, which is not the expected behavior.

For hardware exceptions like SIGSEGV, the documentation states that they are thread-directed, which means that the signal handler will only execute on the thread that triggered the fault. The kernel doesn't randomly pick a thread here.
You mentioned that multiple threads are segfaulting at the same time. And you say that it works with your code that is serializing all invocations of the signal handler. But it also works fine if you call backtrace directly. I wonder if the issue is concurrent execution of backward-cpp and the various libraries that it calls.
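The thread-directed behavior can be checked with a small experiment. The sketch below uses pthread_kill with SIGUSR1 rather than a real hardware fault, which the man page quoted earlier also lists as thread-directed, and records whether the handler ran on the thread that was targeted. All names are hypothetical, and calling pthread_equal inside a handler is assumed to be harmless here.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <pthread.h>
#include <thread>

std::atomic<bool> g_same_thread{false};  // set by the handler
pthread_t g_target;                      // thread the signal is aimed at

// Hypothetical handler: note whether we are running on the targeted thread.
extern "C" void record_thread(int) {
    g_same_thread.store(pthread_equal(pthread_self(), g_target) != 0);
}

// Hypothetical check: send SIGUSR1 to a specific worker thread and report
// whether the handler executed on that same thread.
bool handler_runs_on_targeted_thread() {
    struct sigaction sa {};
    sa.sa_handler = record_thread;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, nullptr);

    std::thread worker([] {
        g_target = pthread_self();
        pthread_kill(g_target, SIGUSR1);  // thread-directed signal to self
    });
    worker.join();
    return g_same_thread.load();
}
```

This says nothing about what the other threads are doing at that moment: they keep running, and each of them can hit its own fault and enter the handler concurrently.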
> For hardware exceptions like SIGSEGV, the documentation states that they are thread-directed, which means that the signal handler will only execute on the thread that triggered the fault. The kernel doesn't randomly pick a thread here.

But the strange result is that if I use std::mutex, a deadlock happens.
> > For hardware exceptions like SIGSEGV, the documentation states that they are thread-directed, which means that the signal handler will only execute on the thread that triggered the fault. The kernel doesn't randomly pick a thread here.
>
> But the strange result is that if I use std::mutex, a deadlock happens.

Sorry, that was a logic error in my test code. With only std::mutex added, it works correctly.
> Thanks for the code. If I understand correctly, it serializes the execution of the signal handler. In other words, the signal handler can never be executed concurrently on multiple threads, but instead, one by one. In your case, since it aborts after a SIGSEGV, only one will ever execute.

PS. Removing the SA_RESETHAND flag is also important: when multiple threads crash, each crash then triggers sig_handler instead of the default action (core dump).
And the raise(sigNo) call inside the sig_handler function should be removed, to avoid handling the signal in an infinite loop.
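As a rough illustration of the serialization idea discussed above, here is a sketch that is not the code posted in this thread. It replaces the mutex with an atomic flag, since locking a mutex is not on the POSIX list of async-signal-safe functions. It also restores the default disposition before calling raise, which is one way to keep raise without looping once SA_RESETHAND is gone. All names are hypothetical, and `std::atomic<bool>` is assumed to be lock-free on the target platform.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <unistd.h>

std::atomic<bool> g_reporting{false};

// Hypothetical helper: true for the first crashing thread only.
bool claim_crash_report() {
    return !g_reporting.exchange(true);
}

// Hypothetical handler: the first thread reports, later ones park.
extern "C" void serialized_handler(int signo) {
    if (!claim_crash_report()) {
        for (;;) pause();  // wait here until the reporting thread ends the process
    }
    static const char msg[] = "crash detected, printing stack trace\n";
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    // The stack trace printing would go here.
    signal(signo, SIG_DFL);  // avoid re-entering this handler
    raise(signo);            // default action runs after the handler returns
}

// Hypothetical installer: note that SA_RESETHAND is not set.
void install_serialized_handler() {
    struct sigaction sa {};
    sa.sa_handler = serialized_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, nullptr);
}
```

Threads that crash second never reach the printing code, so the trace is produced once and no lock is held while it prints. A fault inside the reporting code itself would still need separate handling.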
Got it, thank you for investigating. I will have to spend some time on this.
Maybe deleting SA_RESETHAND is a bad idea after all: with recursive_mutex it allows signal_handler to be re-entered, and when jemalloc is in use, the crash report deadlocks.
Also not working for me with OpenMP, might be related.
My sample code is like this:
Look at the code
char *nn = nullptr; nn[3] = 'a';
And backward-cpp cannot output the stack trace.