Crash handler hangs - Githubissues

KjellKod / g3log

G3log is an asynchronous, "crash safe", logger that is easy to use with default logging sinks or you can add your own. G3log is made with plain C++14 (C++11 support up to release 1.3.2) with no external libraries (except gtest used for unit tests). G3log is made to be cross-platform, currently running on OSX, Windows and several Linux distros. See Readme below for details of usage.

http://github.com/KjellKod/g3log

The Unlicense

908 stars 271 forks

Crash handler hangs #480

Closed GergoTot closed 1 year ago

GergoTot commented 1 year ago

Hi,

I'm running my application under Linux in a Docker container as PID 1. When there is an abort and SIGABRT is raised (I could also reproduce it with std::abort), the while loop in signalHandler hangs for me:

    // Only one signal will be allowed past this point
    if (false == shouldDoExit()) {
        while (true) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }

Re-emitting SIGABRT with kill in the exitWithDefaultSignalHandler function, after restoring the Linux default signal handler for SIGABRT (restoreSignalHandler(signal_number)), does not end this never-ending while loop, because the kill has no effect on PID 1. The application hangs. My question is the following: could an alternative solution in signalHandler be something like this: the signal handler waits for the fatal logging and the log flushing on the other threads to finish, then continues to run, emits the kill signal again and calls exit. This would avoid returning from the signal handler, which can cause other crashes, and the never-ending while loops would also be avoided...

KjellKod commented 1 year ago

If you search for PID1 in the issues you will find some useful information.

Particularly this I think can be used: https://github.com/KjellKod/g3log/issues/269. This is the suggestion I've had so far regarding this particular issue: https://github.com/KjellKod/g3log/blob/master/docs/API.md#pid1-fatal-signal-recommendations

KjellKod commented 1 year ago

If you or someone else in the community wants to move forward with a change for this scenario, #269 has the information needed. The change just needs to make sure it also works on OSX, Windows, Linux in Docker and off-Docker setups.

KjellKod commented 1 year ago

@hoditohod thanks for the great explanation in #269 btw.

GergoTot commented 1 year ago

Hi, thanks for your reply. We are now trying to use our own patch to handle this SIGABRT scenario under PID 1. Unfortunately, SIGABRT caused circular crashes with another signal, SIGSEGV. So in our patch we have to restore the original Linux signal handling for all of the signals, not only for SIGABRT, before exiting. Do you see any problems with restoring the original Linux signal handling for all of the signals directly before exiting due to the original SIGABRT signal?

KjellKod commented 1 year ago

Restoring the original sounds good to me

KjellKod commented 1 year ago

Circular crashes can also be detected by the code by setting a flag and checking that flag.

When doing custom signal handler work that's an approach I've used in the past
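A minimal sketch of such a flag check, assuming a plain POSIX signal handler. The names g_fatal_in_progress, firstFatalSignal and onFatalSignal are made up for illustration and are not g3log identifiers. std::atomic_flag is used because it is guaranteed to be lock-free, which makes it usable inside a signal handler.

```cpp
#include <atomic>
#include <csignal>
#include <unistd.h>

namespace {
// Hypothetical flag: set as soon as the first fatal signal is being handled.
std::atomic_flag g_fatal_in_progress = ATOMIC_FLAG_INIT;
}

// Returns true for the first caller only; every later call returns false.
bool firstFatalSignal() {
    return !g_fatal_in_progress.test_and_set();
}

// Illustrative handler, to be installed with sigaction() or std::signal().
extern "C" void onFatalSignal(int signal_number) {
    if (!firstFatalSignal()) {
        // A second crash arrived while the first one was still being
        // handled: stop right away instead of looping between handlers.
        _exit(128 + signal_number);  // assumed exit code convention
    }
    // First crash: hand over to the fatal logging and flushing here.
}
```

Exiting immediately on the nested signal is one possible policy; it gives up the core dump for the second crash in exchange for a guaranteed exit.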

GergoTot commented 1 year ago

I have finally noticed our previous PR: https://github.com/KjellKod/g3log/pull/419 Thank you very much for having already merged it. hoditohod and I are from the same workplace (unfortunately he is leaving us now). I have also put up a commit for the current infinite loop situation: https://github.com/KjellKod/g3log/pull/481/commits/2f18c5b7ed7a3d9bb54f91f2e7bc283ccaa04446 What do you think about this enhancement? With it we could also handle the SIGABRT situation in a Docker container with PID 1. Logs and backtrace were written, the service exited as expected and a core dump was also generated. Aborting our application without a Docker container was also tested successfully: backtrace, logs, exiting and core dump were all right. As I wrote in the commit, our original issue was, more accurately, the following: our service (running in Docker as PID 1) crashed with a SIGABRT signal. After SIGABRT was raised, unfortunately an endless series of SIGSEGV signals was raised as well. So the process got stuck in the infinite loop, since the kill signal doesn't stop the infinite loop when running in a Docker container with PID 1. We used a solution similar to the one in this PR: https://github.com/KjellKod/g3log/pull/419. We also had to restore the saved signal handlers. Without that, SIGSEGV signals were raised endlessly in a circle, and this situation also caused a hang when running in a Docker container with PID 1.
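Saving the previous handlers at install time and restoring all of them before exiting could look roughly like this. It is a sketch under assumptions, not the code of PR #419 or PR #481: the function names and the signal list are chosen for illustration only.

```cpp
#include <array>
#include <csignal>
#include <cstddef>
#include <signal.h>

namespace {
// Assumed set of fatal signals for this illustration.
constexpr std::array<int, 4> kFatalSignals{{SIGABRT, SIGFPE, SIGILL, SIGSEGV}};
// Dispositions that were active before our handler was installed.
std::array<struct sigaction, kFatalSignals.size()> g_saved_actions{};
}

// Installs `handler` for every fatal signal and remembers the old actions.
bool installFatalHandlers(void (*handler)(int)) {
    struct sigaction action{};
    sigemptyset(&action.sa_mask);
    action.sa_handler = handler;
    for (std::size_t i = 0; i < kFatalSignals.size(); ++i) {
        if (sigaction(kFatalSignals[i], &action, &g_saved_actions[i]) != 0) {
            return false;
        }
    }
    return true;
}

// Puts back the saved action for all fatal signals, not only the one that
// triggered the exit, so a follow-up crash cannot re-enter our handler.
void restoreAllFatalHandlers() {
    for (std::size_t i = 0; i < kFatalSignals.size(); ++i) {
        sigaction(kFatalSignals[i], &g_saved_actions[i], nullptr);
    }
}
```

Calling restoreAllFatalHandlers() directly before re-emitting the original signal means a SIGSEGV that follows the SIGABRT is handled by whatever was installed before, typically the system default.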

KjellKod commented 1 year ago

That commit change looks great. Please put up a pull request and we'll put it under test

KjellKod commented 1 year ago

You are from Hungary huh? When bowfishing I try to make Halászlé, a Hungarian favorite of mine :)

GergoTot commented 1 year ago

Thank you, I have prepared the proposed solution in PR: https://github.com/KjellKod/g3log/pull/481 Yes, we are from Hungary. Bowfishing sounds very exciting, now I understand your avatar :). I generally fish in the river Tisa and also in the river Danube. They are beautiful places with great fish, and indeed very delicious Halászlé can be cooked here :)

KjellKod commented 1 year ago

Thanks for the fix, it's now merged: https://github.com/KjellKod/g3log/pull/481

GergoTot commented 1 year ago

Thank you!

GergoTot commented 1 year ago

When do you plan your next release?

KjellKod commented 1 year ago

I just made a release. I think this is the first thing that is in since then

If it’s important I can make a minor release, otherwise it’ll probably be a few months, up to 6 months

GergoTot commented 1 year ago

We would appreciate it if you could make a minor release. It would help us a lot. Could you make a minor release?

KjellKod commented 1 year ago

It's on my todo list. I'll see if I can also address the other issue that just came up in the same release. If it's super urgent I can do it faster, but I prefer to wait if I can.

GergoTot commented 1 year ago

Later is OK, thank you.