GergoTot closed 1 year ago
If you search the issues for PID1 you will find some useful information.
In particular, I think this can be used: https://github.com/KjellKod/g3log/issues/269. This is the suggestion I've had so far regarding this particular issue: https://github.com/KjellKod/g3log/blob/master/docs/API.md#pid1-fatal-signal-recommendations
If you or someone else in the community wants to move forward with a change for this scenario, #269 has the information needed. The change just needs to work on OSX, Windows, and Linux, both in Docker and outside Docker.
@hoditohod thanks for the great explanation in #269 btw.
Hi, thanks for your reply. We are now trying our own patch to handle SIGABRT when running as PID 1. Unfortunately, SIGABRT caused circular crashes with another signal, SIGSEGV. So in our patch we have to restore the original Linux signal handling for all of the signals, not only for SIGABRT, before exiting. Do you see any problems with restoring the original Linux signal handling for all of the signals directly before exiting due to the original SIGABRT signal?
Restoring the original sounds good to me
Circular crashes can also be detected in the code by setting a flag when the first fatal signal is handled and checking that flag on entry to the handler.
When doing custom signal handler work, that's an approach I've used in the past.
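A minimal sketch of the flag approach mentioned above, assuming a POSIX-like platform. The names `g_fatal_in_progress`, `firstFatalSignal` and `guardedHandler` are hypothetical and are not part of g3log; the idea is similar in spirit to the `shouldDoExit()` check quoted in the original report, but instead of sleeping forever the second entry exits immediately.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstdlib>

namespace sketch {

// Hypothetical flag: set once the first fatal signal is being handled.
std::atomic<bool> g_fatal_in_progress{false};

// Returns true only for the very first caller, false for every later one.
bool firstFatalSignal() {
    bool expected = false;
    return g_fatal_in_progress.compare_exchange_strong(expected, true);
}

// Hypothetical handler showing where the flag is checked.
extern "C" void guardedHandler(int signal_number) {
    if (!firstFatalSignal()) {
        // A second fatal signal arrived while the first one was being
        // handled (a circular crash). Do not loop: reset the disposition
        // and leave the process without running any further handlers.
        std::signal(signal_number, SIG_DFL);
        std::_Exit(128 + signal_number);
    }
    // First fatal signal: log, flush, restore the saved handlers, re-raise.
}

}  // namespace sketch
```

The exit status `128 + signal_number` is only a shell convention chosen for this sketch; a real handler may prefer to re-raise the signal so that a core dump is still produced.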
I have finally noticed our previous PR: https://github.com/KjellKod/g3log/pull/419 Thank you very much for merging it. hoditohod and I work at the same place (unfortunately he is leaving us now).

I have also pushed a commit for the current infinite loop situation: https://github.com/KjellKod/g3log/pull/481/commits/2f18c5b7ed7a3d9bb54f91f2e7bc283ccaa04446 What do you think about this enhancement? With it we could handle the SIGABRT situation in a Docker container as PID 1 as well: logs and the backtrace were written, the service exited as expected, and a core dump was generated. Aborting our application outside a Docker container was also tested successfully: the backtrace, logs, exit and core dump were all correct.

As I wrote in the commit, our original issue was, more precisely, the following. Our service (running in Docker as PID 1) crashed with a SIGABRT signal. After the SIGABRT was raised, an endless stream of SIGSEGV signals unfortunately followed. The process then got stuck in the infinite loop, since the kill signal does not stop the loop when running in a Docker container as PID 1. We used a solution similar to the one in this PR: https://github.com/KjellKod/g3log/pull/419. We also had to restore the saved signal handlers. Without that, SIGSEGV signals were raised circularly without end, and this also caused the hang when running in a Docker container as PID 1.
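For readers following along, saving and restoring the handlers for all fatal signals could look roughly like the sketch below. It is an illustration under assumptions, not the code from PR #481: the signal list, `installAll`, `restoreAll` and `loggingHandler` are hypothetical names, and it uses plain POSIX `sigaction`, so it does not cover Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>

namespace sketch {

// Assumed set of fatal signals; adjust to whatever the application hooks.
constexpr int kFatalSignals[] = {SIGABRT, SIGFPE, SIGILL, SIGSEGV, SIGTERM};
constexpr std::size_t kCount = sizeof(kFatalSignals) / sizeof(kFatalSignals[0]);

// Fixed-size storage so that restoring never allocates memory.
struct sigaction g_saved[kCount];
bool g_has_saved[kCount] = {};

// Placeholder handler: a real one would log, flush and then exit.
extern "C" void loggingHandler(int /*signal_number*/) {}

// Installs loggingHandler for every signal and remembers the old action.
bool installAll() {
    bool ok = true;
    for (std::size_t i = 0; i < kCount; ++i) {
        struct sigaction action {};
        sigemptyset(&action.sa_mask);
        action.sa_handler = &loggingHandler;
        if (sigaction(kFatalSignals[i], &action, &g_saved[i]) == 0) {
            g_has_saved[i] = true;
        } else {
            ok = false;
        }
    }
    return ok;
}

// Restores the original action for ALL signals, not only the one that fired,
// so a follow-up SIGSEGV cannot re-enter the logging handler.
void restoreAll() {
    for (std::size_t i = 0; i < kCount; ++i) {
        if (g_has_saved[i]) {
            sigaction(kFatalSignals[i], &g_saved[i], nullptr);
            g_has_saved[i] = false;
        }
    }
}

}  // namespace sketch
```

Calling `restoreAll()` right before the final exit is what breaks the SIGABRT to SIGSEGV cycle described above: once the original dispositions are back, a second fault no longer lands in the custom handler.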
That commit looks great. Please put up a pull request and we'll put it under test.
You are from Hungary huh? When bowfishing I try to make Halászlé, a Hungarian favorite of mine :)
Thank you, I have prepared the proposed solution in PR: https://github.com/KjellKod/g3log/pull/481 Yes, we are from Hungary. Bowfishing sounds very exciting, now I understand your avatar :). I usually fish in the river Tisa and also in the river Danube. They are beautiful places with great fish, and indeed a very delicious Halászlé can be cooked here :)
Thanks for the fix, it's now merged: https://github.com/KjellKod/g3log/pull/481
Thank you!
When do you plan your next release?
I just made a release. I think this is the first change that has gone in since then.
If it's important I can make a minor release; otherwise the next one will probably be a few months away, up to 6 months.
We would appreciate it if you could make a minor release. It would help us a lot.
It's on my todo list. I'll see if I can address the other issue that just came up in the same release. If it's super urgent I can do it faster, but I prefer to wait if I can.
Later is fine, thank you.
Hi,
I'm running my application on Linux in a Docker container as PID 1. When the application aborts and a SIGABRT signal is raised (I could also reproduce it with std::abort), there is a while loop in signalHandler that hangs for me:

    // Only one signal will be allowed past this point
    if (false == shouldDoExit()) {
       while (true) {
          std::this_thread::sleep_for(std::chrono::seconds(1));
       }
    }

Re-emitting the SIGABRT signal with kill in the exitWithDefaultSignalHandler function, after restoring the Linux default signal handler for SIGABRT (restoreSignalHandler(signal_number)), does not end such endless while loops, because the kill cannot terminate the process when it is PID 1. The application hangs. My question is the following: could an alternative solution in the signal handler look something like this? The signal handler waits for the fatal logging and the log flushing to finish on the other threads, then continues running, emits the kill signal again, and calls exit. This would avoid returning from the signal handler, which can cause other crashes, and the endless while loops would be avoided as well.
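To make the question concrete, here is a rough sketch of the flow I have in mind, assuming POSIX. All names (`g_log_flushed`, `waitForFlush`, `exitAfterFatal`) and the 5 second timeout are hypothetical and not g3log API. Like the existing loop it sleeps inside the signal handler, which is not strictly async-signal-safe, so please read it as an illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <cstdlib>
#include <thread>
#include <signal.h>
#include <unistd.h>

namespace sketch {

// Hypothetical flag: the logging thread sets it once the fatal message
// has been written and the sinks have been flushed.
std::atomic<bool> g_log_flushed{false};

// Polls the flag until it is set or the timeout expires.
// Returns true if the flush completed in time.
bool waitForFlush(std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!g_log_flushed.load()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return true;
}

// Called from the signal handler; never returns to the faulting code.
[[noreturn]] void exitAfterFatal(int signal_number) {
    waitForFlush(std::chrono::seconds(5));  // bounded wait, no endless loop
    std::signal(signal_number, SIG_DFL);    // restore default disposition
    kill(getpid(), signal_number);          // normally terminates the process
    // Only reached when the signal did not terminate us, for example when
    // running as PID 1, where default-disposition signals are not delivered.
    std::_Exit(128 + signal_number);
}

}  // namespace sketch
```

The point of the final `_Exit` is that the handler never falls back into a `while (true)` loop: either the re-emitted signal ends the process, or the explicit exit does.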