KjellKod / g3log

G3log is an asynchronous, "crash safe", logger that is easy to use with default logging sinks or you can add your own. G3log is made with plain C++14 (C++11 support up to release 1.3.2) with no external libraries (except gtest used for unit tests). G3log is made to be cross-platform, currently running on OSX, Windows and several Linux distros. See Readme below for details of usage.
http://github.com/KjellKod/g3log
The Unlicense
908 stars 271 forks source link

g3log - Process hangs "because" of the LogWorker destructor #520

Closed RoyalcraftDev closed 6 months ago

RoyalcraftDev commented 6 months ago

Context

I was trying to fix an issue where my process would stay on even after executing. In short, my app uses methods from a native library I created and obviously uses g3log. Though, I didn't have in mind at all it would be the cause. For long hours, I was trying different things to get rid of the issue, but nothing really worked. I ended up using Process Hacker to find more information about that zombie process. In the process details, it showed one thread left. I naturally checked the stack and saw things related to a wait state. Then after reading further, I saw lines related to g3log, more precisely these lines were related to the destructor of the LogWorker.

Thread State: Wait:WrAlertByThreadId

0, wow64cpu.dll!TurboDispatchJumpAddressEnd+0x544
1, wow64cpu.dll!TurboDispatchJumpAddressEnd+0x421
2, wow64cpu.dll!BTCpuSimulate+0x9
3, wow64.dll!Wow64KiUserCallbackDispatcher+0x4b9
4, wow64.dll!Wow64LdrpInitialize+0x12d
5, ntdll.dll!LdrInitShimEngineDynamic+0x31f1
6, ntdll.dll!LdrInitializeThunk+0x1db
7, ntdll.dll!LdrInitializeThunk+0x63
8, ntdll.dll!LdrInitializeThunk+0xe
9, ntdll.dll!ZwWaitForAlertByThreadId+0xc (No unwind info)
10, KernelBase.dll!SleepConditionVariableSRW+0x23 (No unwind info)
11, msvcp140d.dll!std::_Winerror_message+0x213 (No unwind info)
12, msvcp140d.dll!std::_Winerror_message+0x1d5 (No unwind info)
13, msvcp140d.dll!Cnd_wait+0x32 (No unwind info)
14, libNative.dll!std::condition_variable::wait+0x2c
15, libNative.dll!std::_Associated_state<int>::_Wait+0x76
16, libNative.dll!std::_State_manager<int>::wait+0x5f (No unwind info)
17, libNative.dll!g3::LogWorker::~LogWorker+0x81 (No unwind info)
18, libNative.dll!g3::LogWorker::`scalar deleting destructor'+0x16
19, libNative.dll!std::default_delete<g3::LogWorker>::operator()+0x34
20, libNative.dll!std::unique_ptr<g3::LogWorker,std::default_delete<g3::LogWorker> >::~unique_ptr<g3::LogWorker,std::default_delete<g3::LogWorker> >+0x2b
21, libNative.dll!`ECHO::Log::init'::`2'::`dynamic atexit destructor for 'logWorker''+0xd

Afterwards, I checked in the source code of g3log and looked at the destructor of it and found these lines:

auto bg_clear_sink_call = [this] { _impl._sinks.clear(); };
auto token_cleared = g3::spawn_task(bg_clear_sink_call, _impl._bg.get());
token_cleared.wait();

(My) Solution

I didn't really think too much about it, I just commented the lines, and after recompiling the library, the process was correctly exiting!

But...

I doubt that commenting lines of code like that is very nice, it feels more like a hack than anything else. I'd like to know if there is maybe something that I did wrong that resulted in this behavior or just a bug inside the library.

Additional Context

Platform: Windows 10 Language Version: C++14 Compiler: MSVC

Note: The codebase of the native library is quite old, around 2011 and so does the version of g3log used. After checking I saw files from 2011-2015 of g3log

KjellKod commented 6 months ago
  1. Does it happen for normal exits also or just during crashes?

At g3log exit, it flushes all sinks (default, homemade or through g3sinks) until all messages are taken care of.

  1. Have you observed if messages are still pouring into your log file during the "hang"? If so, that would mean it's trying to empty the queues before exiting

  2. G3log is also slowing down a fatal exit by the same behavior. Technically most actions after a fatal exit signal is in the domain of undefined behavior - it might be defined for a particular system/compiler etc but that's not knowledge handled within g3log. It COULD be this that causes it - it can also be a plethora of other things like improper thread initialization in DLLs, improper forking - thread handling, weird system library linking etc.

  3. Regardless, yes commenting out the wait will lead to faster exit of the system as it's not waiting for the sink to properly shut down. (Could your sink be stuck? Or even be the place where it's crashing?)

Assuming the system has used g3log for a substantial amount of time - it's "unusual" that something like this would surface now unless other things in the system setup changed.

Just my thoughts about it.

RoyalcraftDev commented 6 months ago

This issue happens during a normal exit. When a crash occurs in the DLL it will crash the whole application as well.

I call a native destroy method which shuts down everything inside the native library including the logger. It will not log anything else after I destroyed it.

I want to also note that I have a version which is an executable by itself instead of a native library for tests. So it handles basically all the calls I was doing in my .NET app in C++, here I don't really have issues, when the code reaches the end, it properly exits.

RoyalcraftDev commented 6 months ago

Essentially, a better solution would be to completely free the library out of my process' memory. This was my initial attempt to fix the issue but even if I use the FreeLibrary from kernel32.dll, it doesn't really do much except denying me from using the library unless I call back the LoadLibrary function.

RoyalcraftDev commented 6 months ago

Essentially, a better solution would be to completely free the library out of my process' memory. This was my initial attempt to fix the issue but even if I use the FreeLibrary from kernel32.dll, it doesn't really do much except denying me from using the library unless I call back the LoadLibrary function.

After destroying the Logger in the native code and "releasing" the library, the handle of the log file remains until the process exits. Which could mean that it is in some way still active. Log Handle remains

KjellKod commented 6 months ago
... a version which is an executable by itself instead of a native 
library for tests. So it handles basically all the calls I was doing
 in .NET app in C++, here I don't really have issues, when the code
 reaches the end, it properly exits.

And this version is what was running successfully with g3log, I gather?

And now there's an attempt to change the setup and using dlls which is now experiencing issues?

A Quick Look at closed issues (there's more) show these common culprits for dlls and g3log setup

If you search for closed issues and DLL you can find some common issues with DLLs and Windows when working with g3log

Issue #486 incompatible versions debug release

One thing I’ve seen before is using DLLs in a way that breaks windows standard. Initialization should happpen in the main executable and not in the DLL

Another issue I’ve seen is when a static instantiations that are internal to g3log is attempted to be shared shared across DLL. DLL static initialization and destruction are a mess in Windows and most devs struggle to get this right

Other issues I’ve seen

1) The DLL is owning the g3log instance through a static setup —> Fix: the main program needs to own the G3log instance 2) The G3log (thread) initialization call is handled in the “pre” startup hook routine —> on all platforms (OSX, Linux,Windows) this is unsafe as these pre start routines must never have threaded construct / mutex etc in them 3) The G3log shutdown call or destruction of the instance is handled in the “post” hook routine. Same issue here as in the 2) above 4) DllMain(...) use and threads Windows DLL environment cannot be used at all for setup and breakdown of threaded constructs through DllMain(...)

5) Although not a DLL issue per se, it’s similar in nature. Using forking (I've never done this in Windows but assume is possible) and threaded structures is a recipe for disaster

RoyalcraftDev commented 6 months ago

In fact, I also have an Android version of the native code, compiled as a shared object and made to be ran by an android application and once again, I don't have any issues here either. This case isn't much different I'm reproducing what the android platform does by creating a DLL interface and using C# for actually running the native methods.

So can't really instantiate the logger within the managed code. It might just as well be a platform issue by the fact that I use a DLL instead of an executable though I really feel like this shouldn't happen since a DLL isn't much different from an executable though I might be wrong on this.

KjellKod commented 6 months ago

If I understand you correctly you are instantiating g3log inside the DLL.

How and when and where are you instantiating the logger object?

RoyalcraftDev commented 6 months ago

Initializing the logger is almost the first thing it does. image

KjellKod commented 6 months ago

And init Is called after the DLL is fully loaded, or during the load/initialization of the DLL?By understanding the whole load sequence it would be easier to see if that’s what’s going sideways Sent from my iPhoneOn May 14, 2024, at 06:19, RoyalcraftDev @.***> wrote: Initializing the logger is almost the first thing I do. image.png (view on web)

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

ghost commented 6 months ago

I think I understand what's going on.

g3log relies on proper RAII initialization for destruction. Here the logger instance is hidden as a static object within a function, so at process exit the destructor is never called, and therefore the logging sinks aren't flushed.

How static instances are destroyed is AFAIK defined per each specific system.

Some windows specific work around that specifically touched this are mentioned in

https://github.com/KjellKod/g3log/issues/88#issuecomment-219030139 https://github.com/KjellKod/g3log/issues/91

KjellKod commented 6 months ago

Seems like the root cause for this was that initialization and destruction of the logger was not conforming to expected usage. https://kjellkod.github.io/g3log/g3log_usage.html#adding-and-removing-sinks Windows dll work arounds as handled in https://github.com/KjellKod/g3log/issues/88#issuecomment-219030139 and https://github.com/KjellKod/g3log/issues/91 were given.