JochenKalmbach / StackWalker

Walking the callstack in windows applications
BSD 2-Clause "Simplified" License
831 stars 182 forks source link

Calling StackWalker::ShowCallstack() can cause deadlock #57

Closed marctheisen closed 4 months ago

marctheisen commented 11 months ago

I have successfully integrated the use of a singleton instance of a StackWalker to produce native stacks for a large application that uses a Java VM and native code written in C++. Periodically every 5 minutes a stack trace for all running threads is captured. In some cases this can cause the thread that produces the stack traces to become stuck. The reason is that StackWalker::ShowCallstack() first needs to suspend the (different) thread before it iterates over all its frames to fetch the address of the instruction pointer, then retrieve the corresponding name of the function in the source code, its line number and module information. This approach fails if the suspended thread is current allocating memory or giving allocated memory back to the operating system! The reason is that allocation/deallocation involves the use of a critical section. So if the suspended thread has acquired the critical section it means that other threads including the one executing StackWalker::ShowCallstack() will become stuck when calling malloc() or new. I have changed the implementation of StackWalker::ShowCallstack() as follows. It first allocates memory for a limited number of stack frames (100) to store the DWORD64 of AddrPC.Offset in the STACKFRAME64 struct. Then it iterates over all frames to only store the AddrPC.Offset for all of them. Afterwards the thread is resumed again. Then it iterates over all found frames to collect the remaining information (i.e. name of the function, line number and module information). Finally the OnCallstackEntry() callback function gets called which is now also allowed to perform memory allocations! Note that all error situations are also delayed by calling OnDbgHelpError() after the suspended thread is resumed again. With this version there are no deadlocks anymore. However I am not 100% whether the call to StackWalk64() might perform any memory allocations. I have not found any details on the internet, the documentation could be better.

Let me know whether this project is still maintained and whether you are interested in my code change.

Marc Theisen

marctheisen commented 10 months ago

Meanwhile I have found out that calls to StackWalk64() perform memory allocations as well. Hence it is not safe to suspend a thread and then call StackWalk64() later on to retrieve frame address pointers for it. The final and working solution is to monitor the calls to StackWalk64() in a separate thread. The thread waits for a new thread to be suspended and suspends it. Then it waits using a mutex and a condition variable with a timeout for the thread executing ShowCallstack() to succeed retrieving the frame pointers. If the timeout has been exceeded a 'potentialDeadlock' flag is set to inform the ShowCallstack() thread before resuming the thread again. The ShowCallStack() thread can then re-try for a limited number of attempts. If all attempts fail it returns false. If you are interested in the fully working solution let me know.

JochenKalmbach commented 10 months ago

It is never recommended to suspend threads of the same process. You MUST do this OUT-OF-PROCESS. So you must create a monitoring process which then can safely dump all stacks!

JochenKalmbach commented 10 months ago

See also the documentaion page: https://github.com/JochenKalmbach/StackWalker?tab=readme-ov-file#walking-the-callstack-of-other-threads-in-the-same-process

There is also a demo on how to do this from another process: https://github.com/JochenKalmbach/StackWalker?tab=readme-ov-file#walking-the-callstack-of-other-threads-in-the-same-process

Siehe also: http://blog.kalmbachnet.de/?postid=6

marctheisen commented 10 months ago

I know that suspending threads in the same process can cause deadlocks, see my lengthy comments above. My solution is working flawlessly so far so why not open it for the public? I suggest to at least capture this in ShowCallstack() and throw an exception if it is called with a thread handle that belongs to the current process and if it is not equal to the current thread.

JochenKalmbach commented 10 months ago

I do not see any solution... did you make a pull request?

marctheisen commented 10 months ago

Hi Jochen I am new to GitHub. I have just created a pull request for you.

BTW my change also contains a fix for a rather unlikely issue: When the first call to StackWalk64() failed OnCallstackEntry() was called with type lastEntry and an uninitialized CallstackEntry.

marctheisen commented 10 months ago

Currently only compiles for VC2022. We are using VC2019 were it compiles as well. Further effort would be involved to support it for older versions where the "thread" and "mutex" headers are not available by using CreateThread, EnterCriticalSection, LeaveCriticalSection, SetEvent and WaitForMultipleObjects instead.

JochenKalmbach commented 4 months ago

As said, please use CreateMiniDump method to create callstacks for the current process