dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

How to debug stack overflow errors on Windows #3506

Closed dggmez closed 1 month ago

dggmez commented 1 year ago

Documentation Request

I'm trying to follow the guide about debugging .NET stack overflow errors:

https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-stackoverflow

The "faulty" test program I wrote is this:

Console.Write("Enter n: ");
var line = Console.ReadLine();

var n = ulong.Parse(line);
var result = Factorial(n);
Console.WriteLine("Factorial({0}) = {1}", n, result);

static ulong Factorial(ulong n)
{
    static ulong Aux(ulong n, ulong acc)
    {
        if (n == 0)
        {
            return acc;
        }
        else
        {
            return Aux(n - 1, acc * n);
        }
    }

    return Aux(n, 1);
}

I'm using .NET on Windows, so I assumed the only difference would be using Visual Studio or WinDbg instead of lldb. I set the environment variables (DOTNET_DbgMiniDumpName and DOTNET_DbgEnableMiniDump) and ran the app using dotnet run --configuration Release. I opened the dump file in Visual Studio. No luck. It even said that there was no exception code. I then opened the file in WinDbg Preview (the Store version) and ran !analyze -v, which showed that the exception is 80000003 (Break instruction exception) and that the faulting source file is D:\a\_work\1\s\src\coreclr\vm\excep.cpp. The good news is that it at least showed me the right name of the method:

image

When I checked the stack backtrace using k it showed me this:

image

So I'm almost 100% sure I'm doing something wrong (I'm new to this debug stack overflow business, to be honest). The stack trace I get doesn't show some of the calls I see on that documentation page where lldb is used (like RunMain), and it doesn't show a frame being repeated multiple times. Judging by the .so extension in its output, I guess that small tutorial was done on Linux. It would be great if there were some steps showing how to do the same debugging on Windows using VS or WinDbg, for those of us who are using those tools.

Previous documentation

https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-stackoverflow
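
For readers reproducing the setup described in this comment, here is a minimal sketch of a C# launcher that starts the app with the two dump variables set. It assumes it is run from the test project's directory; the dump path and the value "1" for DOTNET_DbgEnableMiniDump are illustrative choices and are not taken from the issue. Setting the same variables in the shell before calling dotnet run should behave the same way.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical dump location (an assumption); any writable path should do.
string dumpPath = Path.Combine(Path.GetTempPath(), "stackoverflow.dmp");

var startInfo = new ProcessStartInfo("dotnet", "run --configuration Release")
{
    // Environment is only applied when the shell is not used to start the process.
    UseShellExecute = false,
};

// The two variables named in the issue text.
startInfo.Environment["DOTNET_DbgEnableMiniDump"] = "1";
startInfo.Environment["DOTNET_DbgMiniDumpName"] = dumpPath;

// The child inherits this console, so the "Enter n:" prompt still works.
using var process = Process.Start(startInfo);
process?.WaitForExit();

Console.WriteLine($"Exit code: {process?.ExitCode}; dump expected at {dumpPath}");
```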

hoyosjs commented 1 year ago

@dggmez - since your code is managed, you need to ask SOS to print the managed stack. For that you can run !clrstack - you can also add -f if you want to see the native frames. src\coreclr\vm\excep.cpp is only marked as faulting because that is where the dump is being collected. KiUserExceptionDispatch in frame 9 of your example is the call/transition where Windows is notified that something exceptional happened. I'm surprised the fault is getting reported as a breakpoint exception. !ClrStack -f should show you the frames that transition into the exception call. Or, just like in the tutorial, you can use !ip2md with the 0x7ffc52e0d5a address to see the name of the function (WinDbg is a little odd and shows where execution will return, not the IP running in that function, so when you want to know how you ended up there, you need to grab the IP of the frame on top).

dggmez commented 1 year ago

@hoyosjs I executed !clrstack but it showed nothing:

image

And you are right that by using !ip2md I managed to get the name of the broken C# method (Aux).

I'm surprised the fault is getting reported as a breakpoint exception.

Yeah, that's what I get when I execute .exr -1 or !analyze -v.

!ClrStack -f should show you the frames that transition into the exception call.

It seems that because !clrstack is empty (as the image I'm posting in this message shows), !clrstack -f only returns the native stack that you can get with k.

noahfalk commented 1 month ago

@dggmez - sorry I know this issue has been open a long while and you may not be interested in it any longer, but if you are I wanted to share what I am seeing.

The experience debugging a stack overflow on Windows based on a dump looks quite poor, as you saw. Initial evidence suggests there is a stack unwinding problem but it will need closer investigation.

What I hope you were able to discover is that although the dump experience is bad, the live debugging and console error report are pretty helpful:

Visual Studio live debugging

image

The callstack pane in the debugger also shows the stack overflow: image

Running the app at the console

Enter n: 1000000
Stack overflow.
Repeat 16058 times:
--------------------------------
   at Program.<<Main>$>g__Aux|0_1(UInt64, UInt64)
--------------------------------
   at Program.<<Main>$>g__Factorial|0_0(UInt64)
   at Program.<Main>$(System.String[])

Running the app in windbg

(4d38.2d90): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\bin\Debug\net8.0\StackOverflowApp.dll
StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x65:
00007ff9`096537f5 ff15fd330a00    call    qword ptr [CLRStub[MethodDescPrestub]@00007FF9096F6BF8 (00007ff9`096f6bf8)] ds:00007ff9`096f6bf8={StackOverflowApp!Program.<<Main>$>g__Aux|0_1 (00007ff9`09653790)}
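
As a quick sanity check on the output above, the symbol-plus-offset notation and the raw address agree: adding 0x65 to the start address WinDbg prints for g__Aux|0_1 gives the address of the faulting call instruction. A small C# sketch of that arithmetic follows; both constants are copied from the output above, and the variable names are only illustrative.

```csharp
using System;

// Values copied from the WinDbg output above.
const ulong auxStart = 0x00007ff909653790; // start of Program.<<Main>$>g__Aux|0_1
const ulong offset = 0x65;                 // from "g__Aux|0_1+0x65"

ulong faultingIp = auxStart + offset;

// WinDbg prints 64-bit addresses as two 8-digit hex halves separated by a backtick.
string formatted = $"{faultingIp >> 32:x8}`{faultingIp & 0xffffffff:x8}";
Console.WriteLine(formatted); // 00007ff9`096537f5
```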

If I print the stack trace at this point it shows the expected recursion:

0:000> k
 # Child-SP          RetAddr               Call Site
00 00000016`56206000 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x65 [C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\Program.cs @ 19] 
01 00000016`56206050 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x6b [C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\Program.cs @ 19] 
02 00000016`562060a0 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x6b [C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\Program.cs @ 19] 
03 00000016`562060f0 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x6b [C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\Program.cs @ 19] 
04 00000016`56206140 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x6b [C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\Program.cs @ 19] 
05 00000016`56206190 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x6b [C:\Users\noahfalk\source\repos\StackOverflowApp\StackOverflowApp\Program.cs @ 19] 
06 00000016`562061e0 00007ff9`096537fb     StackOverflowApp!Program.<<Main>$>g__Aux|0_1+0x6b