dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

LLDB/SOS clrstack command does not work on mach-o coredump #2443

Closed: kdubau closed this issue 3 years ago

kdubau commented 3 years ago

Description

When you load a coredump generated on macOS (by setting COMPlus_DbgEnableMiniDump=1) in LLDB, most SOS commands work, but clrstack does not. It prints the message below, even when the current thread is definitely a managed thread:

(lldb) clrstack
OS Thread Id: 0x0 (1)
Unable to walk the managed stack. The current thread is likely not a
managed thread. You can run clrthreads to get a list of managed threads in
the process
Failed to start stack walk: 80070057

This works as expected if you cause the same crash on Windows and load the dump in WinDbg.
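For anyone reproducing this, here is a minimal shell sketch of enabling createdump before launching the app. Only `COMPlus_DbgEnableMiniDump=1` comes from the report. The `COMPlus_DbgMiniDumpType` value, the `COMPlus_DbgMiniDumpName` template (with `%d` assumed to expand to the process ID), and the `DUMP_DIR` variable are assumptions. They were chosen to match the "Writing minidump with heap to file /Users/kylewhite/dumps/coredump.<pid>" line in the log, so check them against the createdump documentation for your runtime version.

```shell
#!/bin/sh
# Enable createdump for .NET processes started from this shell (setting taken from the report).
export COMPlus_DbgEnableMiniDump=1

# Assumption: type 2 corresponds to the "minidump with heap" dump seen in the report's log.
export COMPlus_DbgMiniDumpType=2

# Assumption: %d in the name template is replaced with the crashing process ID.
# DUMP_DIR is a hypothetical location; it is created up front so createdump can write into it.
DUMP_DIR="$HOME/dumps"
mkdir -p "$DUMP_DIR"
export COMPlus_DbgMiniDumpName="$DUMP_DIR/coredump.%d"

# Reproduce the crash, then open the resulting core file in LLDB with SOS loaded, e.g.:
#   testdump exception --thread background
#   lldb -c "$DUMP_DIR/coredump.<pid>"
```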

I generated the dump using this testdump tool: https://github.com/xamarin/macson/tree/kywhi/dev/net6/src/Tests/testdump

kylewhite@kywhi-mbp ~/Workspace/macson/src/Tests/testdump (kywhi/dev/net6) $ testdump exception --thread background
Process ID: 36909
About to throw CrashyException...
Crashy Background Thread
Unhandled exception. testdump.CrashyException: It's a crash!
   at testdump.CrashyClass.<>c.<Exception>b__1_0() in /Users/kylewhite/Workspace/macson/src/Tests/testdump/CrashyClass.cs:line 19
   at testdump.CrashyClass.<>c__DisplayClass13_0.<RunActionInThread>b__0() in /Users/kylewhite/Workspace/macson/src/Tests/testdump/CrashyClass.cs:line 97
   at System.Threading.Thread.StartCallback() in System.Private.CoreLib.dll:token 0x600272e+0xe
Gathering state for process 36909
Writing minidump with heap to file /Users/kylewhite/dumps/coredump.36909
Written 853934264 bytes (208480 pages) to core file
Dump successfully written
Abort trap: 6 (core dumped)

kylewhite@kywhi-mbp ~/Workspace/macson/src/Tests/testdump (kywhi/dev/net6) $ lldb -c /Users/kylewhite/dumps/coredump.36909
Added Microsoft public symbol server
(lldb) target create --core "/Users/kylewhite/dumps/coredump.36909"
Core file '/Users/kylewhite/dumps/coredump.36909' (x86_64) was loaded.

(lldb) clrthreads
ThreadCount:      4
UnstartedThread:  0
BackgroundThread: 3
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                            Lock
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
XXXX    1   a07b1d 00007FF5F6810400  2020020 Preemptive  00000001848014F0:0000000184801FD0 00007FF5E680BC00 -00001 Ukn
XXXX    2   a07b27 00007FF5F6811C00    21220 Preemptive  0000000000000000:0000000000000000 00007FF5E680BC00 -00001 Ukn (Finalizer)
XXXX    3   a07b29 00007FF5E681B600    21220 Preemptive  0000000000000000:0000000000000000 00007FF5E680BC00 -00001 Ukn
XXXX    4   a07b2c 00007FF5F683D600    21220 Preemptive  000000018480C578:000000018480DFD0 00007FF5E680BC00 -00001 Ukn testdump.CrashyException 0000000184802080

(lldb) thread list
Process 0 stopped
  thread #1: tid = 0x0000, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #2: tid = 0x0001, 0x00007fff205df2ba libsystem_kernel.dylib`mach_msg_trap + 10, stop reason = signal SIGSTOP
  thread #3: tid = 0x0002, 0x00007fff205e59ca libsystem_kernel.dylib`poll + 10, stop reason = signal SIGSTOP
  thread #4: tid = 0x0003, 0x00007fff205e59ca libsystem_kernel.dylib`poll + 10, stop reason = signal SIGSTOP
  thread #5: tid = 0x0004, 0x00007fff205dfb52 libsystem_kernel.dylib`__open + 10, stop reason = signal SIGSTOP
  thread #6: tid = 0x0005, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #7: tid = 0x0006, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #8: tid = 0x0007, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #9: tid = 0x0008, 0x00007fff205dfcce libsystem_kernel.dylib`read + 10, stop reason = signal SIGSTOP
* thread #10: tid = 0x0009, 0x00007fff205e4a8a libsystem_kernel.dylib`__wait4 + 10, stop reason = signal SIGSTOP

(lldb) bt
* thread #10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff205e4a8a libsystem_kernel.dylib`__wait4 + 10
    frame #1: 0x0000000104818366 libcoreclr.dylib`PROCCreateCrashDumpIfEnabled + 86
    frame #2: 0x000000010481611b libcoreclr.dylib`PROCAbort + 27
    frame #3: 0x000000010481606e libcoreclr.dylib`PROCEndProcess(void*, unsigned int, int) + 222
    frame #4: 0x0000000104a64ed5 libcoreclr.dylib`UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 1029
    frame #5: 0x0000000104a64f73 libcoreclr.dylib`DispatchManagedException(PAL_SEHException&, bool) + 67
    frame #6: 0x00000001049c4c6d libcoreclr.dylib`IL_Throw(Object*) + 557
    frame #7: 0x000000011bd1204d
    frame #8: 0x000000011bd11fa4
    frame #9: 0x000000011ae7fece
    frame #10: 0x0000000104b003c9 libcoreclr.dylib`CallDescrWorkerInternal + 124
    frame #11: 0x000000010495239f libcoreclr.dylib`DispatchCallSimple(unsigned long*, unsigned int, unsigned long, unsigned int) + 239
    frame #12: 0x000000010496a208 libcoreclr.dylib`ThreadNative::KickOffThread_Worker(void*) + 136
    frame #13: 0x0000000104919b1e libcoreclr.dylib`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 318
    frame #14: 0x000000010491a0c0 libcoreclr.dylib`ManagedThreadBase::KickOff(void (*)(void*), void*) + 32
    frame #15: 0x000000010496a2df libcoreclr.dylib`ThreadNative::KickOffThread(void*) + 175
    frame #16: 0x0000000104819d77 libcoreclr.dylib`CorUnix::CPalThread::ThreadEntry(void*) + 407
    frame #17: 0x00007fff206148fc libsystem_pthread.dylib`_pthread_start + 224
    frame #18: 0x00007fff20610443 libsystem_pthread.dylib`thread_start + 15

(lldb) ip2md 0x000000011ae7fece
MethodDesc:   000000011b72e3d0
Method Name:          System.Threading.Thread.StartCallback()
Class:                000000011b730998
MethodTable:          000000011b72f0f0
mdToken:              000000000600272E
Module:               000000011abf4000
IsJitted:             yes
Current CodeAddr:     000000011ae7fe70
Version History:
  ILCodeVersion:      0000000000000000
  ReJIT ID:           0
  IL Addr:            000000011b140f7c
     CodeAddr:           000000011ae7fe70  (ReadyToRun)
     NativeCodeVersion:  0000000000000000
Source file:  /_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105

(lldb) clrstack
OS Thread Id: 0x9 (10)
Unable to walk the managed stack. The current thread is likely not a
managed thread. You can run clrthreads to get a list of managed threads in
the process
Failed to start stack walk: 80070057


kdubau commented 3 years ago

Interestingly, if I run the whole program under LLDB, the clrstack command works as expected. That makes me suspect something about the file format that createdump generates.

kylewhite@kywhi-mbp ~/Workspace/macson/src/Tests/testdump (kywhi/dev/net6) $ lldb /Users/kylewhite/.dotnet/tools/testdump exception -- --thread background
Added Microsoft public symbol server
(lldb) target create "/Users/kylewhite/.dotnet/tools/testdump"
Current executable set to '/Users/kylewhite/.dotnet/tools/testdump' (x86_64).
(lldb) settings set -- target.run-args  "exception" "--thread" "background"

(lldb) run
Process 37035 launched: '/Users/kylewhite/.dotnet/tools/testdump' (x86_64)
Process ID: 37035
About to throw CrashyException...
Crashy Background Thread
Unhandled exception. testdump.CrashyException: It's a crash!
   at testdump.CrashyClass.<>c.<Exception>b__1_0() in /Users/kylewhite/Workspace/macson/src/Tests/testdump/CrashyClass.cs:line 19
   at testdump.CrashyClass.<>c__DisplayClass13_0.<RunActionInThread>b__0() in /Users/kylewhite/Workspace/macson/src/Tests/testdump/CrashyClass.cs:line 97
   at System.Threading.Thread.StartCallback() in System.Private.CoreLib.dll:token 0x600272e+0xe
Gathering state for process 37035
Writing minidump with heap to file /Users/kylewhite/dumps/coredump.37035
Written 853672120 bytes (208416 pages) to core file
Dump successfully written
Process 37035 stopped
* thread #10, stop reason = signal SIGABRT
    frame #0: 0x00007fff205e592e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff205e592e <+10>: jae    0x7fff205e5938            ; <+20>
    0x7fff205e5930 <+12>: movq   %rax, %rdi
    0x7fff205e5933 <+15>: jmp    0x7fff205dfad9            ; cerror_nocancel
    0x7fff205e5938 <+20>: retq
Target 0: (testdump) stopped.

(lldb) thread list
Process 37035 stopped
  thread #1: tid = 0xa09dfc, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10, queue = 'com.apple.main-thread'
  thread #2: tid = 0xa09e14, 0x00007fff205df2ba libsystem_kernel.dylib`mach_msg_trap + 10
  thread #3: tid = 0xa09e15, 0x00007fff205e59ca libsystem_kernel.dylib`poll + 10
  thread #4: tid = 0xa09e16, 0x00007fff205e59ca libsystem_kernel.dylib`poll + 10
  thread #5: tid = 0xa09e17, 0x00007fff205dfb52 libsystem_kernel.dylib`__open + 10
  thread #6: tid = 0xa09e18, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #7: tid = 0xa09e19, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #8: tid = 0xa09e1b, 0x00007fff205e1cde libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #9: tid = 0xa09e1d, 0x00007fff205dfcce libsystem_kernel.dylib`read + 10
* thread #10: tid = 0xa09e1f, 0x00007fff205e592e libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGABRT

(lldb) clrstack
OS Thread Id: 0xa09e1f (10)
        Child SP               IP Call Site
000070000B59AAF0 00007fff205e592e [HelperMethodFrame: 000070000b59aaf0]
000070000B59AC70 000000012155204D testdump.CrashyClass+<>c.<Exception>b__1_0() [/Users/kylewhite/Workspace/macson/src/Tests/testdump/CrashyClass.cs @ 19]
000070000B59ACA0 0000000121551FA4 testdump.CrashyClass+<>c__DisplayClass13_0.<RunActionInThread>b__0() [/Users/kylewhite/Workspace/macson/src/Tests/testdump/CrashyClass.cs @ 97]
000070000B59ACE0 00000001206BFECE System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
000070000B59AE98 000000010232c3c9 [DebuggerU2MCatchHandlerFrame: 000070000b59ae98]

mikem8361 commented 3 years ago

Looks like you need a newer SOS than the released/published version:

dotnet tool uninstall -g dotnet-sos
dotnet tool install -g --version 5.0.0-preview.21366.1 --add-source https://dnceng.pkgs.visualstudio.com/public/_packaging/dotnet-tools/nuget/v3/index.json dotnet-sos
dotnet-sos install

kdubau commented 3 years ago

Yep, that did the trick - thanks @mikem8361!