dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

Missing FailFast message in Mono ".crash" file #58049

Open danmoseley opened 3 years ago

danmoseley commented 3 years ago

net6.0-MacCatalyst-Release-x64-Mono_Release-OSX.1015.Amd64.Open System.Net.Requests.Tests work item

Crashed Thread:        21  .NET ThreadPool Worker

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called
...
Thread 21 Crashed:: .NET ThreadPool Worker
0   libsystem_kernel.dylib          0x00007fff6944e33a __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fff6950ae60 pthread_kill + 430
2   libsystem_c.dylib               0x00007fff693d58a2 __abort + 139
3   libsystem_c.dylib               0x00007fff693d5817 abort + 135
4   net.dot.System.Net.Requests.Tests   0x000000010c2c942f ves_icall_System_Environment_FailFast + 127
5   net.dot.System.Net.Requests.Tests   0x000000010c2cc2e6 ves_icall_System_Environment_FailFast_raw + 54
6   net.dot.System.Net.Requests.Tests   0x000000010c501a80 do_icall + 304
7   net.dot.System.Net.Requests.Tests   0x000000010c5009fd do_icall_wrapper + 253
8   net.dot.System.Net.Requests.Tests   0x000000010c4f2b9a interp_exec_method + 2970
9   net.dot.System.Net.Requests.Tests   0x000000010c4f0eaf interp_runtime_invoke + 239
10  net.dot.System.Net.Requests.Tests   0x000000010c3e82eb mono_jit_runtime_invoke + 1211
11  net.dot.System.Net.Requests.Tests   0x000000010c30534f mono_runtime_try_invoke + 143
12  net.dot.System.Net.Requests.Tests   0x000000010c303ece mono_runtime_class_init_full + 878
13  net.dot.System.Net.Requests.Tests   0x000000010c5232b1 mono_interp_transform_method + 273
14  net.dot.System.Net.Requests.Tests   0x000000010c501904 do_transform_method + 100
15  net.dot.System.Net.Requests.Tests   0x000000010c4f3311 interp_exec_method + 4881
16  net.dot.System.Net.Requests.Tests   0x000000010c4f0eaf interp_runtime_invoke + 239
17  net.dot.System.Net.Requests.Tests   0x000000010c3e82eb mono_jit_runtime_invoke + 1211
18  net.dot.System.Net.Requests.Tests   0x000000010c303988 mono_runtime_invoke_checked + 136
19  net.dot.System.Net.Requests.Tests   0x000000010c31cc35 start_wrapper_internal + 613
20  net.dot.System.Net.Requests.Tests   0x000000010c31c9ae start_wrapper + 62
21  libsystem_pthread.dylib         0x00007fff6950b109 _pthread_start + 148
22  libsystem_pthread.dylib         0x00007fff69506b8b thread_start + 15
...
Thread 21 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x000070000f718000  rcx: 0x000070000f715488  rdx: 0x0000000000000000
  rdi: 0x0000000000011803  rsi: 0x0000000000000006  rbp: 0x000070000f7154b0  rsp: 0x000070000f715488
   r8: 0x0000000000000000   r9: 0x0000000000989680  r10: 0x000070000f718000  r11: 0x0000000000000246
  r12: 0x0000000000011803  r13: 0x00007fde21857d90  r14: 0x0000000000000006  r15: 0x0000000000000016
  rip: 0x00007fff6944e33a  rfl: 0x0000000000000246  cr2: 0x000000010d067000

Probably non actionable as is, but what does the managed code look like here? did someone queue FailFast onto the threadpool?

More importantly, can Mono folks detect abort() and include the raw string message in the ".crash" file? that would help a lot.

https://dev.azure.com/dnceng/public/_build/results?buildId=1313534&view=ms.vss-test-web.build-test-results-tab&runId=38657398&paneView=debug

@lambdageek ?

ghost commented 3 years ago

Tagging subscribers to this area: See info in area-owners.md if you want to be subscribed.

Issue Details
net6.0-MacCatalyst-Release-x64-Mono_Release-OSX.1015.Amd64.Open System.Net.Requests.Tests work item ``` Crashed Thread: 21 .NET ThreadPool Worker Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Application Specific Information: abort() called ... Thread 21 Crashed:: .NET ThreadPool Worker 0 libsystem_kernel.dylib 0x00007fff6944e33a __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff6950ae60 pthread_kill + 430 2 libsystem_c.dylib 0x00007fff693d58a2 __abort + 139 3 libsystem_c.dylib 0x00007fff693d5817 abort + 135 4 net.dot.System.Net.Requests.Tests 0x000000010c2c942f ves_icall_System_Environment_FailFast + 127 5 net.dot.System.Net.Requests.Tests 0x000000010c2cc2e6 ves_icall_System_Environment_FailFast_raw + 54 6 net.dot.System.Net.Requests.Tests 0x000000010c501a80 do_icall + 304 7 net.dot.System.Net.Requests.Tests 0x000000010c5009fd do_icall_wrapper + 253 8 net.dot.System.Net.Requests.Tests 0x000000010c4f2b9a interp_exec_method + 2970 9 net.dot.System.Net.Requests.Tests 0x000000010c4f0eaf interp_runtime_invoke + 239 10 net.dot.System.Net.Requests.Tests 0x000000010c3e82eb mono_jit_runtime_invoke + 1211 11 net.dot.System.Net.Requests.Tests 0x000000010c30534f mono_runtime_try_invoke + 143 12 net.dot.System.Net.Requests.Tests 0x000000010c303ece mono_runtime_class_init_full + 878 13 net.dot.System.Net.Requests.Tests 0x000000010c5232b1 mono_interp_transform_method + 273 14 net.dot.System.Net.Requests.Tests 0x000000010c501904 do_transform_method + 100 15 net.dot.System.Net.Requests.Tests 0x000000010c4f3311 interp_exec_method + 4881 16 net.dot.System.Net.Requests.Tests 0x000000010c4f0eaf interp_runtime_invoke + 239 17 net.dot.System.Net.Requests.Tests 0x000000010c3e82eb mono_jit_runtime_invoke + 1211 18 net.dot.System.Net.Requests.Tests 0x000000010c303988 mono_runtime_invoke_checked + 136 19 net.dot.System.Net.Requests.Tests 0x000000010c31cc35 start_wrapper_internal + 613 20 net.dot.System.Net.Requests.Tests 0x000000010c31c9ae start_wrapper + 62 21 libsystem_pthread.dylib 0x00007fff6950b109 _pthread_start + 148 22 libsystem_pthread.dylib 0x00007fff69506b8b thread_start + 15 ``` Probably non actionable as is, but what does the managed code look like here? did someone queue FailFast onto the threadpool? More importantly, can Mono folks detect abort() and include the raw string message in the ".crash" file? that would help a lot. https://dev.azure.com/dnceng/public/_build/results?buildId=1313534&view=ms.vss-test-web.build-test-results-tab&runId=38657398&paneView=debug @lambdageek ?
Author: danmoseley
Assignees: -
Labels: `area-VM-meta-mono`, `os-maccatalyst`
Milestone: -
danmoseley commented 3 years ago

cc @dotnet/ncl but there's really no way to know if it's networking code.

I see Environment.FailFast("Exception thrown from SocketAsyncEngine event loop: " + e.ToString(), e); but that's on a thread, not threadpool.

danmoseley commented 3 years ago

also unclear, perhaps @mdh1418 can help, why the dump wasn't uploaded here. it seems to find dumps (a lot actually)

https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-57992-merge-e24460956efe4cbea0/System.Net.Requests.Tests/dd92353d-5132-40be-b686-34b7d07a7bf0.log?sv=2019-07-07&se=2021-09-13T04%3A56%3A32Z&sr=c&sp=rl&sig=H0fylWeOrT8%2ByS8MBxIWAZ02VDCUdIGN3biT9OXfuUg%3D

steveisok commented 3 years ago

@danmoseley I don't think this has anything to do w/ managed code. System.Net.Requests.Tests is part of a handful of suites that are crashing for us on CI only. We're still trying to figure out why.

danmoseley commented 3 years ago

@steveisok could you clarify what you mean - do you mean, there was indeed FailFast from managed, but a subsequent unrelated issue in the native side prevented the nice message from showing up?

Or that the FailFast shown in the managed frame is not real? I assume the former as there is a real FailFast in that method IIRC.

steveisok commented 3 years ago

Yeah, I suspect it's probably an unrelated issue on the native side.

lambdageek commented 3 years ago

Hey folks, regardless of where the issue comes from, I think the question is: When there is a FailFast, can we include the managed message in the output that is emitted along with the stack traces etc.

The complication is that the output is not, I think, controlled by the runtime - it's macOS that is printing out all this stuff:

Crashed Thread:        21  .NET ThreadPool Worker

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
...

I don't know of a way to modify the information that is printed out. Does CoreCLR do something? Maybe there is some syscall we need to use?

As far as the managed FailFast message we can certainly capture it - we used to do it for Mono MERP. I just don't know if we can display it inline with the OS crash output.

danmoseley commented 3 years ago

@janvorli maybe you know?

janvorli commented 3 years ago

CoreCLR doesn't do anything to get a message into the system crashlog. We just print it to stderr. I haven't tried to look at the possibility of adding own message into the crashlog, but it would be really nice to have that. I'll try to check if there is any way to do so.

danmoseley commented 2 years ago

@tommcdon this is the case I was mentioning, where Mono writes out a crash file with basic diagnostics that does not need any tooling to look at. I believe it's aggregating info from macOS in this case.

I wonder whether there is value in CoreCLR doing something like this. (I'll suggest it in your OneNote)

@janvorli did you get a chance to look into whether we can get the failfast string into it? But, as @lambdageek mentions, maybe that doesn't matter, if Mono puts it in the crash file by some other means. (I do'nt know where else macOS puts these -- in the system log?)

tommcdon commented 2 years ago

@tommcdon this is the case I was mentioning, where Mono writes out a crash file with basic diagnostics that does not need any tooling to look at. I believe it's aggregating info from macOS in this case.

@mikem8361 recently added support for JSON file output on Mac. I believe it is enabled with the COMPlus_EnableCrashReport env variable. It seems like we should enable this in CI if not already doing so. cc @hoyosjs

mikem8361 commented 2 years ago

If you have any questions about the crash report json support in createdump, let me know. Like Tom said it is enabled with the above env var and writes the json file with the coredump name + ".crashreport.json". It does contain the signal name for the faulting thread and the managed exception type name for any unhandled managed exception.

janvorli commented 1 year ago

@lambdageek, @danmoseley I have believed there is no way to include a custom message in the crash log, but today I looked again and found this on stackoverflow (https://stackoverflow.com/questions/36827239/xcode-exception-reason-of-crash-report-from-apples-crash-report-service)

On OSX developers can provide an annotation before an app (possibly) crashes using the global __crashreporter_info__ declared like this to be accessible.

const char *__crashreporter_info__ = NULL;
asm(".desc ___crashreporter_info__, 0x10");
janvorli commented 1 year ago

It should appear in the Application Specific Information: section of the log.

janvorli commented 1 year ago

This look actually quite ancient and it doesn't seem to work. There is a more modern version that can log stack trace separately: https://alastairs-place.net/blog/2013/01/10/interesting-os-x-crash-report-tidbits/ There is an example at the end that worked.

janvorli commented 1 year ago

The downside though is that this is an undocumented feature.

lambdageek commented 1 year ago

This is excellent detective work @janvorli, it's great to see that this works on ios, too.

@steveisok I think this will help with CI, but also possibly it will be useful for customers

ghost commented 1 year ago

Tagging subscribers to 'os-ios': @steveisok, @akoeplinger, @kotlarmilos See info in area-owners.md if you want to be subscribed.

Issue Details
net6.0-MacCatalyst-Release-x64-Mono_Release-OSX.1015.Amd64.Open System.Net.Requests.Tests work item ``` Crashed Thread: 21 .NET ThreadPool Worker Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Application Specific Information: abort() called ... Thread 21 Crashed:: .NET ThreadPool Worker 0 libsystem_kernel.dylib 0x00007fff6944e33a __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff6950ae60 pthread_kill + 430 2 libsystem_c.dylib 0x00007fff693d58a2 __abort + 139 3 libsystem_c.dylib 0x00007fff693d5817 abort + 135 4 net.dot.System.Net.Requests.Tests 0x000000010c2c942f ves_icall_System_Environment_FailFast + 127 5 net.dot.System.Net.Requests.Tests 0x000000010c2cc2e6 ves_icall_System_Environment_FailFast_raw + 54 6 net.dot.System.Net.Requests.Tests 0x000000010c501a80 do_icall + 304 7 net.dot.System.Net.Requests.Tests 0x000000010c5009fd do_icall_wrapper + 253 8 net.dot.System.Net.Requests.Tests 0x000000010c4f2b9a interp_exec_method + 2970 9 net.dot.System.Net.Requests.Tests 0x000000010c4f0eaf interp_runtime_invoke + 239 10 net.dot.System.Net.Requests.Tests 0x000000010c3e82eb mono_jit_runtime_invoke + 1211 11 net.dot.System.Net.Requests.Tests 0x000000010c30534f mono_runtime_try_invoke + 143 12 net.dot.System.Net.Requests.Tests 0x000000010c303ece mono_runtime_class_init_full + 878 13 net.dot.System.Net.Requests.Tests 0x000000010c5232b1 mono_interp_transform_method + 273 14 net.dot.System.Net.Requests.Tests 0x000000010c501904 do_transform_method + 100 15 net.dot.System.Net.Requests.Tests 0x000000010c4f3311 interp_exec_method + 4881 16 net.dot.System.Net.Requests.Tests 0x000000010c4f0eaf interp_runtime_invoke + 239 17 net.dot.System.Net.Requests.Tests 0x000000010c3e82eb mono_jit_runtime_invoke + 1211 18 net.dot.System.Net.Requests.Tests 0x000000010c303988 mono_runtime_invoke_checked + 136 19 net.dot.System.Net.Requests.Tests 0x000000010c31cc35 start_wrapper_internal + 613 20 net.dot.System.Net.Requests.Tests 0x000000010c31c9ae start_wrapper + 62 21 libsystem_pthread.dylib 0x00007fff6950b109 _pthread_start + 148 22 libsystem_pthread.dylib 0x00007fff69506b8b thread_start + 15 ... Thread 21 crashed with X86 Thread State (64-bit): rax: 0x0000000000000000 rbx: 0x000070000f718000 rcx: 0x000070000f715488 rdx: 0x0000000000000000 rdi: 0x0000000000011803 rsi: 0x0000000000000006 rbp: 0x000070000f7154b0 rsp: 0x000070000f715488 r8: 0x0000000000000000 r9: 0x0000000000989680 r10: 0x000070000f718000 r11: 0x0000000000000246 r12: 0x0000000000011803 r13: 0x00007fde21857d90 r14: 0x0000000000000006 r15: 0x0000000000000016 rip: 0x00007fff6944e33a rfl: 0x0000000000000246 cr2: 0x000000010d067000 ``` Probably non actionable as is, but what does the managed code look like here? did someone queue FailFast onto the threadpool? More importantly, can Mono folks detect abort() and include the raw string message in the ".crash" file? that would help a lot. https://dev.azure.com/dnceng/public/_build/results?buildId=1313534&view=ms.vss-test-web.build-test-results-tab&runId=38657398&paneView=debug @lambdageek ?
Author: danmoseley
Assignees: janvorli
Labels: `area-VM-meta-mono`, `os-ios`, `os-maccatalyst`
Milestone: 9.0.0
akoeplinger commented 10 months ago

The downside though is that this is an undocumented feature.

Swift also defines this struct so it should be quite safe to use:

https://github.com/apple/swift/blob/98e65d015979c7b5a58a6ecf2d8598a6f7c85794/include/swift/Runtime/Debug.h#L27C19-L56