dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[ARM7/Linux] nullref leads to ABRT / SEGV exit code #12871

Closed gpomykala closed 4 years ago

gpomykala commented 5 years ago

Environment:

Host (useful for support):
  Version: 2.2.1
  Commit:  878dd11e62

.NET Core SDKs installed:
  No SDKs were found.

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.1 [/opt/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.1 [/opt/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.1 [/opt/dotnet/shared/Microsoft.NETCore.App]

root@gpomykala:~ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 3.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10

processor : 1
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 3.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10

Hardware : Freescale i.MX6 Quad/DualLite (Device Tree)
Revision : 0000
Serial : 0000000000000000

We have an ASP.NET Core app which deserializes some JSON data using a custom JsonConverter. The converter may throw a NullReferenceException, which is handled in the app.

Unfortunately, on linux-arm we never reach the error handling code, since the process exits with either SIGABRT or SIGSEGV. I believe it is similar to #16462.

Logs:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at MyCompany.Persistency.Converters.MyConverter1.DoReadJson(JToken jsonObject, T resultValue, JsonSerializer serializer)
   at MyCompany.Persistency.Converters.BaseConverter1.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(JsonConverter converter, JsonReader reader, Type objectType, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
) = ?
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

I am trying to reproduce this with a sample app. Please let me know if I can provide more details.

janvorli commented 5 years ago

@gpomykala is it possible that there are native code frames on the stack between the frame where the null reference exception occurs and the frame where the handler is located? That would happen if managed code called native code, which in turn called a managed callback.

gpomykala commented 5 years ago

That may be possible, since there is ASP.NET Core middleware and Json.NET library code between the exception source and the error handling block. I need to enable core dump storage on that host. I'll be back ...

gpomykala commented 5 years ago

I have the core dump file. Can I submit it to the dev team in some way other than sharing the link in public?

gpomykala commented 5 years ago

I followed the debugging instructions at https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md but lldb itself crashes with a segmentation fault:

osboxes@osboxes:~$ lldb-3.9 -O "settings set target.exec-search-paths ~/dotnet/" -o "plugin load /home/osboxes/dotnet/shared/Microsoft.NETCore.App/2.2.1/libsosplugin.so" --core ~/dotnet/dump ~/dotnet/dotnet
(lldb) settings set target.exec-search-paths ~/dotnet/
(lldb) target create "/home/osboxes/dotnet/dotnet" --core "/home/osboxes/dotnet/dump"
Segmentation fault (core dumped)

gpomykala commented 5 years ago

https://github.com/dotnet/diagnostics/issues/58 suggests that lldb does not work with ARM coredumps, is that still the case?

janvorli commented 5 years ago

Yes, it is. Newer LLDB versions don't crash, but every thread shows the wrong processor register state, so LLDB cannot show the stack or anything else useful. The bug causing that was fixed in the lldb master branch (so even version 8 still has it). I built lldb 8 from source myself (cross-building on an x64 Linux box, since building on an arm32 device takes ages), applied the fix from master, and was then able to open dumps. I can share that build with you if you want, but it will work only if your glibc is the same version as or newer than the one in Ubuntu 16.04, which was the target OS of my cross-build. I think that glibc version was 2.23.
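The glibc constraint can be checked on the host before unpacking a prebuilt binary. This is a minimal sketch, assuming GNU coreutils (`sort -V`) and a glibc-based system; the helper name `version_ge` is made up for illustration, and the 2.23 threshold is the value quoted from memory in the comment above, not something read from the build.

```shell
# Return success when version $1 is the same as or newer than version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# getconf prints e.g. "glibc 2.27" on glibc-based systems; keep the number only.
host_glibc=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}' || true)

if [ -n "$host_glibc" ] && version_ge "$host_glibc" 2.23; then
    echo "glibc $host_glibc: new enough for a binary built against 2.23"
else
    echo "glibc ${host_glibc:-unknown}: the prebuilt binary may not load"
fi
```

The check only covers glibc; other shared-library dependencies of the binary still have to be present on the machine that runs it.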

gpomykala commented 5 years ago

Hi, thanks for your reply. That lldb build may work since I use Ubuntu 18.04. Thanks in advance.

janvorli commented 5 years ago

@gpomykala here is a link. It will be valid till tomorrow: https://1drv.ms/u/s!AkLV4wRkyHYhxWfasyamnzHlW5kI?e=EQ9IW4

This tarball contains the whole LLVM (lldb, clang, etc.). The lldb binary is at LLVM-8.0.0-Linux/bin/lldb. Just run this lldb using its full path; no PATH variable modification is needed.

gpomykala commented 5 years ago

Thanks for the link, but I have no luck with the build:

./lldb: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory

Since it is an ARM32 executable with dynamically linked dependencies, it would need the toolchain we use. I may try to build it on my own from sources. Would you share the ID of the fix applied from master?

janvorli commented 5 years ago

I think you just need to install that library. The package name is libtinfo5.

Here is the LLDB commit that fixed the issue if you still want to build it yourself: https://github.com/llvm-mirror/lldb/commit/2c4669267bfaf9277f9df44291cac090ae23f840

gpomykala commented 5 years ago

I managed to view the native stack using ptxdist + gdb:

(gdb) bt
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1  0x76c74a70 in __libc_signal_restore_set (set=0x63f1be44) at ../sysdeps/unix/sysv/linux/nptl-signals.h:80
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0x76c7577e in __GI_abort () at abort.c:79
#4  0x76909f2a in PROCAbort () from /home/osboxes/dotnet/libcoreclr.so
#5  0x76909148 in PROCEndProcess(void*, unsigned int, int) () from /home/osboxes/dotnet/libcoreclr.so
#6  0x767190bc in UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) () from /home/osboxes/dotnet/libcoreclr.so
#7  0x767191ac in DispatchManagedException(PAL_SEHException&, bool) () from /home/osboxes/dotnet/libcoreclr.so
#8  0x766c6522 in IL_Rethrow() () from /home/osboxes/dotnet/libcoreclr.so
#9  0x5e3c4712 in ?? ()

is that sufficient to investigate the root cause in coreclr?

janvorli commented 5 years ago

No, it is not. GDB stops unwinding the stack at the first managed frame, so we don't know whether that managed code was called from native code. I still guess that is the case, since the abort happened in the first pass of EH, where we look for the exception handler, but LLDB would let us see all the frames on the stack. Have you tried installing the libtinfo5 package to see whether you can run LLDB after that?

gpomykala commented 5 years ago

Not yet, there are more missing dependencies which are not available on that system:

libncurses.so.5 => not found
libtinfo.so.5 => not found
libform.so.5 => not found
libpanel.so.5 => not found
libxml2.so.2 => not found
libedit.so.2 => not found
libtinfo.so.5 => not found

janvorli commented 5 years ago

You need to install all of the dependencies (you would need to install them even if you built LLDB yourself). That means: libncurses5 libtinfo5 libxml2 libedit2

So you'd run: sudo apt-get install libncurses5 libtinfo5 libxml2 libedit2
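To see exactly which shared libraries are still unresolved before and after installing packages, the `ldd` output can be filtered. This is a small sketch; the helper name `missing_libs` is hypothetical, and the sample input is the list reported earlier in this thread rather than a fresh run.

```shell
# Read `ldd` output on stdin and print each unresolved library once.
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Sample input copied from the report above; on a real system you would run
#   ldd ./lldb | missing_libs
sample='libncurses.so.5 => not found
libtinfo.so.5 => not found
libform.so.5 => not found
libpanel.so.5 => not found
libxml2.so.2 => not found
libedit.so.2 => not found
libtinfo.so.5 => not found'

printf '%s\n' "$sample" | missing_libs
```

An empty result means the dynamic loader can resolve every dependency in the environment where `ldd` was run.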

gpomykala commented 5 years ago

Unfortunately, we do not have the luxury of a package manager on that distro (it's an embedded system), but I'll see how we can bring those libs over.

janvorli commented 5 years ago

I thought you said you were trying to run lldb on Ubuntu 18.04. The libraries need to be located where lldb runs, since they are dependencies of lldb.

gpomykala commented 5 years ago

Well, the lldb you shared on OneDrive is built for ARM; it won't run on Ubuntu.

janvorli commented 5 years ago

Oh, I thought you were trying to run it on Ubuntu 18.04 on an ARM device.

gpomykala commented 5 years ago

Well, my apologies then. The target is a custom distro built with ptxdist, and Ubuntu 18.04 is our dev environment for that platform.

janvorli commented 5 years ago

OK, thanks for the explanation. In that case, the easiest way seems to be our dotnet-dump tool from the dotnet/diagnostics repo. That tool lets you load a dump and then run SOS commands to investigate the managed code stack, objects, etc. You can install dotnet-dump as described in https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md. You'd then use the dotnet-dump analyze subcommand to open the core dump you have. By the way, you'll need to run echo 0x3f > /proc/self/coredump_filter in the same shell where you run your application so that the dump contains all the details dotnet-dump analyze needs. If dotnet-dump analyze loads the dump successfully, you can start running SOS commands. clrstack will dump all the managed frames of a thread.
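For readers wondering what the 0x3f mask selects, the sketch below decodes a coredump_filter value bit by bit. The bit meanings are taken from the Linux core(5) manual page as I understand it; the function name `describe_coredump_filter` is made up for illustration and is not part of any .NET tooling.

```shell
# Print which memory-mapping classes a coredump_filter mask selects.
# Bit meanings follow the Linux core(5) manual page.
describe_coredump_filter() {
    mask=$(( $1 ))
    bit=0
    while [ "$bit" -le 8 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            case "$bit" in
                0) echo "bit 0: anonymous private mappings" ;;
                1) echo "bit 1: anonymous shared mappings" ;;
                2) echo "bit 2: file-backed private mappings" ;;
                3) echo "bit 3: file-backed shared mappings" ;;
                4) echo "bit 4: ELF headers" ;;
                5) echo "bit 5: private huge pages" ;;
                6) echo "bit 6: shared huge pages" ;;
                7) echo "bit 7: private DAX pages" ;;
                8) echo "bit 8: shared DAX pages" ;;
            esac
        fi
        bit=$(( bit + 1 ))
    done
}

describe_coredump_filter 0x3f
```

Running it with 0x3f lists bits 0 through 5, so compared with a mask that leaves bits 2 and 3 clear, the file-backed mappings are included in the dump as well. The filter is inherited by child processes, which is why it has to be set in the shell that launches the application.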

gpomykala commented 5 years ago

Hi, I reproduced the problem on raspbian - it was much easier to install all required lldb dependencies.

I still haven't had much luck with lldb. None of the SOS commands work with the libsosplugin.so delivered with the .NET Core 2.2 SDK:

pi@raspberrypi:~/LLVM-8.0.0-Linux/bin $ ./lldb -O "settings set target.exec-search-paths /home/pi/dmp" -o "plugin load libsosplugin.so" --core /home/pi/dmp/core-dotnet-sig6-user0-group0-pid531-time1560864002 /usr/share/dotnet/dotnet
(lldb) settings set target.exec-search-paths /home/pi/dmp
(lldb) target create "/usr/share/dotnet/dotnet" --core "/home/pi/dmp/core-dotnet-sig6-user0-group0-pid531-time1560864002"
Core file '/home/pi/dmp/core-dotnet-sig6-user0-group0-pid531-time1560864002' (arm) was loaded.
(lldb) plugin load libsosplugin.so
(lldb) sos ClrStack
(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
  * frame #0: 0x76b8f45c libc.so.6`__GI_raise(sig=0) at raise.c:51
    frame #1: 0x76b90824 libc.so.6`__GI_abort at abort.c:89
    frame #2: 0x7681df2a libcoreclr.so`PROCAbort + 46
    frame #3: 0x7681d148 libcoreclr.so`PROCEndProcess(void*, unsigned int, int) + 248
    frame #4: 0x7662d0bc libcoreclr.so`UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 472
    frame #5: 0x7662d1ac libcoreclr.so`DispatchManagedException(PAL_SEHException&, bool) + 156
    frame #6: 0x765da372 libcoreclr.so`IL_Throw(Object*) + 506
    frame #7: 0x5aeaf38c
    frame #8: 0x5ad9784c
(lldb) clrstack
(lldb) dumpstack
(lldb) eestack
(lldb) soshelp
(lldb) quit
./lldb[0x3a818]
./lldb[0x38070]
./lldb[0x3af40]
/lib/arm-linux-gnueabihf/libc.so.6(__default_sa_restorer+0x0)[0x71d6a6b0]
Stack dump:
0.      Program arguments: ./lldb -O settings set target.exec-search-paths /home/pi/dmp -o plugin load libsosplugin.so --core /home/pi/dmp/core-dotnet-sig6-user0-group0-pid531-time1560864002 /usr/share/dotnet/dotnet
^C^Z
^C^Z^Z^Z
^C^Z^Z^Z
^Z^Z^ZSegmentation fault (core dumped)

dotnet-dump does not even get that far. It claims that the dotnet process is not running a compatible .NET Core runtime.

root@raspberrypi:~# dotnet-dump collect --process-id 1035
Writing minidump with heap to /root/core_20190618_154449
Process 1035 not running compatible .NET Core runtime

.NET Core details:

root@raspberrypi:~# /usr/share/dotnet/dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.2.300
 Commit:    73efd5bd87

Runtime Environment:
 OS Name:     raspbian
 OS Version:  9
 OS Platform: Linux
 RID:         linux-arm
 Base Path:   /usr/share/dotnet/sdk/2.2.300/

Host (useful for support):
  Version: 2.2.5
  Commit:  0a3c9209c0

.NET Core SDKs installed:
  2.2.300 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.5 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.5 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.5 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

BTW the managed stack is dumped to the console if that is what you are looking for. And the issue manifests for other exceptions as well, not only for NullReferenceException - last time it crashed at SocketException:

Unhandled Exception: Vibrant.InfluxDB.Client.InfluxException: An unknown error occurred. Please inspect the inner exception. ---> System.Net.Http.HttpRequestException: Connection refused ---> System.Net.Sockets.SocketException: Connection refused
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Vibrant.InfluxDB.Client.InfluxClient.ExecuteHttpAsync(HttpMethod method, String url, HttpContent content)
   --- End of inner exception stack trace ---
   at Vibrant.InfluxDB.Client.InfluxClient.ExecuteHttpAsync(HttpMethod method, String url, HttpContent content)
   at Vibrant.InfluxDB.Client.InfluxClient.PerformQueryInternal(String query, String db, Boolean forcePost, Boolean isTimeSeriesQuery, Boolean requireChunking, Object parameters, InfluxQueryOptions options)
   at Vibrant.InfluxDB.Client.InfluxClient.ExecuteQueryInternalAsync[TInfluxRow](String query, String db, Boolean isTimeSeriesQuery, Boolean forcePost, Object parameters, InfluxQueryOptions options)
   at Vibrant.InfluxDB.Client.InfluxClientExtensions.ShowDatabasesAsync(IInfluxClient client)
<appcode>
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<>c.<ThrowAsync>b__7_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.<>c.<.cctor>b__5_0(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Aborted (core dumped)

janvorli commented 5 years ago

The libsosplugin from the SDK doesn't work with the lldb I've shared with you. You need a libsosplugin.so built from the dotnet/diagnostics repo. I can share that one with you. As for the dotnet-dump, it seems I was not clear on this. I was referring to using dotnet-dump analyze command on core dump that you get from the OS.

As for the stack trace you get in the console at the crash time, we could actually try to look at the source code based on that and see where it crosses a managed to native boundary and back. The clrstack command would show special frames that are at the edge between unmanaged and managed code, so it would be easier to spot that.
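Once clrstack output is available, the bracketed special frames can be pulled out with a simple filter to make those boundaries easier to spot. This is only a sketch: the helper name `special_frames` is hypothetical, and the sample lines are abridged from a `clrstack -f` run posted later in this thread.

```shell
# Print the bracketed special frames from `clrstack -f` output read on stdin.
special_frames() {
    grep -o '\[[A-Za-z0-9_]*Frame: [0-9a-fA-F]*]'
}

# Abridged sample; on a real session you would save the lldb output to a
# file and run: special_frames < clrstack.txt
sample='55097938 766B5372 libcoreclr.so!IL_Throw(Object*) + 506
5509796C          [HelperMethodFrame: 5509796c]
550979F0 5AFAF38C System.Private.CoreLib.dll!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 40
55097C68          [DebuggerU2MCatchHandlerFrame: 55097c68]'

printf '%s\n' "$sample" | special_frames
```

The pattern only matches bracketed names ending in "Frame" followed by a hex address, so source-location brackets such as `[/root/coreclr/... @ 131]` are ignored.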

janvorli commented 5 years ago

@gpomykala I hope you don't mind I've modified the markup in your previous comment so that the stack trace is more readable.

gpomykala commented 5 years ago

Hi, I don't mind you reformatting the stack trace, thanks.

So I tried the following:

root@raspberrypi:~# echo 0x3f > /proc/self/coredump_filter
root@raspberrypi:~# /usr/share/dotnet/dotnet /home/pi/publish/LocalRestApi.dll (core dumped)
root@raspberrypi:~# dotnet-dump analyze /var/lib/coredumps/core-dotnet-sig6-user0-group0-pid6158-time1560944964

No success still:

Unhandled exception: System.NotImplementedException: Support for 40 not yet implemented.
   at Microsoft.Diagnostics.Runtime.CoreDumpReader..ctor(String filename)
   at Microsoft.Diagnostic.Tools.Dump.Analyzer.Analyze(FileInfo dump_path, String[] command) in /_/src/Tools/dotnet-dump/Analyzer.cs:line 44
   at System.CommandLine.Invocation.CommandHandler.GetResultCodeAsync(Object value, InvocationContext context)
   at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass2_0.<<InvokeAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c.<<UseParseErrorReporting>b__16_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c__DisplayClass8_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c.<<UseSuggestDirective>b__7_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c.<<UseParseDirective>b__6_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c.<<UseHelp>b__14_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass3_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c.<<RegisterWithDotnetSuggest>b__17_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.CommandLine.Invocation.InvocationExtensions.<>c__DisplayClass5_0.<<UseExceptionHandler>b__0>d.MoveNext()

I am using the dotnet-dump version mentioned in the documentation: https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md

root@raspberrypi:~# dotnet-dump --version
1.0.4-preview6.19311.1+caee2e63c16b3ad467620d1d24578aa758e99473

janvorli commented 5 years ago

@hoyosjs any idea why the dotnet-dump analyze fails? Based on our offline chat, I was thinking it should work.

janvorli commented 5 years ago

@gpomykala here is a link to the libsosplugin and related files that should work with the lldb I've shared with you: https://1drv.ms/u/s!AkLV4wRkyHYhxWichoY_nVihLSvr?e=sYkxQw

hoyosjs commented 5 years ago

It looks like I was wrong. https://github.com/dotnet/diagnostics/issues/168#issuecomment-486875459 suggests that the tool is unable to understand non-x64 contexts, and that CLRMD also has some shortcomings there.

gpomykala commented 5 years ago

Hello,

I have something meaningful - finally:

(lldb) dumpstack
OS Thread Id: 0x4df (1)
TEB information is not available so a stack size of 0xFFFF is assumed
Current frame: libc.so.6!__GI_raise + 0 [/build/glibc-Ps4RQ4/glibc-2.24/sysdeps/unix/sysv/linux/raise.c:51]
ChildFP  RetAddr  Caller, Callee
55097320 76602da9 libcoreclr.so!DefaultCatchHandler(_EXCEPTION_POINTERS*, Object**, int, int, int, int, int) + 0, calling libcoreclr.so!__cxa_finalize
55097520 768f8f2b libcoreclr.so!PROCAbort + 0, calling libcoreclr.so!pipe2 + 0
55097528 768f8149 libcoreclr.so!PROCEndProcess(void*, unsigned int, int) + 0, calling libcoreclr.so!PROCAbort
55097540 767080bd libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 0, calling libcoreclr.so!TerminateProcess
55097568 5ae973e3 (MethodDesc 589a5db4 + 0 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback())
55097584 5ae973e3 (MethodDesc 589a5db4 + 0 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback())
550975F8 5afaf38d (MethodDesc 5775f330 + 0 System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()), calling 5aca434a
550975FC 5afaf38d (MethodDesc 5775f330 + 0 System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()), calling 5aca434a
5509775C 5afaf38d (MethodDesc 5775f330 + 0 System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()), calling 5aca434a
55097780 767081ad libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool) + 0, calling libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*)
55097930 766b5373 libcoreclr.so!IL_Throw(Object*) + 0, calling libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool)
550979C0 5afaf38d (MethodDesc 5775f330 + 0 System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()), calling 5aca434a
550979D0 766b51a7 libcoreclr.so!IL_Throw(Object*) + 0, calling libcoreclr.so!LazyMachStateCaptureState
550979E8 5afaf38d (MethodDesc 5775f330 + 0 System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()), calling 5aca434a
550979F8 5aea83a7 (MethodDesc 589a0eb8 + 0 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object))
55097A28 5ae9784d (MethodDesc 51e2cd30 + 0 System.Threading.QueueUserWorkItemCallbackDefaultContext.ExecuteWorkItem())
55097A38 5ae95d11 (MethodDesc 589a4a54 + 0 System.Threading.ThreadPoolWorkQueue.Dispatch())
55097A58 768e3287 libcoreclr.so!SetLastError + 0, calling libcoreclr.so!__cxa_throw + 0
55097A98 5ae973e3 (MethodDesc 589a5db4 + 0 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback())
55097AA0 76711ad7 libcoreclr.so!CallDescrWorkerInternal + 0
55097AB0 7667f59d libcoreclr.so!MethodDescCallSite::CallTargetWorker(unsigned long long const*, unsigned long long*, int) + 0, calling libcoreclr.so!CallDescrWorkerInternal
55097AD8 768f3787 libcoreclr.so!CorUnix::CPalSynchronizationManager::GetSynchControllersForObjects(CorUnix::CPalThread*, CorUnix::IPalObject**, unsigned int, void**, CorUnix::CSynchControllerBase::ControllerType) + 0, calling libcoreclr.so!expf + 0
55097B18 7664acc9 libcoreclr.so!MetaSig::GetReturnTypeNormalized(TypeHandle*) const + 0, calling libcoreclr.so!SigPointer::PeekElemTypeClosed(Module*, SigTypeContext const*) const
55097B68 765f8ee3 libcoreclr.so!ArgIteratorTemplate<ArgIteratorBase>::GetNextOffset() + 0, calling libcoreclr.so!ArgIteratorTemplate<ArgIteratorBase>::ComputeReturnFlags()
55097BA8 76785d6b libcoreclr.so!QueueUserWorkItemManagedCallback(void*) + 0, calling libcoreclr.so!MethodDescCallSite::CallTargetWorker(unsigned long long const*, unsigned long long*, int)
55097C38 7665c537 libcoreclr.so!ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 0
55097CD0 7665cae5 libcoreclr.so!ManagedThreadBase::ThreadPool(ADID, void (*)(void*), void*) + 0, calling libcoreclr.so!ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)
55097CF8 7677232b libcoreclr.so!ManagedPerAppDomainTPCount::DispatchWorkItem(bool*, bool*) + 0, calling libcoreclr.so!ManagedThreadBase::ThreadPool(ADID, void (*)(void*), void*)
55097D08 768f5fb7 libcoreclr.so!PAL_WaitForSingleObjectPrioritized + 0, calling libcoreclr.so!CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)
55097D68 76673dcb libcoreclr.so!ThreadpoolMgr::WorkerThreadStart(void*) + 0
55097E18 000161e3 dotnet + 0xffffffff, calling dotnet!std::string::length() const + 0
55097E40 768fa5b7 libcoreclr.so!CorUnix::CPalThread::ThreadEntry(void*) + 0

I hope this is the stack you wanted to see

janvorli commented 5 years ago

@gpomykala can you please run clrstack -f instead?

gpomykala commented 5 years ago

(lldb) clrstack -f
OS Thread Id: 0x4df (1)
Child SP       IP Call Site
55097300 76C6A45C libc.so.6!__GI_raise + 160 at /build/glibc-Ps4RQ4/glibc-2.24/sysdeps/unix/sysv/linux/raise.c:51
55097410 76C6B824 libc.so.6!__GI_abort + 304 at /build/glibc-Ps4RQ4/glibc-2.24/stdlib/abort.c:91
55097528 768F8F2A libcoreclr.so!PROCAbort + 46
55097530 768F8148 libcoreclr.so!PROCEndProcess(void*, unsigned int, int) + 248
55097548 767080BC libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 472
55097788 767081AC libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool) + 156
55097938 766B5372 libcoreclr.so!IL_Throw(Object*) + 506
5509796C          [HelperMethodFrame: 5509796c]
550979F0 5AFAF38C System.Private.CoreLib.dll!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 40 [/root/coreclr/src/mscorlib/src/System/Runtime/ExceptionServices/ExceptionDispatchInfo.cs @ 131]
55097A00 5AEA83A6 System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) + 302 [/root/coreclr/src/mscorlib/shared/System/Threading/ExecutionContext.cs @ 202]
55097A30 5AE9784C System.Private.CoreLib.dll!System.Threading.QueueUserWorkItemCallbackDefaultContext.ExecuteWorkItem() + 60 [/root/coreclr/src/mscorlib/src/System/Threading/ThreadPool.cs @ 1037]
55097A40 5AE95D10 System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() + 564 [/root/coreclr/src/mscorlib/src/System/Threading/ThreadPool.cs @ 588]
55097AA0 5AE973E2 System.Private.CoreLib.dll!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() + 26 [/root/coreclr/src/mscorlib/src/System/Threading/ThreadPool.cs @ 879]
55097C68          [DebuggerU2MCatchHandlerFrame: 55097c68]

janvorli commented 5 years ago

Thank you for the stack trace. Combining it with the exception stack trace your application dumped to the console, which you shared in one of your previous comments, it seems that either the code path that awaits Vibrant.InfluxDB.Client.InfluxClientExtensions.ShowDatabasesAsync is missing a catch for the exception, or we have a bug.

Can you please try to run the SOS dumpasync command? Based on the output, I'll need you to run the command again with additional parameters, so if you are running it in a live debugging session, please keep lldb open. If you are debugging a dump, you can reopen it later just fine.

gpomykala commented 5 years ago

Thanks for your reply. We do have a catch clause for that exception and it works fine under Windows. On linux-arm, however, the process aborts before the catch clause is ever reached. I found 2 cases that exhibit such behaviour. It is quite nondeterministic: in one case (described in the 1st post) it was enough to alter the code to throw ArgumentNullException instead of NullReferenceException to "fix" it for linux-arm. But then I found another case where it crashes upon Vibrant.InfluxDB.Client.InfluxException, which means we cannot really rely on the exception handling logic on linux-arm anymore.

Dumpasync failed for that core dump file :/

(lldb) dumpasync
Could not request method table data for object 5C900000 (MethodTable: 0138EF78).
DumpAsync  failed

janvorli commented 5 years ago

@gpomykala thank you for the details, I didn't know the same code works fine on Windows for you. Then it is really a bug on our side. Would you be able to test it in Linux x64 (not on Alpine) too? It would be great to know if it is Linux specific or arm32 specific. Debugging on x64 Linux is much easier than on ARM32. I assume your app is not open source so I cannot debug the issue locally, right?

gpomykala commented 5 years ago

@janvorli it crashes in the same way on linux64 as well. Our app is not open source; however, I will try to prepare a sample app that can be used to reproduce the problem.

janvorli commented 5 years ago

@gpomykala thank you for verifying that it happens on Linux x64 too. As for a sample app to repro the issue, that would be great!

gpomykala commented 5 years ago

Hello,

The 2nd case I was chasing recently turned out to be a regression in our code (an unhandled exception in an "async void" event handler). It was entirely our fault, and it was reproducible on all platforms (win64, linux64, arm).
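For readers who hit the same "async void" pitfall, here is a minimal sketch of one way to keep such a failure contained. All names are hypothetical and not taken from the application discussed in this thread. The idea is that an exception escaping an `async void` method cannot be observed by the code that raised the event, so the handler only forwards to an awaitable wrapper that catches and records the failure.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical names throughout; none come from the application in this thread.
public static class AsyncVoidHandlerSketch
{
    // Last recorded failure message, or null if nothing failed.
    public static string LastError;

    // Work that fails asynchronously (simulated).
    private static async Task ProcessAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("simulated failure");
    }

    // Awaitable wrapper that contains the failure instead of letting it escape.
    public static async Task HandleSafelyAsync()
    {
        try
        {
            await ProcessAsync();
        }
        catch (Exception ex)
        {
            LastError = ex.Message;
        }
    }

    // The async void event handler only forwards to the wrapper,
    // so no exception can escape from it.
    public static async void OnSomethingHappened(object sender, EventArgs e)
    {
        await HandleSafelyAsync();
    }
}
```

Keeping the logic in a method that returns `Task` also makes it possible to await and test it directly, which an `async void` method does not allow.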

The original case I raised this issue for still persists, BUT:

I tried to reproduce it in a separate sample app with no luck so far.

Here is the stack:

(lldb) clrstack -f
OS Thread Id: 0x1a3c (1)
Child SP       IP Call Site
58895AD8 76B7B45C libc.so.6!__GI_raise + 160 at /build/glibc-Ps4RQ4/glibc-2.24/sysdeps/unix/sysv/linux/raise.c:51
58895BE8 76B7C824 libc.so.6!__GI_abort + 304 at /build/glibc-Ps4RQ4/glibc-2.24/stdlib/abort.c:91
58895D00 76809F2A libcoreclr.so!PROCAbort + 46
58895D08 76809148 libcoreclr.so!PROCEndProcess(void*, unsigned int, int) + 248
58895D20 766190BC libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 472
58895F60 766191AC libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool) + 156
58896110 765C6522 libcoreclr.so!IL_Rethrow() + 326
58896134          [HelperMethodFrame: 58896134]
588961B8 525B1930 Newtonsoft.Json.dll!Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(Newtonsoft.Json.JsonReader, System.Type, Boolean) + 1052 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Serialization\JsonSerializerInternalReader.cs @ 137]
58896EF0          [FaultingExceptionFrame: 58896ef0]
588974B8 50D3A23E Persistency.dll!Persistency.Converters.SerializationConverter`1[[System.__Canon, System.Private.CoreLib]].DoReadJson(Newtonsoft.Json.Linq.JToken, System.__Canon, Newtonsoft.Json.JsonSerializer) + 106
588974E8 50D381E4 Persistency.dll!Persistency.Converters.BaseConverter`1[[System.__Canon, System.Private.CoreLib]].ReadJson(Newtonsoft.Json.JsonReader, System.Type, System.Object, Newtonsoft.Json.JsonSerializer) + 924
58897628 50D37C04 Newtonsoft.Json.dll!Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(Newtonsoft.Json.JsonConverter, Newtonsoft.Json.JsonReader, System.Type, System.Object) + 388 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Serialization\JsonSerializerInternalReader.cs @ 2159]
588976F0 525B173E Newtonsoft.Json.dll!Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(Newtonsoft.Json.JsonReader, System.Type, Boolean) + 554 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Serialization\JsonSerializerInternalReader.cs @ 163]
588977F0 50D39EDC Newtonsoft.Json.dll!Newtonsoft.Json.Serialization.JsonSerializerProxy.DeserializeInternal(Newtonsoft.Json.JsonReader, System.Type) + 104 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Serialization\JsonSerializerProxy.cs @ 257]
58897820 525B023C Newtonsoft.Json.dll!Newtonsoft.Json.JsonSerializer.Deserialize(Newtonsoft.Json.JsonReader, System.Type) + 80 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\JsonSerializer.cs @ 886]
58897848 50D39E08 Newtonsoft.Json.dll!Newtonsoft.Json.Linq.JToken.ToObject(System.Type, Newtonsoft.Json.JsonSerializer) + 140 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Linq\JToken.cs @ 2072]
58897880 50D3AFFE Newtonsoft.Json.dll!Newtonsoft.Json.Linq.JToken.ToObject[[System.__Canon, System.Private.CoreLib]](Newtonsoft.Json.JsonSerializer) + 142 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Linq\JToken.cs @ 2057]
588978C8 50CEA1F2 DataModel.dll!TemplatesAndArchitectureItems.ArchitectureItemTemplate.Deserialize(Newtonsoft.Json.Linq.JToken, Persistency.PersistencyContext, Newtonsoft.Json.JsonSerializer) + 498 [D:\src\\source\core\DataModel\TemplatesAndArchitectureItems\ArchitectureItemTemplate.cs @ 215]
58897978 50D3A240 Persistency.dll!Persistency.Converters.SerializationConverter`1[[System.__Canon, System.Private.CoreLib]].DoReadJson(Newtonsoft.Json.Linq.JToken, System.__Canon, Newtonsoft.Json.JsonSerializer) + 108
588979A8 50D381E4 Persistency.dll!Persistency.Converters.BaseConverter`1[[System.__Canon, System.Private.CoreLib]].ReadJson(Newtonsoft.Json.JsonReader, System.Type, System.Object, Newtonsoft.Json.JsonSerializer) + 924
58897AE8 50D37C04 Newtonsoft.Json.dll!Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(Newtonsoft.Json.JsonConverter, Newtonsoft.Json.JsonReader, System.Type, System.Object) + 388 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Serialization\JsonSerializerInternalReader.cs @ 2159]
58897BB0 525B173E Newtonsoft.Json.dll!Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(Newtonsoft.Json.JsonReader, System.Type, Boolean) + 554 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Serialization\JsonSerializerInternalReader.cs @ 163]
58897CB0 525B03BE Newtonsoft.Json.dll!Newtonsoft.Json.JsonSerializer.DeserializeInternal(Newtonsoft.Json.JsonReader, System.Type) + 350 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\JsonSerializer.cs @ 907]
58897D80 525B023C Newtonsoft.Json.dll!Newtonsoft.Json.JsonSerializer.Deserialize(Newtonsoft.Json.JsonReader, System.Type) + 80 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\JsonSerializer.cs @ 886]
58897DA8 50D39E08 Newtonsoft.Json.dll!Newtonsoft.Json.Linq.JToken.ToObject(System.Type, Newtonsoft.Json.JsonSerializer) + 140 [D:\src\Newtonsoft.Json\Src\Newtonsoft.Json\Linq\JToken.cs @ 2072]
58897DE0 50D45E4E Persistency.dll!Persistency.FileTools.Read(System.String, Newtonsoft.Json.JsonSerializer, System.Type, Persistency.PersistencyContext, Boolean) + 90
58897E10 50CE50A0 Persistency.dll!Persistency.FileTools.Read[[System.__Canon, System.Private.CoreLib]](System.String, Newtonsoft.Json.JsonSerializer, Persistency.PersistencyContext, Boolean) + 160
58897E68 50CE4E5A PersistencyService.dll!Persistency.Service.PersistencyService.LoadArchitectureItemTemplate(System.String, System.Guid, Persistency.PersistencyContext, Newtonsoft.Json.JsonSerializer) + 254 [D:\src\\source\core\Persistency\PersistencyEngine\PersistencyService.cs @ 1068]
58897EB8 50D175A2
58897EC0 50D175A2 PersistencyService.dll!Persistency.Service.PersistencyService.LoadProject(Project.ProjectEntity, System.String, Boolean) + 1286 [D:\src\\source\core\Persistency\PersistencyEngine\PersistencyService.cs @ 912]
58898108 50DACD88 LocalRestApi.dll!LocalRestApi.StartupWithPlugins.LoadProject(Microsoft.Extensions.Logging.ILogger, Gateway.BaseServices.IRuntimeEnvironment, Gateway.BaseServices.IProjectProvider, Persistency.Service.PersistencyService, Box.RuntimeFramework.BaseServices.IUploadService, Microsoft.AspNetCore.Hosting.IApplicationLifetime, Services.IErrorPageService) + 1028 [D:\src\\source\Box\LocalRestApi\Startup.cs @ 278]
588982B0 5491F5FA LocalRestApi.dll!LocalRestApi.StartupWithPlugins.Configure(Microsoft.AspNetCore.Builder.IApplicationBuilder, Microsoft.AspNetCore.Hosting.IHostingEnvironment) + 614 [D:\src\\source\Box\LocalRestApi\Startup.cs @ 178]
58898360 76622AD6 libcoreclr.so!CallDescrWorkerInternal + 54
58898530          [DebuggerU2MCatchHandlerFrame: 58898530]
588985C8          [HelperMethodFrame_PROTECTOBJ: 588985c8] System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean, Boolean)

Unfortunately, "dumpasync" still does not work, but perhaps I could print some threading info to the console (the WebHost is being started in a worker thread, if that is relevant).

There is one interesting thing about it: we found that if we proactively throw ArgumentNullException in the offending code, the application does not crash anymore. We have no clue why it crashes if we don't.

        protected override void DoReadJson(JToken jsonObject, T resultValue, JsonSerializer serializer)
        {
            //if (resultValue == null) throw new ArgumentNullException(nameof(resultValue));
            resultValue.Deserialize(jsonObject, Context, serializer);
        }
janvorli commented 5 years ago

@gpomykala thank you for more details. In which of the frames shown in the stack trace above should the exception get caught?

gpomykala commented 5 years ago

@janvorli thanks for your reply, there is a try...catch clause in:

58898108 50DACD88 LocalRestApi.dll!LocalRestApi.StartupWithPlugins.LoadProject(Microsoft.Extensions.Logging.ILogger, Gateway.BaseServices.IRuntimeEnvironment, Gateway.BaseServices.IProjectProvider, Persistency.Service.PersistencyService, Box.RuntimeFramework.BaseServices.IUploadService, Microsoft.AspNetCore.Hosting.IApplicationLifetime, Services.IErrorPageService) + 1028 [D:\src\\source\Box\LocalRestApi\Startup.cs @ 278]

The catch section is never reached on ARM

gpomykala commented 5 years ago

@janvorli we could provide our app binaries via a non-public channel, i.e. send you a OneDrive link over email. Then you could debug the CLR while running our app. Would that be okay?

janvorli commented 5 years ago

@gpomykala that would work. My email alias is the same as my alias on github (just add @microsoft.com)

gpomykala commented 5 years ago

Great, I sent you an email.

janvorli commented 5 years ago

I have found the culprit. The null reference exception happens in a virtual dispatch stub, which is assembly code generated at runtime. Catching such an exception works fine, since we have special handling in HandleHardwareException for virtual dispatch stubs. However, when such an exception is rethrown, or another exception is thrown from the catch handler, we need to unwind through the virtual dispatch stub frame, and we fail because the generated code has no unwind info. We need to add special handling for virtual dispatch stubs to UnwindManagedExceptionPass1.
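The pattern described above can be sketched roughly as follows. This is an illustration only, with hypothetical type and method names. It assumes that the interface call on a null reference is dispatched through a virtual dispatch stub, which depends on the runtime and on how the call site is compiled. Since the reporter could not reproduce the crash in a separate sample app, this code may well not trigger the abort. It only shows the shape: a null reference raised during an interface call, an intermediate catch that rethrows, and an outer handler that is expected to run.

```csharp
using System;

// Hypothetical interface standing in for the type deserialized by the converter.
public interface IDeserializable
{
    void Deserialize(string json);
}

public static class VirtualStubRethrowSketch
{
    // Same shape as DoReadJson: an interface call on a value that may be null.
    private static void DoReadJson<T>(T resultValue, string json)
        where T : class, IDeserializable
    {
        // Throws NullReferenceException when resultValue is null.
        resultValue.Deserialize(json);
    }

    // Intermediate frame that catches and rethrows, as the serializer does
    // in the stack trace (IL_Rethrow).
    private static void DeserializeWithRethrow()
    {
        try
        {
            DoReadJson<IDeserializable>(null, "{}");
        }
        catch (Exception)
        {
            throw; // the rethrow forces a second unwind through the faulting frame
        }
    }

    // Outer frame whose catch clause is expected to handle the exception.
    public static string Run()
    {
        try
        {
            DeserializeWithRethrow();
            return "no exception";
        }
        catch (NullReferenceException)
        {
            return "caught in outer handler";
        }
    }
}
```

On a runtime that unwinds correctly, `VirtualStubRethrowSketch.Run()` returns after the outer handler has run. The behaviour reported in this issue corresponds to the process aborting before that handler is reached.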

gpomykala commented 5 years ago

Great to hear you found the cause. I'm happy I could help. I will monitor this thread until the fix is available. Thanks!
