dotnet / android

.NET for Android provides open-source bindings of the Android SDK for use with .NET managed languages such as C#
MIT License

[.NET 8] Android crash at `xamarin::android::Helpers::abort_application` #8673

Open akravch opened 4 months ago

akravch commented 4 months ago

Android application type

.NET Android (net7.0-android, net8.0-android, etc.)

Affected platform version

.NET 8.0.101, Android workload 34.0.52, Android NDK 26.1.10909125

Description

We are getting a lot of native crashes in production that we don't have on Xamarin. Here is the top 1 crash stack trace:

libc.so                    0x8ced76c8 + 337608
libc.so                    0x8ced7698 + 337560
libmono-android.release.so xamarin::android::Helpers::abort_application()
libmonosgen-2.0.so         mono_debugger_agent_unhandled_exception
libmonosgen-2.0.so         mono_debugger_agent_unhandled_exception
libmonosgen-2.0.so         mono_debugger_agent_unhandled_exception
libmonosgen-2.0.so         mono_runtime_class_init_full
libmonosgen-2.0.so         mono_jit_set_domain
libmonosgen-2.0.so         mono_jit_set_domain
libmonosgen-2.0.so         mono_install_ftnptr_eh_callback
libmonosgen-2.0.so         mono_install_ftnptr_eh_callback

Unfortunately, we don't have any ADB logs or a repro and we only see it in production.

We publish our application with AOT and, obviously, in Release configuration, so I'm also a little concerned about the JIT and debugger entries in the stack trace. I'm not sure whether that's expected or whether something went wrong with our build process as well.

Can anyone share their thoughts on this issue, possible workarounds, etc.?

Steps to Reproduce

No repro, only getting crash reports from production.

Did you find any workaround?

No response

Relevant log output

No response

grendello commented 4 months ago

@akravch it's not a crash, but a deliberate termination of the application in reaction to some unrecoverable error. The debugger should definitely not be part of the trace in a release app. JIT in the stack trace is not unusual, in and of itself, but this trace appears to point to a problem with initialization of some managed class. The unhandled exception may be thrown from a static constructor (either user provided or generated by the compiler). Alas, without at least the name of the class that fails to initialize and the exception that's actually thrown, we can't do much about it. Can you tell us on which devices and Android versions the crashes occur? Is there a pattern to them?

I don't see any direct calls to mono_debugger_agent_unhandled_exception from mono_runtime_class_init_full

@lambdageek any idea how mono_runtime_class_init_full can invoke debugger exception handler in Release builds?

akravch commented 4 months ago

Thanks for the quick reply.

Sure, here is a random set of the devices/OSs from this crash group:

Name                 OS
NAM-LX9              Android 12
Reno2 Z              Android 11
T1                   Android 13
CPH1909              Android 8.1.0
Redmi Note 11 SE     Android 13
OnePlus 10 Pro 5G    Android 14
HUAWEI nova 2 lite   Android 8.0.0
F19 Pro              Android 13
Galaxy S23           Android 14
Redmi Note 6 Pro     Android 9
realme narzo 30 5G   Android 13
Redmi Note 9T        Android 12
HUAWEI nova 2 lite   Android 8.0.0
AIR 55               Android 8.1.0

grendello commented 4 months ago

@akravch do you have the AndroidEnableMarshalMethods MSBuild property set to true?

akravch commented 4 months ago

> @akravch do you have the AndroidEnableMarshalMethods MSBuild property set to true?

@grendello no, it's not set explicitly to anything. I have just checked the detailed build logs and I see that it is not getting set to true implicitly either; it remains false.

grendello commented 4 months ago

@akravch thanks! Are you able to get the application crash tombstones from Google? If you can, they may contain fragments of logcat output, which is what we need to even start looking for the cause :( If the tombstones aren't available, can you see if any of the crashes contain the actual abort signal message (look for SIGABRT in whatever output you have)? If we're lucky, it will contain the actual abort message we log.

The traces you get should have addresses in them. Would you be able to pick the one with the longest trace and paste it here? To accompany it, I'd need you to extract the libmonosgen-2.0.so file, zip it up and attach it here. If we have addresses, we might be able to at least figure out the code where the exception is thrown and handled.

Additionally, there's something weird in your trace, namely the libmono-android.release.so entry. This is our native runtime, but we do not package it under that name; we put it in the apk/aab archive as libmonodroid.so. Would you be able to attach a listing of your apk/aab lib/{ARCHITECTURE}/ entries?

akravch commented 4 months ago

@grendello we were able to get a very limited logcat, and from the tombstone I see that the signal is signal 6 (SIGABRT), code -6 (SI_TKILL). Here is the tombstone trace:

Fatal signal 6 (SIGABRT), code -6 (SI_TKILL) in tid 17664 (.NET TP Worker), pid 28165 (<app-process-name>o)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Redmi/curtana_global/curtana:12/SKQ1.211019.001/V14.0.3.0.SJWMIXM:user/release-keys'
Revision: '0'
ABI: 'arm64'
Timestamp: 2024-01-26 15:46:12.716852632+0500
Process uptime: 0s
Cmdline: <package-name>
pid: 28165, tid: 17664, name: .NET TP Worker  >>> <package-name> <<<
uid: 10425
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
    x0  0000000000000000  x1  0000000000004500  x2  0000000000000006  x3  000000774c21e430
    x4  0000000000000000  x5  0000000000000000  x6  0000000000000000  x7  0000000013bcfa72
    x8  00000000000000f0  x9  fe8da60299a4bcee  x10 0000000000000000  x11 ffffff80fffffbdf
    x12 0000000000000001  x13 000000000000005d  x14 000000774c21d2e0  x15 0000000000000000
    x16 000000785bb8ad30  x17 000000785bb64930  x18 00000076f79f2000  x19 0000000000006e05
    x20 0000000000004500  x21 00000000ffffffff  x22 00000077447bf000  x23 b4000076b656f650
    x24 00000077447bf980  x25 000000774c220cb0  x26 0000000000000000  x27 0000000000000000
    x28 b4000076b656f640  x29 000000774c21e4b0
    lr  000000785bb14cdc  sp  000000774c21e410  pc  000000785bb14d08  pst 0000000000001000
backtrace:
      #00 pc 000000000008bd08  /apex/com.android.runtime/lib64/bionic/libc.so (abort+168) (BuildId: 2f84be24a19d511a358a9050af5b3974)
      #01 pc 000000000001f360  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmono-android.release.so (xamarin::android::Helpers::abort_application()+8) (BuildId: 374dc2e9ba70e7c5c6827626c1ea1ad3c2a64124)
      #02 pc 0000000000035660  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmono-android.release.so (xamarin::android::internal::MonodroidRuntime::mono_log_handler(char const*, char const*, char const*, int, void*)+144) (BuildId: 374dc2e9ba70e7c5c6827626c1ea1ad3c2a64124)
      #03 pc 00000000001d7064  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #04 pc 00000000001d7190  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #05 pc 00000000001d71d4  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #06 pc 000000000025efa4  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (mono_runtime_class_init_full+2232) (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #07 pc 00000000000bdc3c  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #08 pc 00000000000c24b0  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #09 pc 00000000000c191c  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #10 pc 0000000000152088  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #11 pc 0000000000151bec  /data/app/~~7kJMqWKCr_j8wCTwIJFo1g==/<package-name>-9bNmZtBMLC7EDdKLqPkQDw==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (BuildId: 4ff0195244ee97a6b2da5b8881773c3c424a9739)
      #12 pc 0000000000004300  <anonymous:785d707000>

There is also no libmono-android.release.so in our entire .aab, but libmonodroid.so is present.

Attaching a zip with the .json representation of a crash minidump with the addresses, a piece of logcat, the list of the .so libs in the app (excluded most of the company-related libs though) and the libmonosgen-2.0.so: abort_application.zip
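
For anyone triaging a similar tombstone, the backtrace frames above can be split into a frame index, a pc offset, a library name and an optional symbol. The following is only a minimal sketch, assuming the frame layout shown in this tombstone; `parse_frame` and `unsymbolized_pcs` are hypothetical helper names, not part of any Android or .NET tool. The collected offsets could then be handed to a symbolizer such as addr2line together with the matching library.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    pc: int                 # offset inside the library, as printed after "pc"
    library: str            # file name after the last "/" or "!" in the path
    symbol: Optional[str]   # e.g. "abort+168", or None when unsymbolized


# Matches "#NN pc <hex> <path> <rest of line>" (assumed layout, taken from the tombstone above).
_FRAME_RE = re.compile(r"\s*#(\d+)\s+pc\s+([0-9a-fA-F]+)\s+(\S+)(.*)$")
_BUILD_ID_RE = re.compile(r"\s*\(BuildId: [0-9a-fA-F]+\)\s*$")


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None for lines that are not frames."""
    match = _FRAME_RE.match(line)
    if match is None:
        return None
    index, pc, path, rest = match.groups()
    # Drop the trailing "(BuildId: ...)" so only the optional "(symbol+offset)" remains.
    rest = _BUILD_ID_RE.sub("", rest).strip()
    symbol = rest[1:-1] if rest.startswith("(") and rest.endswith(")") else None
    library = path.rsplit("!", 1)[-1].rsplit("/", 1)[-1]
    return Frame(int(index), int(pc, 16), library, symbol)


def unsymbolized_pcs(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the pc offsets of frames without a symbol by library name."""
    pending: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame.symbol is None:
            pending[frame.library].append(hex(frame.pc))
    return dict(pending)
```

Calling `unsymbolized_pcs(open("tombstone.txt"))` (the file name is a placeholder) would give one list of offsets per library, which makes it easier to see which frames still need symbolication.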

grendello commented 4 months ago

I've just realized why we see libmono-android.release.so name in the traces despite having libmonodroid.so in the archive. Android uses the shared library SONAME field in the trace, instead of the file name (as it was previously) and that creates the inconsistency, since the SONAME is used to construct what appears to be a file path. So at least that part was a red herring. However, we do have our error:

01-26 15:46:10.860 28165 17919 E <app-process-name>: * Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition `lock->done' not met
01-26 15:46:10.860 28165 17664 E <app-process-name>: * Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition `lock->done' not met

The assertion ends up logged by us; then, since it's a fatal error, the app aborts by calling the abort(3) function.

Looks like it's a MonoVM runtime issue. @lambdageek would you mind taking a look, or delegating to whomever could look into it? Thanks!
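
As a rough illustration of how such assertion lines can be pulled out of a larger logcat capture, here is a minimal sketch. It assumes the logcat line layout seen in this thread (date, time, pid, tid, level, tag) and the two Mono assertion message shapes quoted in it; `find_mono_assertions` is a hypothetical helper, not an existing tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class MonoAssertion(NamedTuple):
    pid: int
    tid: int
    source_file: str
    source_line: int
    condition: Optional[str]  # None for "should not be reached" assertions


# Assumed layout: "MM-DD HH:MM:SS.mmm PID TID LEVEL TAG: * Assertion ..."
_ASSERT_RE = re.compile(
    r"^\d\d-\d\d \d\d:\d\d:\d\d\.\d+\s+(?P<pid>\d+)\s+(?P<tid>\d+)\s+[VDIWEF]\s+"
    r".*?\* Assertion"
    r"(?: at (?P<f1>\S+):(?P<l1>\d+), condition `(?P<cond>.*?)' not met"
    r"|: should not be reached at (?P<f2>\S+):(?P<l2>\d+))"
)


def find_mono_assertions(lines: Iterable[str]) -> List[MonoAssertion]:
    """Return every Mono assertion found in the given logcat lines."""
    found = []
    for line in lines:
        match = _ASSERT_RE.match(line)
        if match is None:
            continue
        source_file = match.group("f1") or match.group("f2")
        source_line = match.group("l1") or match.group("l2")
        found.append(
            MonoAssertion(
                pid=int(match.group("pid")),
                tid=int(match.group("tid")),
                source_file=source_file,
                source_line=int(source_line),
                condition=match.group("cond"),
            )
        )
    return found
```

Comparing the `tid` of each result with the tid in the tombstone header helps confirm that the assertion and the abort belong together; in the log above, tid 17664 appears in both.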

grendello commented 4 months ago

@akravch is there any output tagged with monodroid or DOTNET above the first line in your logcat?

akravch commented 4 months ago

> @akravch is there any output tagged with monodroid or DOTNET above the first line in your logcat?

No, unfortunately, not a single line.

Are these logs enabled by default in release builds though? I'm not really familiar with how to configure verbosity for this kind of logs, but if someone here knows a way to make sure they will remain enabled, we could try it in our next release.

lambdageek commented 4 months ago

01-26 15:46:10.860 28165 17919 E <app-process-name>: * Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition `lock->done' not met
01-26 15:46:10.860 28165 17664 E <app-process-name>: * Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition `lock->done' not met

This is a known regression in .NET SDK 8.0.1; see https://github.com/dotnet/runtime/issues/96804. It will be fixed in .NET SDK 8.0.2 (coming in February).

As a temporary workaround you can force the SDK to use the runtime from 8.0.0 - see https://github.com/dotnet/runtime/issues/96804#issuecomment-1897432561 for instructions.

akravch commented 4 months ago

@lambdageek thanks for the idea. Although, we did try to release our app on 8.0.100 a few weeks back, and we still had these abort_application crashes too, it just wasn't top 1. Maybe the failed assertion is not (always?) a root cause.

I think we'll try to release on 8.0.100 once again and see if we'll be able to get more logs.

lambdageek commented 4 months ago

> @lambdageek thanks for the idea. Although, we did try to release our app on 8.0.100 a few weeks back, and we still had these abort_application crashes too, it just wasn't top 1. Maybe the failed assertion is not (always?) a root cause.
>
> I think we'll try to release on 8.0.100 once again and see if we'll be able to get more logs.

Note that once 8.0.101 was released, installing a .NET workload picks up the 8.0.1 runtime pack, so just pinning the SDK in global.json isn't enough. You need a Target in the .csproj that overrides the runtime pack version explicitly. .NET workloads are non-intuitive.

<!-- Pin the Mono runtime pack to 8.0.0, regardless of the installed workload version. -->
<Target Name="UpdateRuntimePackVersion" BeforeTargets="ProcessFrameworkReferences">
  <ItemGroup>
    <KnownRuntimePack Condition="'%(RuntimePackLabels)' == 'Mono'" LatestRuntimeFrameworkVersion="8.0.0" />
  </ItemGroup>
</Target>

akravch commented 3 months ago

Still seeing this on 8.0.200, although much less and with a slightly different stacktrace:

libc.so                     0xf5b32e14 + 122388
libc.so                     0xf5b32df4 + 122356
libmono-android.release.so  xamarin::android::Helpers::abort_application()
libmonosgen-2.0.so          mono_debugger_agent_unhandled_exception
libmonosgen-2.0.so          mono_debugger_agent_unhandled_exception
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_gc_make_root_descr_all_refs
libmonosgen-2.0.so          mono_gc_make_root_descr_all_refs
libmonosgen-2.0.so          mono_gc_wbarrier_generic_store_internal
libmonosgen-2.0.so          mono_sgen_mono_ilgen_init
libmonosgen-2.0.so          mono_gc_deregister_root

dmariogatto commented 2 days ago

Also experiencing a similar issue, might be related to https://github.com/dotnet/maui/issues/22812.

Using latest Android 34.0.113, was also occurring on previous version.

Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 3202 (.NET TP Worker), pid 2403 (o.adelaidemetro)

locat.txt

grendello commented 1 day ago

@dmariogatto in your case this appears to be the cause:

06-16 08:52:04.896  2403  3214 W         : mono_class_from_mono_type_internal: implement me 0x00
06-16 08:52:04.896  2403  3214 E         : * Assertion: should not be reached at /__w/1/s/src/mono/mono/metadata/class.c:2244

Are you running a Debug build on device by chance?

dmariogatto commented 22 hours ago

@grendello Interesting... it's a release build deployed from Visual Studio. Getting Fatal signal 6 (SIGABRT) from GooglePlay builds and was trying to add some extra logging locally.

grendello commented 21 hours ago

@dmariogatto it might be that the trimmer removes something it shouldn't have (it's just a wild guess). The message above comes from this code, but it doesn't mean Mono encountered a, well, Type type it doesn't know, but that the Type isn't specified (the 0x00 in your log message). If you can repro this locally reliably, would you be able to try without the trimmer?

Also, to record the most useful log for us, please issue these commands from the VS dev prompt before you repro the crash:

> adb shell setprop debug.mono.log default,assembly,mono_log_level=debug,mono_log_mask=all
> adb logcat -G 64M
> adb logcat -c
rem Cause the app to crash here and after it does so, wait 2-3 seconds and:
> adb logcat -d > logcat.txt
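
If the capture has to be repeated many times, the same command sequence can be scripted. The following is only a sketch that wraps the exact adb invocations listed above; it assumes `adb` is on the PATH and a single device is connected, and the function names (`commands_before_repro`, `dump_command`, `capture`) are hypothetical.

```python
import subprocess
from typing import List

# Same value as in the setprop command above.
MONO_LOG_SETTINGS = "default,assembly,mono_log_level=debug,mono_log_mask=all"


def commands_before_repro() -> List[List[str]]:
    """adb invocations to run before reproducing the crash, in order."""
    return [
        ["adb", "shell", "setprop", "debug.mono.log", MONO_LOG_SETTINGS],
        ["adb", "logcat", "-G", "64M"],  # enlarge the log buffer
        ["adb", "logcat", "-c"],         # clear the existing log
    ]


def dump_command() -> List[str]:
    """adb invocation that dumps the current log and exits."""
    return ["adb", "logcat", "-d"]


def capture(output_path: str = "logcat.txt") -> None:
    """Prepare logging, wait for a manual repro, then save logcat to a file."""
    for argv in commands_before_repro():
        subprocess.run(argv, check=True)
    input("Reproduce the crash, wait 2-3 seconds, then press Enter... ")
    with open(output_path, "wb") as out:
        subprocess.run(dump_command(), stdout=out, check=True)
```

Calling `capture()` runs the three preparation commands, pauses until Enter is pressed, and writes the dump to logcat.txt in the current directory.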