dotnet / android

.NET for Android provides open-source bindings of the Android SDK for use with .NET managed languages such as C#

Native crash, sgen gc aborting #6546

Closed tmijieux closed 2 years ago

tmijieux commented 2 years ago

Android application type

Classic Xamarin.Android (MonoAndroid11.0, MonoAndroid12.0, etc.)

Affected platform version

VS2022 17.0.1, 17.0.2; VS2019 (latest, probably https://docs.microsoft.com/fr-fr/visualstudio/releases/2019/release-notes#16.11.5, but I have since uninstalled it)

Description

I am currently investigating a native crash very similar to https://github.com/xamarin/xamarin-android/issues/3892

The relevant part of the log seems to be this message: Assertion: should not be reached at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/release/mono/sgen/sgen-scan-object.h:91 (this is actually a header file that contains preprocessor-templated code and is included at multiple positions in the code). It seems to indicate that some theoretically unreachable code was reached in the part of the sgen garbage collector that scans for references. Specifically, the collector looked at a field in a native struct to determine what type of object it is currently scanning, but the switch did not match any case, which suggests that what it is looking at is not what it expected (possibly memory corruption?).
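As an illustration of the kind of check that fired, here is a minimal sketch in C of a scanner that dispatches on a type field stored in the low bits of a descriptor word. The mask, the type codes and the function names are invented for this example and are not copied from sgen-scan-object.h; the only point is that a corrupted or misread descriptor falls through to the default case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 3 bits of a descriptor encode its type. */
#define DESC_TYPE_MASK ((uintptr_t)0x7)

typedef enum {
    KIND_RUN_LENGTH,
    KIND_BITMAP,
    KIND_VECTOR,
    KIND_INVALID
} scan_kind;

/* Map a descriptor word to the scanning strategy it asks for. */
static scan_kind classify_descriptor(uintptr_t desc)
{
    switch (desc & DESC_TYPE_MASK) {
    case 1: return KIND_RUN_LENGTH;
    case 2: return KIND_BITMAP;
    case 3: return KIND_VECTOR;
    default: return KIND_INVALID; /* no known type code matched */
    }
}

/* Mirrors the abort-on-unreachable behaviour: an unknown type code is
 * treated as a broken invariant rather than a recoverable error. */
static void scan_or_abort(uintptr_t desc)
{
    assert(classify_descriptor(desc) != KIND_INVALID && "should not be reached");
}
```

If the word handed to such a switch was read from the wrong address, the type bits are effectively random, so an abort here would be a symptom rather than the cause.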

My intuition for now is that it has something to do with some C# code we introduced (though I am not 100% sure about that, and my rational self rather believes it is unlikely, since it looks like a native crash). It started crashing in a production release (we never reproduced it during development until it happened in production). In the new release we did update some NuGet packages, but even after reverting the package versions the crash was still there. However, when we rebuilt an old version that we knew did not trigger this crash (with the same Visual Studio/Xamarin.Android version that was producing the crashing builds), the new build of that old version still did not crash.

At first we thought the crash was more likely to happen under low-memory conditions (which is plausible, because it happens when the GC is running), and we looked for memory leaks. We found some, but even after fixing most of them the crashes are still there.

While we were looking for leaks, we tried a "git bisect" to find where the problem was coming from, but we were more focused on the leaks at the time, so I think I should retry the "git bisect" focusing on triggering the crash (this is a little problematic, because we have not yet found a way to reproduce this issue systematically on any of our test devices).

Things I've looked at that I thought could be related but do not really look similar: https://docs.microsoft.com/en-us/xamarin/android/release-notes/11/11.1#corrected-garbage-collection-behavior-for-android-bindings-and-bindings-projects

Our app is a Xamarin.Forms app. Here is the list of NuGet packages we use:

xml package references

```xml all runtime; build; native; contentfiles; analyzers; buildtransitive ./OpenId.AppAuth.Android.dll ```

So we have a few bindings libraries (com.onesignal, openid.appauth,...) and the main native library here is SkiaSharp.

What I am currently stuck at: I succeeded in attaching gdb to my app as described here (https://github.com/xamarin/xamarin-android/blob/main/Documentation/workflow/DevelopmentTips.md#attaching-gdb-using-visual-studio-on-windows), and I think I also got my app to trigger the crash once, but there was virtually no information when printing the backtrace (just question marks).

My current goal is to build xamarin-android so that I can natively debug my app and try to get more information out of it (with debug symbols in mono sgen and so on; I would like to have the address and undefined-behavior sanitizers on mono and SkiaSharp if possible).

I successfully built xamarin-android and xabuild (I checked out the d17-0 branch because I wanted the same version I have in my current Visual Studio 2022; is that a good idea or not?). I had to change the PlatformTarget of xabuild.csproj to x64 because the MSBuild in VS2022 seems to be 64-bit. If I run xabuild on the samples/HelloWorld project I can build and deploy an app, but not once I add a PackageReference to Xamarin.Forms:

diff

```diff
diff --git a/samples/HelloWorld/HelloWorld.csproj b/samples/HelloWorld/HelloWorld.csproj
index 2b5391ee..092a8090 100644
--- a/samples/HelloWorld/HelloWorld.csproj
+++ b/samples/HelloWorld/HelloWorld.csproj
@@ -52,6 +52,7 @@
 {6BE66B30-9346-4DA6-B09A-0CDC1DFE33C2}
 HelloLibrary
+
@@ -76,4 +77,4 @@
-
\ No newline at end of file
+
diff --git a/samples/HelloWorld/MainActivity.cs b/samples/HelloWorld/MainActivity.cs
index 43d1421e..3549205d 100644
--- a/samples/HelloWorld/MainActivity.cs
+++ b/samples/HelloWorld/MainActivity.cs
@@ -1,6 +1,8 @@
-using Android.App;
+using Android.App;
 using Android.Widget;
 using Android.OS;
+using Xamarin.Forms.Platform.Android;
+using Xamarin.Forms;
 
 namespace HelloWorld
 {
@@ -9,24 +11,22 @@ namespace HelloWorld
 Label = "HelloWorld",
 MainLauncher = true,
 Name = "example.MainActivity")]
-public class MainActivity : Activity
+public class MainActivity : FormsAppCompatActivity
 {
 int count = 1;
 
 protected override void OnCreate (Bundle savedInstanceState)
 {
-base.OnCreate (savedInstanceState);
-
+base.OnCreate(savedInstanceState);
 // Set our view from the "main" layout resource
-SetContentView (Resource.Layout.Main);
-
+// SetContentView (Resource.Layout.Main);
 // Get our button from the layout resource,
 // and attach an event to it
-Button button = FindViewById
```

Somehow the assemblies referenced by the NuGet package do not get added to the csc command line and the project fails to build (C:\src\xamarin-android\samples\HelloWorld\MainActivity.cs(4,15): error CS0234: The type or namespace name 'Forms' does not exist in the namespace 'Xamarin' (are you missing an assembly reference?) [C:\src\xamarin-android\samples\HelloWorld\HelloWorld.csproj]), just as would happen if the PackageReference were not there. The same issue happens with my own project, so I am currently unable to build my project with xabuild. (Maybe something in my environment is hindering correct behavior, or there is an issue with xabuild itself? If someone has an idea about this issue, that would be helpful. I attached a binlog for the modified HelloWorld: msbuild.binlog.zip)

For the items mentioned in the referenced issue: I tried disabling the concurrent garbage collector, but the app suffered a slowdown and it did not seem to fix the crashes. The crash seems to happen even on debug builds, not only on app store releases, but maybe less often... What I did not try yet:

Steps to Reproduce

It is still very hard even for our team to reproduce (especially on the emulator, where it seems to happen very rarely).

Did you find any workaround?

Not yet. Reverting my app to an old version did the trick for now, but we cannot move forward until we find what is causing this.

Relevant log output

2021-11-29 23:14:17.752 15001-15458/? I/com.my.app: Explicit concurrent copying GC freed 1547(553KB) AllocSpace objects, 0(0B) LOS objects, 49% free, 5348KB/10MB, paused 434us total 33.966ms
2021-11-29 23:14:17.808 15001-15458/? V/mono-stdout: [23:14:17 DBG] Filtering history len(newOps)=50 _uid=2
2021-11-29 23:14:17.818 15001-15001/? I/Choreographer: Skipped 55 frames!  The application may be doing too much work on its main thread.
2021-11-29 23:14:17.894 15001-15458/? I/com.my.app: Explicit concurrent copying GC freed 745(274KB) AllocSpace objects, 0(0B) LOS objects, 49% free, 5409KB/10MB, paused 518us total 31.195ms
2021-11-29 23:14:18.055 15001-15458/? I/com.my.app: Explicit concurrent copying GC freed 1187(463KB) AllocSpace objects, 0(0B) LOS objects, 49% free, 5458KB/10MB, paused 710us total 44.776ms
2021-11-29 23:14:18.094 15001-15460/? D/HostConnection: HostConnection::get() New Host Connection established 0x7a63baf5acd0, tid 15460
2021-11-29 23:14:18.110 15001-15460/? D/HostConnection: HostComposition ext ANDROID_EMU_CHECKSUM_HELPER_v1 ANDROID_EMU_native_sync_v2 ANDROID_EMU_native_sync_v3 ANDROID_EMU_native_sync_v4 ANDROID_EMU_dma_v1 ANDROID_EMU_direct_mem ANDROID_EMU_host_composition_v1 ANDROID_EMU_host_composition_v2 ANDROID_EMU_vulkan ANDROID_EMU_deferred_vulkan_commands ANDROID_EMU_vulkan_null_optional_strings ANDROID_EMU_vulkan_create_resources_with_requirements ANDROID_EMU_YUV_Cache ANDROID_EMU_async_unmap_buffer ANDROID_EMU_vulkan_ignored_handles ANDROID_EMU_has_shared_slots_host_memory_allocator ANDROID_EMU_vulkan_free_memory_sync ANDROID_EMU_vulkan_shader_float16_int8 ANDROID_EMU_vulkan_async_queue_submit ANDROID_EMU_sync_buffer_data ANDROID_EMU_read_color_buffer_dma GL_OES_vertex_array_object GL_KHR_texture_compression_astc_ldr ANDROID_EMU_host_side_tracing ANDROID_EMU_async_frame_commands ANDROID_EMU_gles_max_version_2 
2021-11-29 23:14:18.126 15001-15460/? D/EGL_emulation: eglCreateContext: 0x7a63aafc3520: maj 2 min 0 rcv 2
2021-11-29 23:14:18.128 308-25322/? D/goldfish-address-space: claimShared: Ask to claim region [0x3f2377000 0x3f29a6000]
2021-11-29 23:14:18.161 15001-15460/? D/EGL_emulation: eglMakeCurrent: 0x7a63aafc3520: ver 2 0 (tinfo 0x7a634c61f730) (first time)
2021-11-29 23:14:18.173 15001-15001/? V/mono-stdout: [23:14:18 DBG] layer=Layer tl?=Layer
2021-11-29 23:14:18.176 15001-15059/? I/OpenGLRenderer: Davey! duration=1282ms; Flags=0, IntendedVsync=28726173306561, Vsync=28727089973191, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=28727099426100, AnimationStart=28727099444200, PerformTraversalsStart=28727099669600, DrawStart=28727370091500, SyncQueued=28727415394000, SyncStart=28727417344200, IssueDrawCommandsStart=28727442088600, SwapBuffers=28727447245300, FrameCompleted=28727457481200, DequeueBufferDuration=6783100, QueueBufferDuration=2974400, GpuCompleted=28703857726100, 
2021-11-29 23:14:18.210 308-25322/? D/goldfish-address-space: claimShared: Ask to claim region [0x3f116e000 0x3f179d000]
2021-11-29 23:14:18.351 15001-15458/? I/com.my.app: Explicit concurrent copying GC freed 1476(668KB) AllocSpace objects, 2(264KB) LOS objects, 49% free, 5477KB/10MB, paused 596us total 32.235ms
2021-11-29 23:14:18.371 8384-8488/? I/WorkerManager: dispose()
2021-11-29 23:14:18.372 8384-8488/? W/A: Queue length for executor EventBus is now 11. Perhaps some tasks are too long, or the pool is too small.
2021-11-29 23:14:18.505 15001-15113/? I/com.my.app: Explicit concurrent copying GC freed 261(414KB) AllocSpace objects, 1(132KB) LOS objects, 49% free, 5478KB/10MB, paused 293us total 36.289ms
2021-11-29 23:14:18.557 15001-15458/? V/mono-stdout: [23:14:18 DBG] Filtering history len(newOps)=50 _uid=2
2021-11-29 23:14:18.573 15001-15001/? I/Choreographer: Skipped 44 frames!  The application may be doing too much work on its main thread.
2021-11-29 23:14:18.608 15001-15001/? V/mono-stdout: [23:14:18 DBG] layer=Layer tl?=Layer
2021-11-29 23:14:18.613 15001-15059/? I/OpenGLRenderer: Davey! duration=786ms; Flags=0, IntendedVsync=28727106635183, Vsync=28727839968487, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=28727854685100, AnimationStart=28727854702800, PerformTraversalsStart=28727854875600, DrawStart=28727864823500, SyncQueued=28727866683200, SyncStart=28727867353800, IssueDrawCommandsStart=28727882846100, SwapBuffers=28727893382800, FrameCompleted=28727894188100, DequeueBufferDuration=91800, QueueBufferDuration=377500, GpuCompleted=28703874322700, 
2021-11-29 23:14:18.761 15001-15162/? I/com.my.app: Explicit concurrent copying GC freed 1485(986KB) AllocSpace objects, 1(132KB) LOS objects, 49% free, 5636KB/11MB, paused 404us total 56.260ms
2021-11-29 23:14:18.896 15001-15001/? I/com.my.app: Explicit concurrent copying GC freed 287(376KB) AllocSpace objects, 2(104KB) LOS objects, 49% free, 5460KB/10MB, paused 333us total 29.737ms
2021-11-29 23:14:18.946 15001-15032/? E/com.my.app: * Assertion: should not be reached at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/release/mono/sgen/sgen-scan-object.h:91
2021-11-29 23:14:18.949 15001-15032/? A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 15032 (SGen worker), pid 15001 (com.my.appy)
2021-11-29 23:14:19.039 15506-15506/? I/crash_dump64: obtaining output fd from tombstoned, type: kDebuggerdTombstone
2021-11-29 23:14:19.045 279-279/? I/tombstoned: received crash request for pid 15032
2021-11-29 23:14:19.048 15506-15506/? I/crash_dump64: performing dump of process 15001 (target tid = 15032)
2021-11-29 23:14:19.056 15506-15506/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: Build fingerprint: 'google/sdk_gphone_x86_64/generic_x86_64_arm64:11/RSR1.201211.001/7027799:user/release-keys'
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: Revision: '0'
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: ABI: 'x86_64'
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: Timestamp: 2021-11-29 23:14:19+0100
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: pid: 15001, tid: 15032, name: SGen worker  >>> com.my.app <<<
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: uid: 10174
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG: signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG:     rax 0000000000000000  rbx 0000000000003a99  rcx 00007a65be52a2a8  rdx 0000000000000006
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG:     r8  00007a62d0ffb510  r9  00007a62d0ffb510  r10 00007a62d0ffb4c0  r11 0000000000000246
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG:     r12 00007a62c6aa3600  r13 00007a62c6aa3510  r14 00007a62d0ffb4b8  r15 0000000000003ab8
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG:     rdi 0000000000003a99  rsi 0000000000003ab8
2021-11-29 23:14:19.057 15506-15506/? A/DEBUG:     rbp 00007a62d1baefc4  rsp 00007a62d0ffb4a8  rip 00007a65be52a2a8
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG: backtrace:
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #00 pc 000000000005a2a8  /apex/com.android.runtime/lib64/bionic/libc.so (syscall+24) (BuildId: 3707c39fc397eeaa328142d90b50a973)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #01 pc 000000000005d212  /apex/com.android.runtime/lib64/bionic/libc.so (abort+194) (BuildId: 3707c39fc397eeaa328142d90b50a973)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #02 pc 0000000000029c69  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonodroid.so (xamarin::android::internal::MonodroidRuntime::mono_log_handler(char const*, char const*, char const*, int, void*)+105) (BuildId: f6a65f881901ce9723a8e780b7564e14a7d48dcc)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #03 pc 00000000002bd9f1  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (monoeg_g_logv_nofree+177)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #04 pc 00000000002bdb64  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (monoeg_assertion_message+148)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #05 pc 00000000002bdbc8  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (mono_assertion_message_unreachable+24)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #06 pc 0000000000279d51  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (major_scan_object_concurrent_with_evacuation+6897)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #07 pc 000000000028adc7  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (scan_card_table_for_block+775)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #08 pc 0000000000271980  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (major_scan_card_table+432)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #09 pc 000000000026b469  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (job_major_mod_union_preclean+137)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #10 pc 000000000029a5cf  /data/app/~~r1r3_baG9uAHYBKyOLy_GA==/com.my.app-VfhbsOErghfDt2oxW1LXgw==/lib/x86_64/libmonosgen-2.0.so (thread_func+591)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #11 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58) (BuildId: 3707c39fc397eeaa328142d90b50a973)
2021-11-29 23:14:19.066 15506-15506/? A/DEBUG:       #12 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55) (BuildId: 3707c39fc397eeaa328142d90b50a973)
2021-11-29 23:14:19.759 15001-15463/? I/com.my.app: Explicit concurrent copying GC freed 61(105KB) AllocSpace objects, 0(0B) LOS objects, 49% free, 5467KB/10MB, paused 386us total 360.550ms
2021-11-29 23:14:20.373 279-279/? E/tombstoned: Tombstone written to: /data/tombstones/tombstone_02
2021-11-29 23:14:20.382 527-564/? I/BootReceiver: Copying /data/tombstones/tombstone_02 to DropBox (SYSTEM_TOMBSTONE)
2021-11-29 23:14:20.383 527-564/? I/DropBoxManagerService: add tag=SYSTEM_TOMBSTONE isTagEnabled=true flags=0x2
2021-11-29 23:14:20.394 527-15518/? I/DropBoxManagerService: add tag=data_app_native_crash isTagEnabled=true flags=0x2
2021-11-29 23:14:20.403 527-15517/? W/ActivityTaskManager:   Force finishing activity com.my.app/.MainActivity

development environment information

```
Microsoft Visual Studio Community 2022
Version 17.0.2
VisualStudio.17.Release/17.0.2+31919.166
Microsoft .NET Framework
Version 4.8.04161

Installed Version: Community

Visual C++ 2022   00482-90000-00000-AA768
Microsoft Visual C++ 2022

ASP.NET and Web Tools 2019   17.0.793.11735
ASP.NET and Web Tools 2019

Azure App Service Tools v3.0.0   17.0.793.11735
Azure App Service Tools v3.0.0

C# Tools   4.0.1-1.21568.1+6ab6601178d9fba8c680b56934cd1742e0816bff
C# components used in the IDE. Depending on your project type and settings, a different version of the compiler may be used.

Common Azure Tools   1.10
Provides common services for use by Azure Mobile Services and Microsoft Azure Tools.

Extensibility Message Bus   1.2.6 (master@34d6af2)
Provides common messaging-based MEF services for loosely coupled Visual Studio extension components communication and integration.

Microsoft JVM Debugger   1.0
Provides support for connecting the Visual Studio debugger to JDWP compatible Java Virtual Machines

Microsoft MI-Based Debugger   1.0
Provides support for connecting Visual Studio to MI compatible debuggers

Microsoft Visual C++ Wizards   1.0
Microsoft Visual C++ Wizards

Microsoft Visual Studio VC Package   1.0
Microsoft Visual Studio VC Package

Mono Debugging for Visual Studio   17.0.11 (54f19d2)
Support for debugging Mono processes with Visual Studio.

NuGet Package Manager   6.0.1
NuGet Package Manager in Visual Studio. For more information about NuGet, visit https://docs.nuget.org/

ProjectServicesPackage Extension   1.0
ProjectServicesPackage Visual Studio Extension Detailed Info

Test Adapter for Boost.Test   1.0
Enables Visual Studio's testing tools with unit tests written for Boost.Test. The use terms and Third Party Notices are available in the extension installation directory.

Test Adapter for Google Test   1.0
Enables Visual Studio's testing tools with unit tests written for Google Test. The use terms and Third Party Notices are available in the extension installation directory.

TypeScript Tools   17.0.1001.2002
TypeScript Tools for Microsoft Visual Studio

Visual Basic Tools   4.0.1-1.21568.1+6ab6601178d9fba8c680b56934cd1742e0816bff
Visual Basic components used in the IDE. Depending on your project type and settings, a different version of the compiler may be used.

Visual C++ for Cross Platform Mobile Development (Android)   17.0.31822.380
Visual C++ for Cross Platform Mobile Development (Android)

Visual F# Tools   17.0.0-beta.21522.2+6d626ff0752a77d339f609b4d361787dc9ca93a5
Microsoft Visual F# Tools

Visual Studio Code Debug Adapter Host Package   1.0
Interop layer for hosting Visual Studio Code debug adapters in Visual Studio

Visual Studio IntelliCode   2.2
AI-assisted development for Visual Studio.

Visual Studio Tools for CMake   1.0
Visual Studio Tools for CMake

VisualStudio.DeviceLog   1.0
Information about my package

VisualStudio.Foo   1.0
Information about my package

VisualStudio.Mac   1.0
Mac Extension for Visual Studio

Xamarin   17.0.0.341 (d17-0@ac52790)
Visual Studio extension to enable development for Xamarin.iOS and Xamarin.Android.

Xamarin Designer   17.0.0.182 (remotes/origin/d17-0@ea204898d)
Visual Studio extension to enable Xamarin Designer tools in Visual Studio.

Xamarin Templates   17.0.17 (9e779b0)
Templates for building iOS, Android, and Windows apps with Xamarin and Xamarin.Forms.

Xamarin.Android SDK   12.1.0.5 (d17-0/6b0e6b2)
Xamarin.Android Reference Assemblies and MSBuild support.
    Mono: c633fe9
    Java.Interop: xamarin/java.interop/d17-0@febb1367
    ProGuard: Guardsquare/proguard/v7.0.1@912d149
    SQLite: xamarin/sqlite/3.36.0@a575761
    Xamarin.Android Tools: xamarin/xamarin-android-tools/d17-0@a5194e9

Xamarin.iOS and Xamarin.Mac SDK   15.2.0.17 (738fde344)
Xamarin.iOS and Xamarin.Mac Reference Assemblies and MSBuild support.
```

tmijieux commented 2 years ago

I think I found a workaround for the place where I was stuck using xabuild. For a reason I do not understand yet, NuGetTargets has its value set as if VisualStudioVersion were 15.0 and the import is skipped, although VisualStudioVersion seems to be 17.0 everywhere I look in my binlog.

(This is a file from Visual Studio, imported through the Current symlink.)

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <NuGetTargets Condition="'$(NuGetTargets)'==''">$(MSBuildExtensionsPath)\Microsoft\NuGet\$(VisualStudioVersion)\Microsoft.NuGet.targets</NuGetTargets>
  </PropertyGroup>
  <Import Condition="Exists('$(NuGetTargets)') and '$(SkipImportNuGetBuildTargets)' != 'true'" Project="$(NuGetTargets)" />
</Project>

If I set NuGetTargets to the right value before any import, the PackageReference now works correctly.

tmijieux commented 2 years ago

I managed to get the application to crash in the debugger, but as I have not yet compiled my own mono, I don't have the source or detailed enough debugging information in the shared objects. I cannot print any values or go up or down the stack yet 😢. We still have maybe a little more information than in the previous stack dump, though...

-exec bt
#0  0x00007c72ab4f02a8 in syscall () from C:\Users\Thomas\AppData\Local\Temp\x64\Debug\.gdb\libc.so
#1  0x00007c72ab4f3213 in abort () from C:\Users\Thomas\AppData\Local\Temp\x64\Debug\.gdb\libc.so
#2  0x00007c6fbf23174a in ?? ()
#3  0x00007c6fbf1d9cf0 in ?? ()
#4  0x00007c6fbfb0f154 in eglib_log_adapter (log_domain=0x0, log_level=G_LOG_LEVEL_ERROR, message=0x7c70c92a0e90 "* Assertion: should not be reached at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-scan-object.h:91\n", user_data=0x0) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/utils/mono-logger.c:405
#5  0x00007c6fbfb3c72a in monoeg_g_logstr (log_domain=0x0, log_level=G_LOG_LEVEL_ERROR, msg=0x7c70c92a0e90 "* Assertion: should not be reached at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-scan-object.h:91\n") at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/eglib/goutput.c:151
#6  0x00007c6fbfb3c00e in monoeg_g_logv_nofree (log_domain=0x0, log_level=G_LOG_LEVEL_ERROR, format=0x7c6fbfb96dca "* Assertion: should not be reached at %s:%d\n", args=0x7c6fbf1d9020) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/eglib/goutput.c:166
#7  0x00007c6fbfb3c2f4 in monoeg_assertion_message (format=0x7c6fbfb96dca "* Assertion: should not be reached at %s:%d\n") at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/eglib/goutput.c:201
#8  0x00007c6fbfb3c394 in mono_assertion_message_unreachable (file=0x7c6fbfb8bb6a "/Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-scan-object.h", line=91) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/eglib/goutput.c:228
#9  0x00007c6fbfaa0c77 in major_scan_object_concurrent_with_evacuation (full_object=0x7c6fb7a157e0, desc=35027181184716800, queue=0x7c72aac6b010) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-scan-object.h:91
#10 0x00007c6fbfac3dd1 in scan_card_table_for_block (block=0x7c6fb7a14000, scan_type=CARDTABLE_SCAN_MOD_UNION_PRECLEAN, ctx=...) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-marksweep.c:2619
#11 0x00007c6fbfa91e24 in major_scan_card_table (scan_type=CARDTABLE_SCAN_MOD_UNION_PRECLEAN, ctx=..., job_index=0, job_split_count=1, block_count=4973) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-marksweep.c:2711
#12 0x00007c6fbfa877d2 in job_major_mod_union_preclean (worker_data_untyped=0x7c72aac6b008, job=0x7c7001dc3208) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-gc.c:1554
#13 0x00007c6fbfb00947 in thread_func (data=0x0) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-thread-pool.c:207
#14 0x00007c72ab55dd2b in __pthread_start(void*) () from C:\Users\Thomas\AppData\Local\Temp\x64\Debug\.gdb\libc.so
#15 0x00007c72ab4f50c8 in __start_thread () from C:\Users\Thomas\AppData\Local\Temp\x64\Debug\.gdb\libc.so
-exec up
#14 0x00007c72ab55dd2b in __pthread_start(void*) () from C:\Users\Thomas\AppData\Local\Temp\x64\Debug\.gdb\libc.so
=thread-selected,id="22",frame={level="14",addr="0x00007c72ab55dd2b",func="__pthread_start(void*)",args=[],from="C:\\Users\\Thomas\\AppData\\Local\\Temp\\x64\\Debug\\.gdb\\libc.so",arch="i386:x86-64"}

-exec down
#12 0x00007c6fbfa877d2 in job_major_mod_union_preclean (worker_data_untyped=0x7c72aac6b008, job=0x7c7001dc3208) at /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-gc.c:1554
1554    in /Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-gc.c
=thread-selected,id="22",frame={level="12",addr="0x00007c6fbfa877d2",func="job_major_mod_union_preclean",args=[{name="worker_data_untyped",value="0x7c72aac6b008"},{name="job",value="0x7c7001dc3208"}],file="/Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-gc.c",fullname="/Users/builder/jenkins/workspace/archive-mono/2020-02/android/debug/mono/sgen/sgen-gc.c",line="1554",arch="i386:x86-64"}

(limited to frame 12 and 14)

tmijieux commented 2 years ago

Using the "new" bridge implementation seems to be a valid workaround for now.

workaround

```
Reminder, if someone finds this and does not know how to do it:
Add a file with the AndroidEnvironment build action (or an AndroidEnvironment item in the csproj) to the Android project, with the following content:

MONO_GC_PARAMS=bridge-implementation=new

If you suffer from performance degradation, you may want to try changing some other params:

MONO_GC_PARAMS=bridge-implementation=new,mode=throughput
```

I am still worried that some of the code is corrupting memory and that changing the implementation only prevents the crashes through arbitrary implementation luck. But if all features of my app seem to work correctly and it is not crashing anymore, then it is still better than nothing :sweat_smile:. For the wellbeing of my app, I wish that there is a bug in the "tarjan" bridge implementation and that changing it to "new" definitively fixes my bug, but if that is the case then it is probably not good news for you ... :fearful:

I managed to get more information. I put the mono/sgen sources in C:\Users\builder\jenkins\workspace\archive-mono\2020-02\android\debug\mono\sgen and added most libraries from C:/src/xamarin-android/bin/Debug/lib/xamarin.android/xbuild/Xamarin/Android/lib/x86_64/ to Visual Studio's 'Additional Symbol Search Paths', renaming the .so files to match the ones in the APK. This allowed me to get the following information:

-exec p *full_object->vtable
$7 = {klass = 0x1a00, gc_descr = 32342887324176384, domain = 0x72e66551eda000, type = 0x72e83fc77d0000, interface_bitmap = 0x20100000000cd00 <error: Cannot access memory at address 0x20100000000cd00>, max_interface_id = 0, rank = 0 '\000', initialized = 0 '\000', flags = 0 '\000', remote = 0, init_failed = 0, has_static_fields = 0, gc_bits = 0, imt_collisions_bitmap = 0, runtime_generic_context = 0x0, interp_vtable = 0x41da500000, vtable = 0x72e83fc7e42f}

(four debugger screenshots, not preserved)

It seems the three lower bits are consistently set to zero in the few times I was able to reproduce. Maybe the values can give people familiar with the GC implementation ideas about what could be causing this. Also, lots of variables that are most probably supposed to be pointers do not seem to point to valid memory. So maybe what the GC is currently looking at is not a regular GC object at all...


~~Unless someone wants to get to the bottom of this and needs my help reproducing or getting more info (I will gladly help), or the bug reappears, I will probably not update this issue anymore. The workaround still needs to go through quality checks and testing on many more devices, so this solution is not yet completely accepted on my side, but if it is, I will close this issue.~~


EDIT: I am rather eager to understand what is going on, to be sure this bug will not show up again uninvited, so I read a little bit of code and documentation about sgen. I saw that the 3 lower bits of the vtable address are actually flags for the cemented/pinned/forwarded state in the GC (and in my screenshot it is the case ...), and I also saw some bits of code like this in the source:

        /* We untag the vtable for concurrent M&S, in case bridge is running and it tagged it */
        desc = sgen_vtable_get_descriptor ((GCVTable)SGEN_POINTER_UNTAG_VTABLE (vtable_word));

So I am wondering whether there could be a rare race condition with the tarjan implementation that re-tags the object after this line of code, so that the scan then works with a tagged object where it does not expect one? (In that case the memory the GC is looking at would just be shifted by 7 bytes, which could explain the issue...)
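To make the tagging idea concrete, here is a minimal sketch in C of packing flag bits into the low bits of an aligned pointer. The names (TAG_MASK, tag_pointer, untag_pointer, pointer_tags) are hypothetical and are not Mono's actual macros; the sketch only assumes that objects are 8-byte aligned, so the three low bits of a valid address are always zero and can carry flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask: three low bits are free on 8-byte-aligned pointers. */
#define TAG_MASK ((uintptr_t)0x7)

/* Store up to three flag bits in the low bits of an aligned pointer. */
static uintptr_t tag_pointer(const void *p, unsigned flags)
{
    assert(((uintptr_t)p & TAG_MASK) == 0); /* requires 8-byte alignment */
    return (uintptr_t)p | ((uintptr_t)flags & TAG_MASK);
}

/* Recover the real address by clearing the flag bits. */
static void *untag_pointer(uintptr_t word)
{
    return (void *)(word & ~TAG_MASK);
}

/* Extract only the flag bits. */
static unsigned pointer_tags(uintptr_t word)
{
    return (unsigned)(word & TAG_MASK);
}
```

A word that still carries all three flag bits is numerically the real address plus 7, so any code that uses it without calling something like untag_pointer first would read every field from the wrong offset.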

tmijieux commented 2 years ago

As I suspected, if I just clear the 3 lower bits of the vtable address in the debugger, I get a valid object every time (but never the same type of object).

ghost commented 2 years ago

Possibly related: https://stackoverflow.com/questions/70223786/is-this-sigabrt-crash-in-android-app-caused-by-xamarin-log-handler

tmijieux commented 2 years ago

I have made substantial progress in identifying and reproducing the issue, which I reported here:

The bug affects xamarin-android in its default configuration, but it is specific to mono sgen and is also a rather corner-case, most probably unlikely, race condition, so I don't know whether any action needs to be taken here (other than continuing to integrate upstream mono bugfixes when they are released). Changing the default bridge implementation is probably a bad idea, because for me it created noticeable performance degradation on some Android devices and I had to tweak some other GC parameters to get it to an acceptable level of performance.

grendello commented 2 years ago

@tmijieux thanks for a very thorough job investigating the issue! However, Xamarin.Android is but a "client" of the Mono runtime, so I'll pass the buck to @lambdageek who will hopefully be able to address and fix the issue, thanks again!

lambdageek commented 2 years ago

There's not much for me to do, other than collect all the backports and shepherd them in: @tmijieux did a great job investigating and fixing the underlying issue.

lambdageek commented 2 years ago

@jonpryor @grendello The runtime fixes are in mono 2020-02 and dotnet release/6.0-maui. For !NET6 bump to mono/mono@a5d1934898bfdf06662cee5799782b09ce8afe5a

Thanks again @tmijieux !

FelixZY commented 2 years ago

Glad to see this fixed so quickly 🙂 any idea when we'll see a release with this fix included (currently a pretty large crasher for us)?

gsgou commented 2 years ago

Does anybody know a Xamarin.Android version that works with .NET 5? I can reproduce on Pixel 4 (5G) - Android 12. The workaround with "MONO_GC_PARAMS=bridge-implementation=new,mode=throughput" did not help.

@FelixZY I am building the app with v. 6.12.0.164 but it does not fix: https://stackoverflow.com/questions/70223786/is-this-sigabrt-crash-in-android-app-caused-by-xamarin-log-handler We probably need a new issue. Do we need to build with a different Xamarin.Android version too?