getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License
586 stars 206 forks source link

[iOS / Native AOT] Bad stack trace due to unsupported marshalling behavior #3280

Open tipa opened 6 months ago

tipa commented 6 months ago

Package

Sentry

.NET Flavor

.NET

.NET Version

8.0.2

OS

iOS

SDK Version

4.2.1

Self-Hosted Sentry Version

No response

Steps to Reproduce

Crash in NativeAOT compiled iOS app (there might be more prerequisites, like the managed exception to be marshalled to native code and back to managed).

Expected Result

Crash with useful strack trace including managed C# function from where it originated

Actual Result

Thread 0 Crashed:
0   libsystem_kernel.dylib          0x39fe95fbc         __pthread_kill
1   libsystem_pthread.dylib         0x3e519567c         pthread_kill
2   libsystem_c.dylib               0x3210abb8c         abort
3   MyXYTestApp                     0x20499ce8c         xamarin_assertion_message (runtime.m:1461)
4   MyXYTestApp                     0x20499d5d4         xamarin_process_managed_exception (runtime.m:2441)
5   MyXYTestApp                     0x20499d458         xamarin_process_managed_exception_gchandle (runtime.m:1123)
6   MyXYTestApp                     0x2049c6508         -[App application:didFinishLaunchingWithOptions:] (registrar.mm:2760)
7   UIKitCore                       0x31562361c         -[UIApplication _handleDelegateCallbacksWithOptions:isSuspended:restoreState:]
8   UIKitCore                       0x315622784         -[UIApplication _callInitializationDelegatesWithActions:forCanvas:payload:fromOriginatingProcess:]
9   UIKitCore                       0x315621768         -[UIApplication _runWithMainScene:transitionContext:completion:]
10  UIKitCore                       0x3156213b4         -[_UISceneLifecycleMultiplexer completeApplicationLaunchWithFBSScene:transitionContext:]
11  UIKitCore                       0x31559dedc         _UIScenePerformActionsWithLifecycleActionMask
12  UIKitCore                       0x3156253b0         __101-[_UISceneLifecycleMultiplexer _evalTransitionToSettings:fromSettings:forceExit:withTransitionStore:]_block_invoke
13  UIKitCore                       0x31554ce44         -[_UISceneLifecycleMultiplexer _performBlock:withApplicationOfDeactivationReasons:fromReasons:]
14  UIKitCore                       0x31554b8bc         -[_UISceneLifecycleMultiplexer _evalTransitionToSettings:fromSettings:forceExit:withTransitionStore:]
15  UIKitCore                       0x31554b224         -[_UISceneLifecycleMultiplexer uiScene:transitionedFromState:withTransitionContext:]
16  UIKitCore                       0x31554b0f4         __186-[_UIWindowSceneFBSSceneTransitionContextDrivenLifecycleSettingsDiffAction _performActionsForUIScene:withUpdatedFBSScene:settingsDiff:fromSettings:transitionContext:lifecycleActionType:]_block_invoke
17  UIKitCore                       0x31554affc         +[BSAnimationSettings(UIKit) tryAnimatingWithSettings:fromCurrentState:actions:completion:]
18  UIKitCore                       0x31554a884         _UISceneSettingsDiffActionPerformChangesWithTransitionContextAndCompletion
19  UIKitCore                       0x31554a534         -[_UIWindowSceneFBSSceneTransitionContextDrivenLifecycleSettingsDiffAction _performActionsForUIScene:withUpdatedFBSScene:settingsDiff:fromSettings:transitionContext:lifecycleActionType:]
20  UIKitCore                       0x3158ce26c         __64-[UIScene scene:didUpdateWithDiff:transitionContext:completion:]_block_invoke.225
21  UIKitCore                       0x3155496b8         -[UIScene _emitSceneSettingsUpdateResponseForCompletion:afterSceneUpdateWork:]
22  UIKitCore                       0x315549528         -[UIScene scene:didUpdateWithDiff:transitionContext:completion:]
23  UIKitCore                       0x315661c94         -[UIApplication workspace:didCreateScene:withTransitionContext:completion:]
24  UIKitCore                       0x315661a2c         -[UIApplicationSceneClientAgent scene:didInitializeWithEvent:completion:]
25  FrontBoardServices              0x3419176d0         -[FBSScene _callOutQueue_didCreateWithTransitionContext:completion:]
26  FrontBoardServices              0x34191756c         __92-[FBSWorkspaceScenesClient createSceneWithIdentity:parameters:transitionContext:completion:]_block_invoke.108
27  FrontBoardServices              0x341916198         -[FBSWorkspace _calloutQueue_executeCalloutFromSource:withBlock:]
28  FrontBoardServices              0x341921f88         __92-[FBSWorkspaceScenesClient createSceneWithIdentity:parameters:transitionContext:completion:]_block_invoke
29  libdispatch.dylib               0x320fac2fc         _dispatch_client_callout
30  libdispatch.dylib               0x320fafd44         _dispatch_block_invoke_direct
31  FrontBoardServices              0x34191251c         __FBSSERIALQUEUE_IS_CALLING_OUT_TO_A_BLOCK__
32  FrontBoardServices              0x34191249c         -[FBSMainRunLoopSerialQueue _targetQueue_performNextIfPossible]
33  FrontBoardServices              0x341912374         -[FBSMainRunLoopSerialQueue _performNextFromRunLoopSource]
34  CoreFoundation                  0x310fef0a8         __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__
35  CoreFoundation                  0x310fee324         __CFRunLoopDoSource0
36  CoreFoundation                  0x310fecad8         __CFRunLoopDoSources0
37  CoreFoundation                  0x310feb814         __CFRunLoopRun
38  CoreFoundation                  0x310feb3f4         CFRunLoopRunSpecific
39  GraphicsServices                0x397b374f4         GSEventRunModal
40  UIKitCore                       0x31563e89c         -[UIApplication _run]
41  UIKitCore                       0x31563ded8         UIApplicationMain
42  MyXYTestApp                     0x20499034c         xamarin_UIApplicationMain (bindings.m:126)

I believe this is caused because of this event handler that changes the default marshalling behavior of exceptions: https://github.com/getsentry/sentry-dotnet/blob/bea3fcf35dc0a2787dbf9c96931ec9895989907c/src/Sentry/Platforms/Cocoa/SentrySdk.cs#L14-L17

The code was introduced after this discussion in the Xamarin repo: https://github.com/xamarin/xamarin-macios/issues/15252

According to @rolfbjarne from the .NET iOS/macOS team, CoreCLR (which is what is used on iOS when using Native AOT as well as on macOS) doesn't support MarshalManagedExceptionMode.UnwindNativeCode and therefore, this assertion is crashing the process: https://github.com/xamarin/xamarin-macios/blob/907081f787315704a01c940cf28b46b47db23df0/runtime/runtime.m#L2362-L2364 https://github.com/xamarin/xamarin-macios/blob/907081f787315704a01c940cf28b46b47db23df0/runtime/runtime.m#L2452-L2455

jamescrosswell commented 6 months ago

According to @rolfbjarne from the .NET iOS/macOS team, CoreCLR (which is what is used on iOS when using Native AOT as well as on macOS) doesn't support MarshalManagedExceptionMode.UnwindNativeCode and therefore, this assertion is crashing the process:

Hm, I'm not sure I follow everything that's going on here but it seems like that assertion is also dependent on xamarin_is_gc_coop, which is only true when TARGET_OS_WATCH:

#if TARGET_OS_WATCH
bool xamarin_is_gc_coop = true;
#else
bool xamarin_is_gc_coop = false;
#endif

I can't see anything from Sentry in that stack trace though. Do you know what kind of exception was originally thrown (that resulted in this stack trace)?

tipa commented 6 months ago

I don' think it is dependent on xamarin_is_gc_coop. I might have posted links from the wrong commit, this one shows how the xamarin_assertion_message method is called in line 2441, like in the stack trace: https://github.com/xamarin/xamarin-macios/blob/ed26faa94fe2734d9c1014d2e6ef7173d4d77690/runtime/runtime.m#L2441 and then abort() in line 1461: https://github.com/xamarin/xamarin-macios/blob/ed26faa94fe2734d9c1014d2e6ef7173d4d77690/runtime/runtime.m#L1461

I don't know what exception was originally thrown (that's what I'm trying to find out), @rolfbjarne recommended that I should also subscribe to the ObjCRuntime.Runtime.MarshalManagedException event and then log the exception manually (more on that is documented here. UnwindNativeCode is documented as This isn't available when [...] when using CoreCLR).

jamescrosswell commented 6 months ago

OK thanks, that makes more sense. If we can find a way to throw an exception that reproduces this behaviour then we should be able to work out a solution 👍🏻

tipa commented 1 month ago

@jamescrosswell you can reproduce this behavior / stack trace by simply throwing an exception in the AppDelegate.FinishedLaunching method in an NativeAot-compiled app. You might have to upload a build to TestFlight as (to my knowledge) it is not possible to run NativeAOT compiled iOS apps locally. Since there will be more adoption of NativeAOT in the future (e.g. with the launch of .NET 9), would it be possible to optionally disable this part of the Sentry initialization, so that the unsupported marshalling behavior isn't used? https://github.com/getsentry/sentry-dotnet/blob/bea3fcf35dc0a2787dbf9c96931ec9895989907c/src/Sentry/Platforms/Cocoa/SentrySdk.cs#L14-L17

bitsandfoxes commented 1 month ago

would it be possible to optionally disable this part of the Sentry initialization

We could definitely put these on the Cocoa specific native options - like we do with the LogCat options for Android.

bitsandfoxes commented 1 month ago

Upping the priority on this as it crashes the process.

jamescrosswell commented 1 month ago

Upping the priority on this as it crashes the process.

@bitsandfoxes what do we want to do about this?

you might have to upload a build to TestFlight as (to my knowledge) it is not possible to run NativeAOT compiled iOS apps locally.

Realistically, we can't test for things like this until/unless we create a developer profile and an app in the Apple store that we start down the QA route. Having done that once personally for a Flutter app, I suspect it's going to take about 3-5 days to get this all setup via CI so that we can easily deploy new versions of our app (e.g. Sentry.Samples.MAUI) to the iOS store... there are various encryption/deployment keys to manage that we'll need to sync between CI and our local machines etc.

I don't think we've got the bandwidth to do this right now. Perhaps once we've sorted out what we're going to do about net9.0 (and the device tests) we could circle back to this.

tipa commented 4 weeks ago

@bitsandfoxes what do we want to do about this?

Would it not be possible to allow an optional/temporary way to disable the changing of marshalling bahavior, as I suggested above? I don't know if it would really work, but at the moment, the stack traces are just barely understandable...

tipa commented 4 weeks ago

I was able to run a Native AOT compiled app locally on a physical device using dotnet publish /t:Run /p:_DeviceName=DEVICEID where DEVICEID can be queried using xcrun xctrace list devices

jamescrosswell commented 4 weeks ago

I was able to run a Native AOT compiled app locally on a physical device using dotnet publish /t:Run /p:_DeviceName=DEVICEID where DEVICEID can be queried using xcrun xctrace list devices

That helps... it means we don't have to push it all the way through to test flight. Just a bit of fluffing around setting up encryption keys, developer profiles and configuring an app in the apple dev portal.

So it sounds a little less painful, when we do have the bandwidth to look into this.