bugsnag / bugsnag-react-native

Error monitoring and reporting tool for native exceptions and JS errors in React Native apps
https://docs.bugsnag.com/platforms/react-native
MIT License
370 stars 121 forks source link

Deadlock while reporting an error to Bugsnag via `notify` #448

Closed derrh closed 4 years ago

derrh commented 4 years ago

Description

Notify Bugsnag of an error without a deadlock.

Issue

It's taken quite a bit of sleuthing to come to the conclusion that Bugsnag was causing our app to freeze. It appears to be a race condition with very sensitive timing. All of our crash logs resulting from the deadlock (watchdog on iOS finally killing the app due to the deadlock) include the following stack trace on the React Native Bugsnag thread:

Thread 2 Queue : com.facebook.react.BugsnagReactNativeQueue (serial)
#0  0x000000019a8878f0 in __ulock_wait ()
#1  0x000000019a7a1040 in _os_unfair_lock_lock_slow ()
#2  0x000000019a89c628 in dyld3::AllImages::imageLoadAddressByIndex(unsigned int) const ()
#3  0x0000000104acf2e0 in bsg_kscrw_i_writeBinaryImage at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashReport.c:1091
#4  0x0000000104acf4dc in bsg_kscrw_i_writeBinaryImages at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashReport.c:1168
#5  0x0000000104ad0180 in bsg_kscrashreport_writeStandardReport at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashReport.c:1587
#6  0x0000000104ad252c in bsg_kscrashsentry_reportUserException at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/Sentry/BSG_KSCrashSentry_User.c:140
#7  0x0000000104acc290 in bsg_kscrash_reportUserException at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashC.c:194
#8  0x0000000104acb7a8 in -[BSG_KSCrash reportUserException:reason:originalException:handledState:appState:callbackOverrides:metadata:config:discardDepth:terminateProgram:] at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrash.m:306
#9  0x0000000104ae617c in -[BugsnagCrashSentry reportUserException:reason:originalException:handledState:appState:callbackOverrides:metadata:config:discardDepth:] at .../cocoa/vendor/bugsnag-cocoa/Source/BugsnagCrashSentry.m:53
#10 0x0000000104aebfe0 in -[BugsnagNotifier notify:handledState:block:] at .../cocoa/vendor/bugsnag-cocoa/Source/BugsnagNotifier.m:672
#11 0x0000000104aeb620 in -[BugsnagNotifier internalClientNotify:withData:block:] at .../cocoa/vendor/bugsnag-cocoa/Source/BugsnagNotifier.m:593
#12 0x0000000104adcde8 in +[Bugsnag internalClientNotify:withData:block:] at .../cocoa/vendor/bugsnag-cocoa/Source/Bugsnag.m:183
#13 0x0000000104aeef4c in -[BugsnagReactNative notify:resolve:reject:] at .../cocoa/BugsnagReactNative.m:168
#14 0x000000019aaa3554 in __invoking___ ()
#15 0x000000019a96c080 in -[NSInvocation invoke] ()
#16 0x000000019a96cc98 in -[NSInvocation invokeWithTarget:] ()
#17 0x0000000105dbbbbc in ::-[RCTModuleMethod invokeWithBridge:module:arguments:](RCTBridge *, id, NSArray *) at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/Base/RCTModuleMethod.mm:569
#18 0x0000000105dbdcdc in facebook::react::invokeInner(RCTBridge*, RCTModuleData*, unsigned int, folly::dynamic const&) at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/CxxModule/RCTNativeModule.mm:108
#19 0x0000000105dbda40 in facebook::react::RCTNativeModule::invoke(unsigned int, folly::dynamic&&, int)::$_0::operator()() const [inlined] at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/CxxModule/RCTNativeModule.mm:73
#20 0x0000000105dbda14 in ::___ZN8facebook5react15RCTNativeModule6invokeEjON5folly7dynamicEi_block_invoke() at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/CxxModule/RCTNativeModule.mm:65
#21 0x000000019a740b7c in _dispatch_call_block_and_release ()
#22 0x000000019a741fd8 in _dispatch_client_callout ()
#23 0x000000019a748450 in _dispatch_lane_serial_drain ()
#24 0x000000019a748e7c in _dispatch_lane_invoke ()
#25 0x000000019a751f20 in _dispatch_workloop_worker_thread ()                                                                                                                                                 
#26 0x000000019a7a76d0 in _pthread_wqthread ()

Environment

Library versions:

├─ bugsnag-react-native@2.23.6
└─ react-native@0.61.3

Example code snippet

// when the timing is just right:
errorClient.notify(error)

Here are a few crash logs: https://gist.github.com/derrh/6bfdd0e26647e3812691b52e86d9609a https://gist.github.com/derrh/c20a61d0c596e850d47e5b87b921490a https://gist.github.com/derrh/ee5b91e3ca827d17c1a61044a37d0872

I was not able to reproduce this issue in a minimal scenario due to the challenging timing requirements.

mattdyoung commented 4 years ago

Hi @derrh

Thanks for the report. We're just looking in to what could be causing this.

Can you share how you're configuring Bugsnag i.e. the full code configuration both in the React Native JS and any iOS native configuration (just strip out your API key)?

Also can you confirm if you've registered any "beforeSend" callbacks to modify the Bugsnag report or have added any additional metadata or breadcrumbs before notifying Bugsnag?

derrh commented 4 years ago

Hello, @mattdyoung, Thanks for the quick reply. We have a very basic config.

const config = new Configuration("<bugsnag api key>")
config.appVersion = version
const errorClient = new Client(config)

We aren't doing anything custom on the native side, and we do not have any beforeSend callbacks registered.

Ping me if you need any more info on this. I'd be happy to help in any way I can. I have a branch where I can pretty regularly reproduce the freeze if that's helpful, not in code I can share unfortunately, but if additional crash logs or stack traces can help let me know.

mattdyoung commented 4 years ago

@derrh Thanks - we're just trying to see if we can reproduce.

When you say you can regularly reproduce the freeze does that include on an iOS simulator or only on an actual device?

You say the timing of the call to notify is key. Can you shed any light on what your app might be doing at that point?

derrh commented 4 years ago

@mattdyoung, I don't think we were ever able to reproduce it on the simulator.

In our case it was nearly always freezing when certain images in our app were being rendered. For a while we were trying to determine what was special about these particular images. We knew that removing the images removed the freeze, but we couldn't figure out why. Eventually we realized that the react native Bugsnag thread was stopped in the same place every stack trace.

Something else that may be relevant: It only happens on app launch. It's a few seconds after app launch because several API calls have time to complete and the timing of those API calls was critical to the deadlock (one of those API calls was "notify"-ing), and the others caused the rendering of the images that were acting suspicious. Removing either one of those two actions and the freeze would not happen. In the end we stopped logging that particular error because it was a warning on a benign error that we just wanted to track, but not worth the deadlock of course.

If it would be helpful, I'd be happy to jump on a zoom meeting and step through it for you.

abigailbramble commented 4 years ago

@derrh thank you for the additional information! Can you please write into support@bugsnag.com where we can continue this in more detail and look to set up a zoom meeting if necessary? Thanks!

drhops commented 4 years ago

@mattdyoung @derrh I'm curious if you were able to confirm this issue? I suspect it could be related to similar behavior we're seeing and interested if there's an update. Thanks!

bigyelow commented 4 years ago

The implementation of AllImages::imageLoadAddressByIndex is different in the latest version and the last version which the former uses the withReadLock block to read the _loadedImages value while the latter doesn’t.

Not sure if this is helpful for your team to investigate. @mattdyoung .

@derrh I guess there was another thread calling a method in AllImages.cpp that uses the withReadLock block which helped result in a deadlock.

bigyelow commented 4 years ago

@derrh Ah, I have seen all the 3 links of your crash logs, they just proves my guess that some other thread(s) called the AllImages:xx method.

mattdyoung commented 4 years ago

@derrh @bigyelow Thanks for the info on this. We believe we've identified the issue and are looking at releasing a fix.

bigyelow commented 4 years ago

@mattdyoung Does this bug relate to iOS system or logic in Bugsnag itself?

mattdyoung commented 4 years ago

@bigyelow We're still working on the issue so not entirely sure yet what the fix will be but even if it's an iOS issue we may be able to work around it in Bugsnag to prevent it.

abigailbramble commented 4 years ago

Released in v2.23.8.