getsentry / sentry-java

A Sentry SDK for Java, Android and other JVM languages.
https://docs.sentry.io/
MIT License

UncaughtExceptionHandlerIntegration.java causes ANR when waiting for File I/O #2719

Closed kylannjohnson closed 1 year ago

kylannjohnson commented 1 year ago

Integration

sentry-android

Build System

Gradle

AGP Version

7.4.2

Proguard

Enabled

Version

6.16.0

Steps to Reproduce

Not entirely sure. The ANR seems to come from an uncaught exception in coroutine-based code.

Expected Result

App shouldn't crash.

Actual Result

Here is the ANR stack trace, with some redactions:

  at jdk.internal.misc.Unsafe.park (Native method)
  at java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:234)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos (AbstractQueuedSynchronizer.java:1079)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos (AbstractQueuedSynchronizer.java:1369)
  at java.util.concurrent.CountDownLatch.await (CountDownLatch.java:278)
  at io.sentry.UncaughtExceptionHandlerIntegration$UncaughtExceptionHint.waitFlush (UncaughtExceptionHandlerIntegration.java:169)
  at io.sentry.UncaughtExceptionHandlerIntegration.uncaughtException (UncaughtExceptionHandlerIntegration.java:106)
  at [**redacted**]GlobalExceptionHandler.uncaughtException (GlobalExceptionHandler.kt:23)
  at com.google.firebase.crashlytics.internal.common.CrashlyticsUncaughtExceptionHandler.uncaughtException (CrashlyticsUncaughtExceptionHandler.java:62)
  at java.lang.ThreadGroup.uncaughtException (ThreadGroup.java:1073)
  at java.lang.ThreadGroup.uncaughtException (ThreadGroup.java:1068)
  at kotlinx.coroutines.CoroutineExceptionHandlerImplKt.handleCoroutineExceptionImpl (CoroutineExceptionHandlerImpl.kt:61)
  at kotlinx.coroutines.CoroutineExceptionHandlerKt.handleCoroutineException (CoroutineExceptionHandler.kt:33)
  at kotlinx.coroutines.StandaloneCoroutine.handleJobException (Builders.common.kt:196)
  at kotlinx.coroutines.JobSupport.finalizeFinishingState (JobSupport.kt:229)
  at kotlinx.coroutines.JobSupport.tryMakeCompletingSlowPath (JobSupport.kt:906)
  at kotlinx.coroutines.JobSupport.tryMakeCompleting (JobSupport.kt:863)
  at kotlinx.coroutines.JobSupport.makeCompletingOnce$kotlinx_coroutines_core (JobSupport.kt:828)
  at kotlinx.coroutines.AbstractCoroutine.resumeWith (AbstractCoroutine.kt:100)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:46)
  at kotlinx.coroutines.internal.DispatchedContinuation.resumeUndispatchedWith (DispatchedContinuation.java:256)
  at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith (DispatchedContinuation.kt:282)
  at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith$default (DispatchedContinuation.kt:278)
  at kotlinx.coroutines.internal.ScopeCoroutine.afterCompletion (Scopes.kt:28)
  at kotlinx.coroutines.JobSupport.continueCompleting (JobSupport.kt:936)
  at kotlinx.coroutines.JobSupport.access$awaitSuspend (JobSupport.kt)
  at kotlinx.coroutines.JobSupport.access$continueCompleting (JobSupport.kt)
  at kotlinx.coroutines.JobSupport$ChildCompletion.invoke (JobSupport.kt:1155)
  at kotlinx.coroutines.JobSupport.notifyHandlers (JobSupport.kt:368)
  at kotlinx.coroutines.JobSupport.notifyCompletion (JobSupport.kt:362)
  at kotlinx.coroutines.JobSupport.completeStateFinalization (JobSupport.kt:323)
  at kotlinx.coroutines.JobSupport.finalizeFinishingState (JobSupport.kt:240)
  at kotlinx.coroutines.JobSupport.continueCompleting (JobSupport.kt:935)
  at kotlinx.coroutines.JobSupport.access$awaitSuspend (JobSupport.kt)
  at kotlinx.coroutines.JobSupport.access$continueCompleting (JobSupport.kt)
  at kotlinx.coroutines.JobSupport$ChildCompletion.invoke (JobSupport.kt:1155)
  at kotlinx.coroutines.JobSupport.notifyHandlers (JobSupport.kt:368)
  at kotlinx.coroutines.JobSupport.notifyCompletion (JobSupport.kt:362)
  at kotlinx.coroutines.JobSupport.completeStateFinalization (JobSupport.kt:323)
  at kotlinx.coroutines.JobSupport.finalizeFinishingState (JobSupport.kt:240)
  at kotlinx.coroutines.JobSupport.tryMakeCompletingSlowPath (JobSupport.kt:906)
  at kotlinx.coroutines.JobSupport.tryMakeCompleting (JobSupport.kt:863)
  at kotlinx.coroutines.JobSupport.makeCompletingOnce$kotlinx_coroutines_core (JobSupport.kt:828)
  at kotlinx.coroutines.AbstractCoroutine.resumeWith (AbstractCoroutine.kt:100)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:46)
  at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:104)
  at android.os.Handler.handleCallback (Handler.java:942)
  at android.os.Handler.dispatchMessage (Handler.java:99)
  at android.os.Looper.loopOnce (Looper.java:226)
  at android.os.Looper.loop (Looper.java:313)
  at android.app.ActivityThread.main (ActivityThread.java:8757)
  at java.lang.reflect.Method.invoke (Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:571)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1067)

romtsn commented 1 year ago

@kylannjohnson the problem is due to using both Crashlytics and Sentry alongside each other. Because Crashlytics is initialized first, its exception handler will be invoked before ours; they have a default timeout of 3 seconds (used to be 4 seconds a couple of months ago, so depending on the version you use it still might be 4 seconds).

This means that if they consume the entire 3 or 4 seconds of main thread time, we only have 1 or 2 seconds left (out of the roughly 5-second ANR window) to process the exception; otherwise it'll result in an ANR like the one you've faced. In addition, you have your own GlobalExceptionHandler, which I assume also adds to the main thread blocking time, so we have even less time to process the exception. On our side, we just hold the lock for a short time until the exception event is serialized to disk (to make sure we don't lose it), and then we release it. I hope this all makes sense to you.
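The "hold the lock until the event is on disk" step matches the `CountDownLatch.await` frame in the stack trace above. Here is a minimal sketch of that pattern, not the SDK's actual code: `FlushHint`, `persistAndWait`, and the timings are hypothetical names and values chosen for illustration. The crashing thread blocks on a latch with a timeout while a background writer persists the event.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class FlushLatchSketch {

    /** Hypothetical stand-in for a crash hint that blocks the crashing thread until the event is on disk. */
    static final class FlushHint {
        private final CountDownLatch latch = new CountDownLatch(1);
        private final long timeoutMillis;

        FlushHint(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        /** Called by the writer once the event has been persisted. */
        void markFlushed() {
            latch.countDown();
        }

        /** Blocks the caller until the flush is signalled or the timeout elapses; true means flushed in time. */
        boolean waitFlush() {
            try {
                return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Simulates a disk write taking writeMillis while the caller waits at most timeoutMillis. */
    static boolean persistAndWait(long writeMillis, long timeoutMillis) {
        FlushHint hint = new FlushHint(timeoutMillis);
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(writeMillis); // stands in for file I/O
                hint.markFlushed();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
        return hint.waitFlush();
    }

    public static void main(String[] args) {
        // Fast write, generous timeout: the wait returns as soon as the writer signals.
        System.out.println("flushed in time: " + persistAndWait(20, 2000));
        // Slow write, short timeout: the caller gives up after the timeout.
        System.out.println("flushed in time: " + persistAndWait(2000, 50));
    }
}
```

If the caller is the main thread, the time spent in `waitFlush` adds to whatever the other handlers in the chain have already consumed, which is why the timeout value matters.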

I don't think we can actually solve this from our side, but we can make some improvements to our crash pipeline that might reduce the chance of getting an ANR in such cases. Also, I'm gonna document this, as it keeps popping up when people compare our SDK against Crashlytics and run them both together.

kylannjohnson commented 1 year ago

Thanks for the response! It does make sense. For now, I'll attempt to put Sentry first in the chain. What improvements would you be considering?
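For context on what "first in the chain" means, here is a rough sketch of how chained uncaught exception handlers are typically invoked. It assumes each handler wraps the one installed before it and does its own work before delegating; real SDKs may chain differently. `ChainedHandler`, `invocationOrder`, and the `sdkA`/`sdkB`/`sdkC` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerChainSketch {

    /** Hypothetical handler that records its name, then delegates to the handler installed before it. */
    static final class ChainedHandler implements Thread.UncaughtExceptionHandler {
        private final String name;
        private final Thread.UncaughtExceptionHandler previous;
        private final List<String> log;

        ChainedHandler(String name, Thread.UncaughtExceptionHandler previous, List<String> log) {
            this.name = name;
            this.previous = previous;
            this.log = log;
        }

        @Override
        public void uncaughtException(Thread thread, Throwable error) {
            log.add(name); // this handler's own (possibly blocking) work would happen here
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        }
    }

    /** Installs handlers in the given order and returns the order in which they actually run. */
    static List<String> invocationOrder(String... installOrder) {
        List<String> log = new ArrayList<>();
        Thread.UncaughtExceptionHandler current = null;
        for (String name : installOrder) {
            current = new ChainedHandler(name, current, log); // wraps the previously installed handler
        }
        if (current != null) {
            current.uncaughtException(Thread.currentThread(), new RuntimeException("boom"));
        }
        return log;
    }

    public static void main(String[] args) {
        // Under the stated assumption, the handler installed last is the outermost and runs first.
        System.out.println(invocationOrder("sdkA", "sdkB", "sdkC")); // prints [sdkC, sdkB, sdkA]
    }
}
```

Under that assumption, which SDK runs first depends on installation order and on whether each handler delegates before or after doing its own work, so it is worth checking against a real stack trace.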

romtsn commented 1 year ago

> Thanks for the response! It does make sense. For now, I'll attempt to put Sentry first in the chain. What improvements would you be considering?

Off the top of my head, we could make at least 2 improvements right away:

kylannjohnson commented 1 year ago

FWIW, those both sound like great options. Especially reducing the 15 seconds to 3-4s.

markushi commented 1 year ago

Let's also extend our docs with some troubleshooting guidance on using multiple crash handlers. Regarding prioritizing crash events:

romtsn commented 1 year ago

@kylannjohnson I've filed #2732, #2733 and https://github.com/getsentry/sentry-docs/issues/7017 to address this. I'm gonna close this issue; please track it in the other ones. Thank you!