Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
12.99k stars 1.85k forks source link

Crash with kotlinx.coroutines.JobCancellationException #2170

Closed frist008 closed 4 years ago

frist008 commented 4 years ago

Hello. Migrated my project from RxJava to Kotlin coroutines. My project has up to 2000-3000 online per day and 20_000 per month.

I started getting all sorts of errors on Crashlitics with JobCancellationException.

Full log: Fatal Exception: kotlinx.coroutines.JobCancellationException: Job was cancelled.

However, there are 30-50 other streams. I'm worried that I don't have a Stacktrace where the error occurred? and I cannot fix the problem.

Furthermore. I cannot reproduce it myself. This error occurs with real users, but not me. For real users, the error does not repeat itself for a month. The sample for all users is very small - 0.23%. And the chance of repetition is even less.

The error header can be quite different on Crashlitics:

AndroidLeakFixes.java line 400
leakcanary.AndroidLeakFixes$Companion$backgroundExecutor$1$thread$1.run

bn.java line 1
bn.a

bn.java line 1
bn.b

bf.java line 1
bf.b

rw.java line 2
rw.run

zj.java line 2
zj.a

bt.java line 1
bt.a

DpmTcmClient.java line 149
com.qti.tcmclient.DpmTcmClient$TcmReceiver.run

GlideExecutor.java line 393
com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run

You may also have noticed that I have listed 2 sides of the library that are thrown with JobCancellationException.

It's funny, but I don't use coroutines explicitly in them. It's just magic.

implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.3.72"
implementation "org.jetbrains.kotlinx:kotlinx-coroutines-core:1.3.8"
implementation "org.jetbrains.kotlinx:kotlinx-coroutines-android:1.3.8"
implementation "org.jetbrains.kotlinx:kotlinx-coroutines-play-services:1.3.8"

I am using a wrapper to run coroutines.

fun CoroutineScope.launchSafe(
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.DEFAULT,
    launchSafe: LaunchSafe = object : LaunchSafe {
        override fun onCancellation(e: CancellationException) {
            Timber.v(e)
        }

        override fun onFailure(t: Throwable) {
            Timber.e(t)
        }
    },
    onStart: (suspend CoroutineScope.() -> Unit)? = null,
    block: suspend CoroutineScope.() -> Unit
) = launch(context, start) {
    try {
        onStart?.invoke(this)
        block()
    } catch (e: CancellationException) {
        try {
            launchSafe.onCancellation(e)
        } catch (t: Throwable) {
            Timber.e(t)
        }
    } catch (t: Throwable) {
        try {
            launchSafe.onFailure(t)
        } catch (t: Throwable) {
            Timber.e(t)
        }
    }
}

interface LaunchSafe {

    @WorkerThread
    fun onCancellation(e: CancellationException)

    @WorkerThread
    fun onFailure(t: Throwable)
}
abstract class BaseViewModel : ViewModel(), CoroutineScope, LaunchSafe {

    override var coroutineContext =
        Dispatchers.Default + SupervisorJob() + CoroutineName("ViewModel") + SafeCoroutineExceptionHandler()

    val errorLiveEvent = LiveEvent<Throwable>()

    override fun onFailure(t: Throwable) {
        errorLiveEvent.postValue(t)
    }

    override fun onCancellation(e: CancellationException) {
        // none
    }

    override fun onCleared() {
        coroutineContext.cancel()
        super.onCleared()
    }
}

@Suppress("FunctionName", "NOTHING_TO_INLINE")
inline fun SafeCoroutineExceptionHandler() = CoroutineExceptionHandler { _, throwable ->
    if (BuildConfig.DEBUG) {
        Thread.currentThread()
            .uncaughtExceptionHandler
            ?.uncaughtException(Thread.currentThread(), throwable)
    } else {
        Timber.e(throwable, "Error ExceptionHandler")
    }
}
frist008 commented 4 years ago

All logs usually contain Unsafe.park.

glide-source-thread-3
       at java.lang.Object.wait(Object.java)
       at java.lang.Thread.parkFor$(Thread.java:2161)
       at sun.misc.Unsafe.park(Unsafe.java:358)
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:190)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2059)
       at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:548)
       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1087)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
       at java.lang.Thread.run(Thread.java:784)
       at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:393)
sun.misc.Unsafe.park (Unsafe.java)
java.util.concurrent.locks.LockSupport.park (LockSupport.java:190)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2067)
java.util.concurrent.PriorityBlockingQueue.take (PriorityBlockingQueue.java:548)
bn.a (bn.java:1)
bn.run (bn.java:2)
qwwdfsad commented 4 years ago

Hi,

I'm worried that I don't have a Stacktrace where the error occurred

Please try to enable debug mode: https://github.com/Kotlin/kotlinx.coroutines/blob/master/docs/debugging.md#debug-mode to have stacktraces in JobCancellationException

It's hard to tell what exactly is wrong without actual stacktraces, but as a wild guess, please ensure that you do not invoke channel.offer from non-suspending methods as it can unexpectedly throw CancellationException (#974). Also, it is worth to add SafeCoroutineExceptionHandler to your launchSafe to handle exceptions from coroutine children as well

torresmi commented 4 years ago

My team encountered this as well and unfortunately, we haven't identified the root cause.

But we did make the following changes in a release and haven't seen it since. Maybe these will be helpful:

  1. Replaced our Observable.asFlow() usages to Observable.toFlowable(BackPressureStrategy.LATEST).asFlow(), like this comment from #2104.
  2. Replaced our Flow.asObservable() usages to Flow.asFlowable().toObservable()
  3. A lot of our exceptions involved OkHttp usage. So we updated from v3.14.9 to v4.7.2.
  4. Updated Coroutines from v1.3.7 to v1.3.8

Other things to note:

frist008 commented 4 years ago

Hi,

I'm worried that I don't have a Stacktrace where the error occurred

Please try to enable debug mode: https://github.com/Kotlin/kotlinx.coroutines/blob/master/docs/debugging.md#debug-mode to have stacktraces in JobCancellationException

It's hard to tell what exactly is wrong without actual stacktraces, but as a wild guess, please ensure that you do not invoke channel.offer from non-suspending methods as it can unexpectedly throw CancellationException (#974). Also, it is worth to add SafeCoroutineExceptionHandler to your launchSafe to handle exceptions from coroutine children as well

We weren't using channel.offer or RX explicitly in our source code.

I already add SafeCoroutineExceptionHandlerin coroutineContext for BaseViewModel : ViewModel(), CoroutineScope. If I add it again to launchSafe, won't it be a duplicate?

frist008 commented 4 years ago

I am not using experimental or deprecated functions in the project at all.

I have set in Application System.setProperty (DEBUG_PROPERTY_NAME, DEBUG_PROPERTY_VALUE_ON)

I think it will be necessary to wait a few days. Before I get any results from users in beta testing.

qwwdfsad commented 4 years ago

Replaced our Observable.asFlow() usages to Observable.toFlowable(BackPressureStrategy.LATEST).asFlow()

Thanks, that's a great place to start looking!

I already add SafeCoroutineExceptionHandler in coroutineContext for BaseViewModel : ViewModel(), CoroutineScope. If I add it again to launchSafe, won't it be a duplicate?

If the scope of BaseViewModel is used for launchSafe, then it's indeed not necessary

frist008 commented 4 years ago

I found a bug. It consisted in the fact that I used in one place in my code the suspend function in the catch (e: CancellationException) block.

Helped me:

class Application : Application {

    override fun onCreate() {
        super.onCreate()

        System.setProperty(DEBUG_PROPERTY_NAME, DEBUG_PROPERTY_VALUE_ON)
    }
}

I continue to observe. If no new issues are found by the end of the month, I will close the discussion.

frist008 commented 4 years ago

Perhaps this problem is somehow related to a Google bug. https://issuetracker.google.com/issues/164317567

frist008 commented 4 years ago

After switching to Kotlin 1.4, no new error messages appeared. Thanks to all.

mhernand40 commented 4 years ago

@frist008 have you by any chance switched back to Observable.asFlow() now that you have switched to Kotlin 1.4? How about Flow.asObservable()?

frist008 commented 4 years ago

@frist008 have you by any chance switched back to Observable.asFlow() now that you have switched to Kotlin 1.4? How about Flow.asObservable()?

We weren't using RX explicitly in our source code.