Closed elizarov closed 5 years ago
I'm not sure I understand the solution *S3… resumeWith
is not called if the continuation is cancelled and the underlying callback is unregistered and its methods are never called? Or are you talking about resumeWith
in another place?
Let me clarify S3. Right now, when CancellableContinuation
is cancelled it immediately resumes the coroutine with CancellationException
, so now if the operation we were waiting for had already crashed and resumeWithException
is about to be called, it has no choice but to either ignore this exception or invoke uncaught exception handler.
With S3 the idea the we don't immediately resume a cancelled coroutine. Instead, we mark it as "cancelling" and wait until resumeXxx
is invoked, so that if resumeWithException
is invoked, we can resume the coroutine with this exception, so that we don't have this ugly choice between losing exception or handling it as uncaught one.
But that means the coroutine may be dangling in a cancelling state if resumeXxx is never called?
I mean, in OkHttp, if we have a request that is still in progress and
there's an outer cancellation before it miserably fails with a 404 error or
an IOException, invokeOnClose lambda with be called, which will call
cancel() on the Call
I think I misunderstood something, maybe we're not talking about the same resumeXxx call place?
On Wed, Nov 14, 2018, 4:27 PM Roman Elizarov notifications@github.com wrote:
Let me clarify S3. Right now, when CancellableContinuation is cancelled it immediately resumes the coroutine with CancellationException, so now if the operation we were waiting for had already crashed and resumeWithException is about to be called, it has no choice but to either ignore this exception or invoke uncaught exception handler.
With S3 the idea the we don't immediately resume a cancelled coroutine. Instead, we mark it as "cancelling" and wait until resumeXxx is invoked, so that if resumeWithException is invoked, we can resume the coroutine with this exception, so that we don't have this ugly choice between losing exception or handling it as uncaught one.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Kotlin/kotlinx.coroutines/issues/830#issuecomment-438702179, or mute the thread https://github.com/notifications/unsubscribe-auth/AGpvBYMSZdFse0uV03LwglMj9jEtGLNdks5uvDZsgaJpZM4YdSMH .
@LouisCAD Yes, S3 works only for callback-based API that would still invoke a callback when you cancel. Retrofit and many other APIs are like that, but not all of them, so it cannot become the only solution out there.
@elizarov Intriguing, I didn't know some callback based APIs did that.
I'm wondering how the callback is invoked after call to cancel()
on Call<T>
in Retrofit2, since I see no mention of the behavior you're talking about in the javadoc. EDIT: there's a test for that.
@elizarov Since all my tests and beta works perfectly with W1 I think I'll go to prod with that solution.
Since it's internal, is there any risk that this workaround is no more working at some point before this new API is created, or is the risk low enough to be OK for prod?
@LouisCAD for example in apache http client FutureCallback has cancelled
listener.
@elizarov So went to prod with https://gist.github.com/Tolriq/dcbdc62c0e29d3b321034e990c3c85ce containing Workaround 1.
While it greatly reduce the problem (Catched exception re thrown) (no more seen in 7K user beta group) it still happens with a way larger active user base.
I then tested to add a CoroutineExceptionHandler in case the issue was somewhere else it did not fix anything.
I then tested Workaround 2 and it did not change anything.
I then stopped relying on Exceptions and encapsulated them in a sealed class
private suspend fun Call.await(): WorkerResult {
return suspendCancellableCoroutine { continuation ->
continuation.invokeOnCancellation {
cancel()
}
enqueue(object : Callback {
override fun onResponse(call: Call, response: Response) {
continuation.resume(WorkerResult.Success(response))
}
override fun onFailure(call: Call, e: IOException) {
if (call.isCanceled || !isActive) {
continuation.resume(WorkerResult.Error(CancellationException("Cancelled")))
} else {
continuation.resume(WorkerResult.Error(e))
}
}
})
}
}
And problem solved.
So I wonder where else in the code there's a race, maybe somewhere I the worker pool I use, or maybe inside couroutine or channels, but no idea where to search.
My expectation is that most APIs will still call the completion callback on a cancellation scenario, so S3 seems adequate for most cases (@LouisCAD regarding Retrofit2, the documentation is not clear, however there seems to be a test just for that case - https://github.com/square/retrofit/blob/master/retrofit/src/test/java/retrofit2/CallTest.java#L647)
IMO, there should also be a non-internal/experimental way of achieving the behaviour of W2 also for the success case, i.e., ignoring a cancellation if the callback is called with success.
I occur the same error. local scope async cannot catch error. https://stackoverflow.com/questions/50230466/kotlin-withcontext-vs-async-await so withContext can catch error.
bad code:
launch {
try {
val result = async(Dispatchers.IO) { //this line
}
}
to
correct code:
launch {
try {
val result = withContext(Dispatchers.IO) { //this line
}
}
I've played with various alternative APIs to avoid cancel/fail races in CancellableContinuation
and while it is possible to design and implement such API, it is invariable too fragile and hard to use correctly. So, the decision here is to simply ignore a racy exception that comes after cancel for a standard suspendCancellableCoroutine { ... }
which automatically makes all existing await()
implementations correct and race-free, without needing to apply any workaround.
We might provide a special fine-grained API in the future that would provide ability to detect and process racy exceptions if and only if there are compelling use-cases for such an API. We don't have any at the moment. So, this issue will be closed by PR #896.
Consider this attempt to integrate coroutines with cancellable callback-based API using
suspendCancellableCoroutine
. This example is for Retrofit, but the same issue could appear with any cancellable API:Resuming a
CancellableContinuation
with exception is documented to behave in the following way (see https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-cancellable-continuation/index.html):So, this code attempts to avoid calling
resumingWithException
on an already cancelled continuation by doing a check in LINE (1). This creates a "check & act" race in between LINE (1) and LINE (2). In particular, if continuation gets cancelled from a different thread when this code executes between LINE (1) and LINE (2), thenresumeWithException
is invoked on cancelled continuation and triggers handling of "uncaught" exception.There is no public (non-experimental, non-internal) API at the moment to work around this race.
Workarounds
W1. Use internal API on
CancellableContinuation
:Replace LINE (1) and LINE(2) with the following code:
This workaround ignores exceptions when continuation is cancelled and this is the disadvantage of this workaround at the same time. If a true exception (failure) happens concurrently with cancellation, then it is going to be ignored instead of being handled. Arguably this problem is much less severe, but still.
W2. Use internal API on
Job
:suspendCancellableCoroutine
withsuspendCoroutine
.cont.invokeOnCancellation { ... }
with the following code:Call
(to see if the call had failed or was cancelled) and resume continuation withCancellationException
in case of cancellation:This workaround correctly handles exceptions that occur concurrently with cancellation (the call is either going to be cancelled or fails and we learn what had happened).
Proposed solutions
There are several possible solutions:
S1. Ignore exceptions when resuming a cancelled continuation using resumeWithException. Technically, this is a breaking change, but that is not a major problem here. The problem is that there could be a genuine exception that needs to be handled and ignoring it in
resumeWithException
is as bad as any other instance of ignoring an exception.S2. Introduce new
resumeWithExceptionOrIgnoreOnCancel
method (name is purely provisional). It is not a breaking change, but it suffers from the same problem of ignoring a genuine exception when it happens concurrently with cancellation. This is not a severe problem, though, and this could still be one of the solutions.S3. New API for cancellable callbacks with an intermediate cancelling state. The idea is that invocation of
cancel()
should not immediately result in resumption of the suspended coroutine, but shall wait untilresumeWith
(value or exception) is invoked. It gives a chance to see whether the operation was indeed cancelled successfully or failed while we trying to cancel it. It requires writing somewhat more code, though, similar to the code described in the second workaround.suspendCancellableCoroutine
.