Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
12.99k stars 1.85k forks source link

Version 1.1.0 breaks referential equality of thrown exceptions #921

Closed pakoito closed 5 years ago

pakoito commented 5 years ago

When the flag kotlinx.coroutines.stacktrace.recovery is set to true, as per the default, exceptions thrown within the library are mutated or cloned so they lose referential equality properties.

We found this regression in Arrow when updating our adapter module to 1.1.0. We run a test suite that included some referential equality tests in this version. These referential equality tests were removed in the update process, as you can see in master. At least another library was affected, Vert.x, as stated here.

Even if this one fix could be applied to keep the Arrow adapter and Vert.x releases going, there is merit to the discussion of avoiding different behaviors between debug and production, and breaking a property that's respected by other concurrency libraries such as Project Reactor, RxJava, Scalaz, and Scala/Cats.

An alternative like Throwable#addSuppressed doesn't break referential equality, even if it mutates the exception. Defaulting to false and making the environment variable more visible in the documentation is another viable alternative :D

jacinpoz commented 5 years ago

I am having exactly the same issue. I rely on some error codes (int based) provided inside an Exception, and whatever the error code is originally, it always ends up being 0 once it has gone through "resumeWithException" inside my "suspendCoroutine" block. This change breaks all the error handling in my application.

pakoito commented 5 years ago

The latest info I have is from @qwwdfsad on Twitter:

It was a tradeoff between debuggability+clear stacktraces and referential transparency. The former is good for newbies and debugging asynchronous code, the latter for people who do equality checks on exceptions. This is why it is enabled only in debug mode and can be disabled.

Also, this mechanism does not mutate the original exception (as opposed to proposed addSuppressed mechanism, that mutates the exception asynchronously).

My assumptions about mutability were wrong, but it seems from your comment that this cloning behavior should at least be documented somewhere so people include errorcodes etc in the copies.

qwwdfsad commented 5 years ago

So let me further explain my point.

This behaviour is opt-out rather than opt-in to help newbies with exception handling. Making this behaviour opt-in is much less discoverable and useful. To reduce the impact of broken referential equality, it is enabled only in debug mode. I would be happy to provide readable stackraces without making reflective cloning, but I am not aware of other mechanisms of doing so except mutating real exception stacktrace (and this is much worse).

Even though we can live with the fact that addSuppressed mutates the original exception, it just does not help as it obfuscates resulting stacktrace, not improves it.

that's respected by other concurrency libraries

For example, ForkJoinPool does not preserve this invariant. Not talking about the fact that broken stacktraces and complicated debugging are one of the most common complains about Reactor and RxJava.

qwwdfsad commented 5 years ago

What we can do is to provide some way to indicate that exception is non-copyable or to instruct kotlinx.coroutines how to copy particular exception class.

We can either introduce a global settable state (global functions like disableCloning(exceptionClazz) etc.) or marker interfaces that exception class should implement. The former introduces a global settable state and may have conflicts between multiple libraries, but can be used with classes that you cannot change, while the latter is more intrusive for the application/library code.

I am open to discuss other solutions or concerns about proposed ones.

jacinpoz commented 5 years ago

I think option one should be a quick win.

The problem I see with having marker interfaces is that any third party libraries implementing exceptions classes won't necessarily abide to this rule, and therefore users of those exception classes will be forced to extend them (as long as they are not final) and copying them every time they need to be used.

The particular use case for me is AerospikeException, which basically ends up storing "0" in all its internal int attributes.

pakoito commented 5 years ago

This behaviour is opt-out rather than opt-in to help newbies with exception handling. Making this behaviour opt-in is much less discoverable and useful.

The period of being a newbie is short, while there are many medior/experts (and probably some of those newbies too) that'll get bitten by this issue and will need to go google it up, potentially hurting their product and losing valuable engineering time. Having it opt-in with documentation prevents this, and still helps newbies if instead you improve the discoverability and documentation of the parameter for them.

Not talking about the fact that broken stacktraces and complicated debugging are one of the most common complains about Reactor and RxJava.

Complicated debugging because it's noisy and it doesn't record thread jumps, not because there is anything broken about them nor they cause problems in client code. A completely different problem than the one described by this report.

We can either introduce a global settable state (global functions like disableCloning(exceptionClazz) etc.) or marker interfaces that exception class should implement. The former introduces a global settable state and may have conflicts between multiple libraries, but can be used with classes that you cannot change, while the latter is more intrusive for the application/library code.

With typeclasses as per KEEP-87 you could get the best of both :)

In this case it'd have to be the first to work for any exception because that's precisely the worst and most common case where you'll wish it existed, and have documentation somewhere about the tradeoff and the fix.

fvasco commented 5 years ago

@pakoito Type Classes have to be linked at build time, it cannot become a pluggable solution.

As alternative solution, it is possible to provide an external component to rebuild the right stack trace, so we does not have to reinstantiate the Throwable. Having this external mechanism, it is possible to log the right stack trace when coroutine fails, then throwing the original exception. This solution does not alter the execution flow and it fills the log with debugging information. Finally it is possible to provide something like fun Throwable.coroutineStackTrace() for developing purpose.

qwwdfsad commented 5 years ago

The particular use case for me is AerospikeException, which basically ends up storing "0" in all its internal int attributes.

Thanks or the use-case! This is a good point against marker interfaces.

The period of being a newbie is short, while there are many medior/experts (and probably some of those newbies too)

This is a debatable point, in my observations, most of the users remain "newbies" beyond basic public API surface. And automatic debuggability improvements boost productivity.

Complicated debugging because it's noisy and it doesn't record thread jumps

Exception cloning was done exactly to record threads (actually, coroutines) jumps. Typeclasses are indeed the best solution to this problem, but we are not expecting them in the nearest future.

I agree that debuggability should be properly documented, I am going to do it soon, including trade-offs and rationale behind these changes.

Having it opt-in with documentation prevents this, and still helps newbies

It does not work :) We already have experience with Kotlin flags itself (and Kotlin is much more popular than kotlinx.coroutines and its usage statistics is analyzable via statistics collection). Opt-in flags for behaviour desired for 99% of the users just does not work, nobody does that.

@fvasco what you are proposing is actually a magic pill "make it work as desired but without any changes in a previous behaviour". It is impossible to do so concurrently, safely and properly working in large parent-child hierarchies. Moreover, Throwable.coroutineStackTrace is hard to inject into the code you cannot change.

fvasco commented 5 years ago

@qwwdfsad

make it work as desired but without any changes in a previous behavior

I hope this is a shared goal :-)

It is impossible to do so concurrently, safely and properly working in large parent-child hierarchies.

Using the current implementation it should be possible, without create a new Exception. As a trivial implementation it is possible to use a WeakHashMap to store the Exception and its correct stack trace, this changes does not look impossible.

Moreover, Throwable.coroutineStackTrace is hard to inject into the code you cannot change.

Obvious, nor we have to request this to every developer.

I am considering the root problem: debug the code using a rich stack trace, so it may be enough print the stack when a coroutine crashes.

I try to explain my idea with an example

fun main(args: Array<String>) {
    try {
        runBlocking(Dispatchers.Default) {
            launch(CoroutineName("main")) { error("fail") }
        }
    } catch (e: Exception) {
        println("Program quit with error:")
        e.printStackTrace()
    }
}

So in debug this code should print two stack trace, the "right" one and the original one.

Coroutine "main" uncaught exception: java.lang.IllegalStateException: fail
    ...right stack trace

Program quit with error
java.lang.IllegalStateException: fail
    ...original stack trace
elizarov commented 5 years ago

@Ganet @qwwdfsad I think we can have smarter logic that would detect "non copyable" exceptions. Actually, I think any exception that stores additional state beyond the message, cause, etc (e.g. adds additional fields to the ones available in Throwable) should not be copied, since we would not know how to copy it. We should be able to detect the presence of this additional state fields via reflection.

elizarov commented 5 years ago

@pakoito Can you, please, clarify your use-case for identify comparison of exceptions? I see that your tests had started to fail, but that is not a use-case. I'm interested in an actual application-level use-case that would call for checking exceptions by their reference identity.

pakoito commented 5 years ago

This is not about comparing exceptions by reference as a feature. That's silly for anyone to do, we fixed it and that's it. Other people will still do it tho, and we both have to deal with it.

This is about a new set of subtle errors (what I called a footgun on twitter) introduced in an update that took a tradeoff to add a new feature turned on by default in a way that will mostly help newbies and confuse experts, and didn't even document it or announced it on the patchnotes.

@qwwdfsad defended the tradeoff, agreed that exceptions in other frameworks are not broken, accepted to add documentation for this new kx.coroutines-exclusive behavior, and with that my job here is done 😄

elizarov commented 5 years ago

@pakoito Thanks for your report. This feature is still experimental and the whole reason we've rolled it out was to gather feedback. It is indeed a tradeoff and we would have never even attempted to do it this way, but we've learned that Java's ForkJoinPool framework does something similar and that prompted us to try it, too: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/6be37bafb11a/src/share/classes/java/util/concurrent/ForkJoinTask.java#l547

mhernand40 commented 5 years ago

This also broke several unit tests of mine that test error cases where the underlying dependency is using an RxJava 2 Single that emits an error. The implementation ends up using the suspend await() extension and we test the propagation of the underlying exception by using referential equality.

Here is a sample repro (not actually one of my tests 🙂):

    @Test
    fun test() = runBlocking<Unit> {
        val exception = RuntimeException()
        val errorSingle = Single.error<String>(exception)

        var actualException: Exception? = null
        try {
            errorSingle.await()
        } catch (ex: Exception) {
            actualException = ex
        }

        assertThat(actualException).isEqualTo(exception) // Using AssertJ
    }

With the error being:

java.lang.AssertionError: 
Expecting:
 <"java.lang.RuntimeException (RuntimeException@2b5f4d54)">
to be equal to:
 <"java.lang.RuntimeException (RuntimeException@485e36bc)">
but was not.
qwwdfsad commented 5 years ago

Fixed in 1.2.0-alpha

pakoito commented 5 years ago

Thanks @qwwdfsad