dabeaz / curio


Document exceptions raised by Kernel.run #242

Closed Nikratio closed 6 years ago

Nikratio commented 6 years ago

It seems that curio.run() may raise a TaskError. This is currently not documented: the documentation for Kernel.run does not mention any exceptions, and the documentation for TaskError states that it is only raised by a task's join() method.

It would be nice to document the current behavior.

It would be even nicer (in my opinion) to just re-raise the original exception instead. I don't see much point in wrapping it in a TaskError. In particular, the wrapping makes unit-testing a program that uses curio coroutines rather awkward.

dabeaz commented 6 years ago

Error handling is already tricky enough because exceptions could be due to a bug in Curio itself or a bug in the coroutine that Curio is running. The purpose of TaskError is to make the origin of an error absolutely clear. If it's a TaskError, it's a bug in the user-supplied coroutine. If it's some other kind of exception, it's a bug in Curio. I don't foresee changing this.

njsmith commented 6 years ago

FWIW, Trio does this the other way -- if it's a regular error, it comes from the user-supplied code (whether that indicates a bug or not). If it's a TrioInternalError, then it comes from Trio.

dabeaz commented 6 years ago

I get it, "For Humans" and all of that. The problem is that humans are going to screw up exception handling. For one, they're lazy and prone to wrapping things with a diaper exception:

t = await spawn(something_with_http_and_database_and_crypto_and_cloud)
try:
    result = await t.join()
except:
    print("Welp. Shrug.")

Or they're forgetful and don't check exceptions at all:

t = await spawn(something_with_http_and_database_and_crypto_and_cloud)
result = await t.join()

Or they screw up something with layering of exceptions:

async def spam():
    return await timeout_after(5, something_with_http_and_database_and_crypto_and_cloud)

t = await spawn(spam)
try:
    result = await timeout_after(20, t.join())
except TaskTimeout:
    print("Which timeout?!?")

Spawning a task is not like a normal function call. A spawned task runs concurrently in its own context. The Curio join() method doesn't mix contexts when it comes to reporting exceptions. The spawned task either runs successfully or it doesn't. If it doesn't, there is a single exception, TaskError, that gets raised by join() in response. If someone wants more information about what happened, they can inspect the exception's __cause__ attribute. If any other kind of exception comes out of join(), it's either an exception related to the join() operation itself (e.g., cancellation, timeout, etc.) or it's a bug in Curio. There is no ambiguity about which it is.
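
A pure-Python sketch of the wrapping mechanism being described (TaskError here is a locally defined stand-in, not imported from curio, and run_task is a hypothetical helper standing in for join()):

```python
class TaskError(Exception):
    """Stand-in for curio.TaskError (not the real class)."""

def run_task(func):
    """Run func; report any failure the way join() is described to."""
    try:
        return func()
    except Exception as exc:
        # `raise ... from exc` stores the original exception on __cause__
        raise TaskError("task crashed") from exc

def crashing_task():
    raise ValueError("boom")

try:
    run_task(crashing_task)
except TaskError as e:
    # The caller sees exactly one exception type for any crash,
    # but can still inspect the original cause.
    print(type(e.__cause__).__name__, e.__cause__)  # prints: ValueError boom
```

The key point is that `raise ... from` preserves the original exception on `__cause__`, so wrapping loses no information.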

If someone doesn't want exceptions wrapped in a TaskError, they can always do this as well:

t = await spawn(some_task)
await t.wait()
try:
    r = t.result
except WhateverError:
    ...

Nikratio commented 6 years ago

I can't quite follow. Assuming people write code like in your first or second example, why does it matter whether you wrap exceptions in TaskError or not?

Finally, I agree that spawning a task isn't a normal function call. But Kernel.run is a rather different beast than curio.spawn, because it blocks until execution of the task has completed. I think that difference is big enough to justify propagating exceptions as-is.

dabeaz commented 6 years ago

Spawning a new task is not the same as calling a normal function or calling a coroutine with await. It involves creating a new execution context, scheduling, and concurrent execution in the kernel. There's significantly more complexity and more subtle interaction involved. Managing that complexity is a concern. Wrapping task-related errors with TaskError is one way of managing it, because it maintains the separation of errors from different execution contexts and preserves task isolation.

For one, if you spawn a task, it will either complete successfully, or it will crash. For the purposes of join(), any kind of crash is folded into a TaskError. There are no other options. Because there are no other options, you always write code like this if you need to account for task failure:

try:
    result = await t.join()
except TaskError as e:
    ...

This provides a very strong disincentive to write catch-all exception handlers, and it provides a much more precise mechanism for dealing with errors that occur during the execution of the task, as opposed to other possible errors related to the task that did the spawning in the first place (e.g., timeouts, cancellation, etc.).

I also feel that TaskError helps pinpoint the source of unexpected errors. Suppose someone writes code that doesn't check for task-related errors:

async def spam():
    t = await spawn(sometask)
    result = await t.join()
    ...

I would claim that spawning a task and not properly checking for the possibility of it crashing is a programming error. However, suppose this function is deeply embedded in some other calls:

async def blah():
    await spam()

async def grok():
    await blah()

If someone calls await grok(), it's not at all apparent that somewhere deep inside, a new task was spawned for concurrent execution and that it's not properly being checked. However, it will become immediately apparent if it ever crashes--a TaskError will bubble out and yell at you. The mere presence of that exception means that someone spawned a task and didn't handle its failure correctly.
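
That diagnosability argument can be sketched in plain Python (ordinary functions stand in for the coroutines above, and TaskError is again a locally defined stand-in rather than curio's class):

```python
class TaskError(Exception):
    """Stand-in for curio.TaskError (not the real class)."""

def spam():
    # Imagine a spawned task crashed and its failure was wrapped on join().
    raise TaskError("task crashed") from ValueError("boom")

def blah():
    spam()

def grok():
    blah()

try:
    grok()
except TaskError as e:
    # grok() looks task-free, yet the exception type alone reveals that a
    # task was spawned somewhere below and its failure went unchecked.
    print("unhandled task failure:", e.__cause__)  # prints: unhandled task failure: boom
```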

In the bigger picture, I just don't like the idea of exceptions crossing task boundaries in an uncontrolled manner. If an exception occurs in a task, it should stay in that task. If it needs to cross over to another task for some reason, then it should be wrapped in a way to clearly indicate that the exception originated inside a different task than the one that's executing.

dabeaz commented 6 years ago

One thing that has me wondering though is that the run_in_thread() and run_in_process() functions behave differently--they raise the actual exception that took place, not a wrapped one. This is not justification for changing join()--there are definite practical concerns as to why that has to return a TaskError. However, I'll have to ponder it.
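
Not curio code, but the stdlib shows the same unwrapped style: concurrent.futures re-raises the worker's original exception from Future.result(), letting it cross the thread boundary as-is. A small sketch of that analogy:

```python
from concurrent.futures import ThreadPoolExecutor

def worker():
    raise KeyError("missing")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(worker)
    try:
        future.result()
    except KeyError as e:
        # The original KeyError crosses the thread boundary unwrapped.
        print("caught:", e)
```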

I should note that much of this has to do with Task.join() returning a result. You can still get unwrapped exceptions if you use Task.wait() and then look at Task.result afterwards.

Nikratio commented 6 years ago

Not sure why you keep talking about Task.join()... I am talking about Kernel.run().

dabeaz commented 6 years ago

My apologies for fixating on Task.join(). Mainly, the same issues that affect join() also affect run() in that it too can raise exceptions that are not directly related to the execution of the task itself. Because of that, it also needs to wrap exceptions. If someone wants the raw exceptions of the task itself, one could always write a wrapper around run() to reraise the task exception.

def my_run(*args, **kwargs):
    try:
        return run(*args, **kwargs)
    except TaskError as e:
        raise e.__cause__ from None

(or something similar)

xgid commented 6 years ago

Hi there,

I'm following closely all the good work you, @njsmith and others are doing at curio and trio, not only because concurrency is an important topic to master but because I try to learn from those who know more about programming. And there's a point you made which I fully agree with:

In the bigger picture, I just don't like the idea of exceptions crossing task boundaries in an uncontrolled manner.

But that took me to the question of how you would handle low-level exceptions like TaskError in a sans-io library (one with interchangeable I/O backends like curio or trio), so as not to let the "implementation details" (like the backend in use) leak into the sans-io library. I suppose the my_run idea in your last comment would be a necessary strategy... or just the opposite: make all the backends raise a backend-independent TaskError. What do you think?

Sorry for the slightly off-topic question. 😳

njsmith commented 6 years ago

"Sans-IO" doesn't usually refer to libraries with interchangeable backends – the prototypical examples like hyper-h2, h11, ssl.SSLObject work directly with bytes in memory and literally do not do any I/O at all. We don't have a word for libraries with interchangeable backends... though maybe we should, because the scope and issues are somewhat different. One of the challenges you run into is, yeah, that different libraries are genuinely different in how they handle things – even libraries like trio and curio that share a lot of ideas still have differences all over the place, and not just TaskError. (One fun one is that they disagree about which functions should be marked async, which can easily leak out to affect your whole API.) I think there is a role for libraries with interchangeable backends, but that it's mostly for libraries that only use the backend in a fairly trivial way. Like urllib3 seems to be working OK, because all it cares about is opening/closing sockets and sending/receiving data on them. But as soon as you get into concurrency etc., I suspect in most cases you're better off picking a single backend and sticking to it. One thing Sans-IO libraries can give you though is to make it surprisingly easy to implement new backend-specific libraries.

dabeaz commented 6 years ago

Added some notes in docs about exceptions for run().

dabeaz commented 6 years ago

I'm going to reopen this because I've been looking at run() lately and it's too complicated. Maybe it could be changed.

dabeaz commented 6 years ago

Honestly, the only reason Kernel.run() returns a wrapped exception is that it also allows a task timeout to be specified (and I need some way to distinguish the raising of the timeout from a normal task-related error).

Looking at the code, I'm inclined to get rid of the timeout feature altogether. Doing that, I could go back to normal exception-handling behavior. A variety of other things would also be simplified. If someone really wanted a timeout on a submitted task, they can already wrap it with timeout_after() anyway. For example:

Kernel.run(timeout_after(maxtime, coro))
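
A rough analogy using stdlib asyncio (not curio) shows the same composition: the runner takes no timeout argument, and the caller wraps the coroutine with a timeout helper instead:

```python
import asyncio

async def slow():
    await asyncio.sleep(10)
    return "done"

async def main():
    try:
        # Compose the timeout around the coroutine rather than passing
        # a timeout argument to the runner itself.
        return await asyncio.wait_for(slow(), timeout=0.01)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # prints: timed out
```

Because the timeout lives outside the runner, there is only one place a timeout exception can come from, which removes the ambiguity that motivated the wrapping.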
xgid commented 6 years ago

But as soon as you get into concurrency etc., I suspect in most cases you're better off picking a single backend and sticking to it. One thing Sans-IO libraries can give you though is to make it surprisingly easy to implement new backend-specific libraries.

Thanks @njsmith for your detailed answer and advice. I'm sorry for not being able to answer before.

I thought I could build my application without adhering to any specific I/O library, and then run some "benchmarks" or stress tests later on to see which I/O library gave me the best results... But that seems to be too ambitious a goal. Your point about the differences in async usage between curio and trio was the decisive argument for me. Many thanks for making it so clear!