dabeaz / curio

task cancellation communication #211

Closed goldcode closed 4 years ago

goldcode commented 7 years ago

Just had a thought about how tasks could communicate with one another via exceptions (using cancel and disable_cancellation). I have a use case where a task may need to be cancelled either gracefully or mercilessly. Here is what the future code could look like. I don't know if it's even possible, just a thought...

class GracefulError(curio.CancelledError):
    pass

async def parent():
    kid_task = await curio.spawn(kid)
    print("We're leaving!")
    #either this
    await kid_task.cancel(GracefulError()) #<----futurecode
    #or this
    await kid_task.cancel()
    print('Leaving!')

async def kid():
    try:
        for building in ('skyscraper', 'Plattenbau', 'hut'):
            print(f'Building the {building} in Minecraft')
            # only shield on GracefulError exceptions
            async with disable_cancellation(GracefulError): #<----futurecode
                await curio.sleep(1)

    except GracefulError:
        print('Fine. at least my buildings stand.')
        raise
    except curio.CancelledError:
        print("Fine. it's half-built...")
        raise

njsmith commented 7 years ago

Note that in your example, kid can never receive GracefulError at all, because cancellation can only happen at await calls.

There's some very relevant discussion happening at https://github.com/python-trio/trio/issues/147 though I'm afraid it doesn't come to any useful conclusions yet. Except that reliable graceful shutdown in modular programs is not trivial. (Note that "shielding" is trio's version of disable_cancellation.)
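To make the "cancellation can only happen at await calls" point concrete, here is a minimal sketch that drives a coroutine by hand with no event loop at all. It does not use Curio; `Cancelled`, `suspend`, `worker`, and `run_and_cancel` are hypothetical names invented for this illustration. The driver can inject an exception only at the point where the coroutine is suspended.

```python
import types


class Cancelled(Exception):
    """Stand-in for a cancellation exception (not Curio's class)."""


@types.coroutine
def suspend():
    # A bare yield hands control back to whoever is driving the coroutine.
    # This is the only place where the driver can inject an exception.
    yield


async def worker(log):
    try:
        log.append("step 1")   # plain code: cannot be interrupted
        log.append("step 2")   # plain code: cannot be interrupted
        await suspend()        # suspension point: throw() lands here
        log.append("step 3")   # skipped when the exception is thrown in
    except Cancelled:
        log.append("cancelled at the await")
        raise


def run_and_cancel():
    log = []
    coro = worker(log)
    coro.send(None)              # run until the first suspension point
    try:
        coro.throw(Cancelled())  # deliver the exception at that point
    except Cancelled:
        pass                     # the worker re-raised, as expected
    return log
```

Calling `run_and_cancel()` shows both plain statements completing before the exception is seen. A real kernel is far more involved, but the same constraint applies: if the only await sits inside a shielded block, there is no remaining point at which the exception can arrive.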

dabeaz commented 7 years ago

It's true that this won't quite work right since cancellation requests can only be delivered at await statements. However, you could probably make it work by adding an explicit check_cancellation() call into the code like this:

async def kid():
    try:
        for building in ('skyscraper', 'Plattenbau', 'hut'):
            print(f'Building the {building} in Minecraft')
            # only shield on GracefulError exceptions
            async with disable_cancellation(GracefulError): #<----futurecode
                await curio.sleep(1)
            await check_cancellation()

    except GracefulError:
        print('Fine. at least my buildings stand.')
        raise
    except curio.CancelledError:
        print("Fine. it's half-built...")
        raise

There would need to be some additional work to make disable_cancellation() support a subset of possible exceptions. However, in principle it could be done.

I agree with Nathaniel that graceful shutdown is not easy. Curio doesn't really do anything beyond the delivery of the appropriate cancellation exceptions. Your code can take action on those as it sees fit, but it will require some amount of care to make sure everything cancels gracefully.

goldcode commented 7 years ago

My understanding of passing exceptions to coroutines is hazy. What I thought might happen is that the kernel/scheduler 'pins' the GracefulError exception to the kid task, instead of calling coro.throw(), when the task has arrived at the await curio.sleep(1) await point. Later, in the __exit__ function of the disable_cancellation context manager, the pinned graceful exception would finally be raised.

njsmith commented 7 years ago

@goldcode: that would make sense – in fact, I'm pretty sure that's how I did it when I first implemented disable_cancellation :-). But if you look at the code now, there aren't any awaits to act as potential cancellation points.

(Minor editorial comment: this ongoing difficulty in figuring out where cancellation points are in curio was the straw that made me switch to writing trio.)

There's another challenge to handling multiple "cancellation types", which is how to adjudicate between them if several are used at the same time. Like, if you do task.cancel(), and then before the TaskCancelled can be delivered, another piece of code does task.cancel(GracefulError()) on the same task, then what happens? Does the graceful cancellation overwrite the regular cancellation so it gets lost? That seems wrong. What if the calls happen in the other order? Do you need an ordering on exception types, so curio knows that TaskCancelled is "stronger" than GracefulError, no matter which order they happen in?
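One way to picture the "ordering on exception types" idea is the sketch below. It is purely hypothetical and is not how Curio or Trio behaves: `Cancelled`, `Graceful`, the `STRENGTH` table, `merge_pending`, and `final_type` are all invented here. The pending slot always keeps the stronger request, so the outcome does not depend on arrival order.

```python
class Cancelled(Exception):
    """Stand-in for a hard cancellation (not Curio's class)."""


class Graceful(Cancelled):
    """Stand-in for a softer, graceful cancellation."""


# Hypothetical strength ranking: the higher number wins.
STRENGTH = {Graceful: 1, Cancelled: 2}


def merge_pending(pending, new):
    """Return the request that should stay pending after `new` arrives."""
    if pending is None:
        return new
    if STRENGTH[type(new)] > STRENGTH[type(pending)]:
        return new       # a stronger request replaces a weaker one
    return pending       # an equal or weaker request is dropped


def final_type(*requests):
    """Feed requests in order; report which exception type survives."""
    pending = None
    for exc in requests:
        pending = merge_pending(pending, exc)
    return type(pending).__name__
```

With this policy, `final_type(Cancelled(), Graceful())` and `final_type(Graceful(), Cancelled())` both give `"Cancelled"`. The cost is that someone has to maintain a global ranking of every cancellation type, which is part of what makes the question hard.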

dabeaz commented 7 years ago

For various reasons, cancellation exceptions are not raised in the __exit__() method of context managers. They are deferred to the next blocking operation. That check_cancellation() function will immediately raise any pending cancellation exception right at that point however.
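The deferral described here can be modelled with a small synchronous toy. This is a sketch under loud assumptions, not Curio's implementation: `ToyTask`, `request_cancel`, and `blocking_point` are hypothetical, and `blocking_point` stands in for either a blocking operation or an explicit check_cancellation() call outside the shielded block.

```python
from contextlib import contextmanager


class Cancelled(Exception):
    """Stand-in for a cancellation exception (not Curio's class)."""


class ToyTask:
    """Toy model of one task's cancellation state. All names are hypothetical."""

    def __init__(self):
        self.pending = None      # cancellation waiting to be delivered
        self.shielded = False    # True inside disable_cancellation()

    def request_cancel(self, exc):
        self.pending = exc

    @contextmanager
    def disable_cancellation(self):
        self.shielded = True
        try:
            yield
        finally:
            # Leaving the block only lowers the shield; nothing is raised here.
            self.shielded = False

    def blocking_point(self):
        # Models a blocking operation or an explicit cancellation check.
        if self.pending is not None and not self.shielded:
            exc, self.pending = self.pending, None
            raise exc


def demo():
    events = []
    task = ToyTask()
    try:
        with task.disable_cancellation():
            task.request_cancel(Cancelled())   # arrives while shielded
            task.blocking_point()              # shielded: nothing raised
            events.append("work inside shield finished")
        events.append("left shield, still running")
        task.blocking_point()                  # pending exception delivered
        events.append("not reached")
    except Cancelled:
        events.append("cancelled at next blocking point")
    return events
```

In this model the task keeps running after leaving the shielded block and is cancelled only when it next reaches `blocking_point()`, which is why an explicit check right after the block is needed to get a prompt reaction.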

Regarding cancellation in Curio generally, it can theoretically happen on any operation involving an await--especially if you're calling any kind of subroutine or library function. At the lowest level of the kernel, it will only happen on operations that actually block though.

As for a task being cancelled from two places--don't do that.

goldcode commented 7 years ago

@njsmith: regarding cancelling a task multiple times before the task can even respond: interesting.

I don't know what curio currently does, but the behavior should be deterministic (e.g. warn or raise an exception on a second cancel attempt, to avoid subtle bugs). This scenario is a statistical eventuality (a bug waiting to happen), even if the programmer never explicitly intends to cancel twice.

dabeaz commented 7 years ago

I've pushed a partial change that at least allows a custom exception to be raised via Task.cancel(). I'm less certain about the idea of filtering exceptions in disable/enable cancellation calls. I'd need to think about that a bit more. Might be possible though.

Regarding second cancellation--the current behavior of cancellation is that whatever cancellation request was received first is what gets processed and delivered. If, for whatever reason, multiple tasks were to attempt cancellation on the same task, the second request would basically be ignored. Both cancellation requests, however, would block waiting for the task to terminate.

I suppose an exception or warning could be issued if an attempt was made to cancel a task with a different kind of exception than one that was already in progress.
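As a rough model of the "first request wins" behavior, combined with the warning floated above, consider the sketch below. It is not Curio code: `ToyTask`, `request_cancel`, and the choice of `RuntimeWarning` are assumptions made for illustration, and the sketch leaves out the part where both callers block until the task terminates.

```python
import warnings


class Cancelled(Exception):
    """Stand-in for a regular cancellation (not Curio's class)."""


class Graceful(Cancelled):
    """Stand-in for a graceful cancellation."""


class ToyTask:
    """Toy model: remembers only the first cancellation request."""

    def __init__(self):
        self.pending = None

    def request_cancel(self, exc):
        """Return True if this request will be the one delivered."""
        if self.pending is None:
            self.pending = exc
            return True
        if type(exc) is not type(self.pending):
            # Hypothetical policy: flag a conflicting second request.
            warnings.warn(
                f"{type(exc).__name__} ignored; "
                f"{type(self.pending).__name__} is already in progress",
                RuntimeWarning,
                stacklevel=2,
            )
        return False   # later requests are ignored either way


def demo():
    task = ToyTask()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        first = task.request_cancel(Cancelled())
        second = task.request_cancel(Graceful())
    return first, second, type(task.pending).__name__, len(caught)
```

Here the second, conflicting request is dropped and reported once, while a repeated request of the same type would be dropped silently. Whether a warning or a hard error is the better signal is a design choice the thread leaves open.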