Closed gadmm closed 2 weeks ago
Thanks for your questions. Here are a few, if not answers, at least comments, on these points.
I suspect that there is a lot to unpack in this statement, and it might be valuable to write down what this discipline is.
The preamble here and the documentation of the exception touch on this.
Yes this is simply Thread.Exit
and/or a form of asynchronous exception. In my opinion it is the proper way to go to implement cancellation (no explicit threading of tokens and a notion of cancellation that aligns with the scope of functions which makes it easier to understand it's scope which is not the case for tokens which effectively allow to build complex and difficult to understand side effecting cancellation effects).
Given that your scheduler is implemented in "user" space, there's of course all sorts of business clients can do to derail the whole scheme. But that's about as much as you can do without cooperation from the runtime system/language.
If you go through the fiber termination function some of the cases where "something went" wrong are explicitely commented. While it does not do anything about these cases for now we could of course treat some of these cases as "programming error" or warn the end user if we had some kind of runtime system log.
- what if you defined some generic retry function that retries on all exceptions (it’ll need to ignore the canceled exception otherwise your computation won’t actually finish when the timeout is reached).
Catch all exception handler are catch all exception handlers… Depending on what you do in your fiber function, you might eventually terminate "normally" with a value at which point it will finish it's abortion process and discard the value.
- what if your cleanup function takes a long time?
The abort procedure will take a long time.
- what if your cleanup functions raise exceptions, where do those go?
Following Fun.protect
they must not and are treated the same way, see this code. Alternatively if we had a good design for a global exception trap (see https://github.com/ocaml/ocaml/issues/11074) we could give them to the trap – like Affect
currently does in a few cases, e.g. with fibers that terminate by uncaught exception, see also point 2. here for more on this behaviour.
- This needs some way to mark regions of code as not cancelable,
This is basically the block
/unblock
scopes of asynchronous exceptions mentioned in the above paper. This actually looks doable by enriching the fiber's Running
state with a boolean indicating abortion should be delayed in the current state. Appropriate scoping combinators that toggle that variable (via an effect since the scheduler would be in charge) in the fiber's state could then be provided.
One issue that async Rust programmers have met is having to call async operations during cleanup.
In affect, if you try to cooperatively block a fiber (i.e. call an async operation) while it's in the aborting state it immediately aborts the blocking operation, that is calls the abort handler of the block operation you tried and raise Abort
again. Basically this means that an aborting function can't perform cooperatively blocking operation anymore, they get immediately aborted.
This I don't see as an issue with (do you ?). I'm sure system interaction APIs can be used that cope with this – if only by using the abort handler function of the blocking operation in a suitable manner.
Can cleanup operations be aborted? If so, should
Fun.protect
be somehow aware of theAbort
exception and not wrap it intoFinally_raised
?
Indeed there's an issue here.
Assuming a non aborting fiber yielded
or blocked in a finally
and gets aborted at that point, because of the discipline that you should not raise, you should swallow the Abort
exception. But then you kill the abortion process (amusingly you do not if the abort happens outside a finally
but your finally
raises Abort
, that one gets swallowed but the finally does the re-raise of the Abort your are treating).
Lacking this you should not yield or cooperatively block in finally
functions for now (but is it a good idea anyways ?).
I meant to come back to this with feedback but I am bit short on time to have a closer look at it.
One thought is that once you record the cancellation statefully, then you need not rely on testing the exception value. So you could explicitly allow the idiom of wrapping the exception into another one (as with Finally_raised
). This is the choice taken by memprof-limits and eio. (Basically we all agree that the cancellable code is not arbitrary OCaml code and has to follow some discipline, but I believe that this discipline can be a bit more relaxed that you currently describe.)
Modulo this difference, I think that this is quite close in spirit to the memprof-limits/eio cancellation models. (Again, devil might be in details, and I did not have time to look further into the consequences of the differences in detail.)
The reason why I mentioned Rust's async Drop issue was not because their technical obstacles transpose to OCaml (I am not sure that they transpose), but because it seems to be strong empirical evidence that people sometimes need to cooperatively block during cleanup. (Is it a good idea though? For instance I tend to think that a file should not be flushed before closing if the cleanup follows from an exceptional condition, but that it is fine to flush before closing during normal cleanup, and that it is fine to react differently according to whether the cleanup is normal or due to a panic.)
I think the idea of not testing the exception value but relying on the cancellation state, and the evidence from Rust "async Drop" that it is sometimes useful to block during clean-up, answers (albeit indirectly) to some of your points. (Further mechanism specific to cancellation with memprof-limits is that a "swallowed" interrupt will arise again soon after—which you have—and you can mask interrupts, e.g. during clean-up—which you do not have.) Are there still some points in the discussion where I could try to bring further clarifications? If not, should I close this issue?
This is out of my head at the moment and I'm not sure what the conclusion is :-) But I'd like to go over this at some point before the release, the documentation does need clarification I think.
I'm not sure what the conclusion is
The conclusion is that I'm no longer convinced by what I proposed then.
I don't think that cancellation should enforce the return values of fibers. This made the fiber termination code quite involved, made await
s painful (option
values), forced to add a dubious concept to fibers (the finally
handler, since it was possible for a fiber to be cancelled before being ever scheduled) and disallowed cancelled fibers to return partially computed results. I think that neither of these are very natural.
In c9fc5df3695f4b62026fa45a7a62d9ebc541622d I devised a new concurrency model centered on parallel asynchronous function calls with the idea of minimizing the difference between synchronous and asynchronous function calls (and thus fully embrace exceptions rather than try to contain them to fibers in the previous model). In this model I switched to a so called (by Swift) "cooperative cancellation" model. It remains structured cancellation as before (if you cancel a fiber so are its current and future descendents) but what you do about the cancellation is up to you. Since cancellation is now a fiber "flag" rather than a proper state, we get much closer to regular function calls and the finally
handler can be dropped.
I'm not entirely happy with everything yet. Notably at the level of cancellation of blocking operation (for now I suggest to only raise if cancellation happens during blocking but not otherwise, I have the impression that it's not the best model to work with that leads to racy code). But I'm also not convinced by the the design of these blocking operations themselves yet, I'd like to get to something like CML's events. A new design round is needed, but not now :-)
However I'd say that the basic original issue is resolved in the new concurrency model. In particular:
One issue that async Rust programmers have met is having to call async operations during cleanup. Can cleanup operations be aborted? If so, should
Fun.protect
be somehow aware of theAbort
exception and not wrap it intoFinally_raised
?
This issue no longer exists since what a fiber does when it is cancelled is entirely up to it.
Thanks for raising the issue @gadmm.
At https://discuss.ocaml.org/t/affect-composable-concurrency-primitives-for-ocaml-5-0/9430 you propose to implement cancellation/abortion as follows:
I suspect that there is a lot to unpack in this statement, and it might be valuable to write down what this discipline is. It looks a lot like the return of the old
Thread.Exit
exception which used to cause issues because no such discipline was advised to the user (e.g. third-party code that swallows the exception; one can argue that this is wrong, but more legitimate code could for instance wrap the exception into another exception).I am curious about your point of view on the possible issues with cancellation described by @edwintorok (here) and how your discipline would go about it:
One issue that async Rust programmers have met is having to call async operations during cleanup. Can cleanup operations be aborted? If so, should
Fun.protect
be somehow aware of theAbort
exception and not wrap it intoFinally_raised
?I do not have more precise issues to point out right now, but since you called for discussions on the design, I think that this is an important design point to get right.