JuliaParallel / Dagger.jl

A framework for out-of-core and parallel execution

Add task/scheduler cancellation API #557

Closed: jpsamaroo closed this pull request 3 weeks ago

jpsamaroo commented 1 month ago

A frequent situation is that a task runs for a really long time, or just hangs (maybe due to a bug, or intentionally), and we simply want to stop it and move on with life. You might think that Ctrl+C is the right way to do this, but with Julia (and many other languages) it frequently does not do what you want, and is just as likely to hang or crash your Julia process. This is because the request to "cancel" running code isn't targeted: Julia simply interrupts whichever task happens to be running at that moment, which is frequently not the task you actually wanted to cancel.

This PR adds a new function, Dagger.cancel!, which allows cancelling Dagger DTasks safely. Unlike Ctrl+C, it doesn't force the underlying task to stop (forcibly stopping a task is generally considered unsafe, and cannot always be done both safely and promptly). Instead, it "abandons" the task and lets Dagger's runtime and scheduler move on to other queued tasks. This releases any calls to wait or fetch that were blocked on the cancelled DTask, and unblocks the processor queues so that other tasks can run.
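To make the "abandon rather than kill" idea concrete, here is a minimal, self-contained sketch in plain Julia. It is a toy model, not Dagger's implementation: the names AbandonableTask, spawn_abandonable, cancel! and CancelledError are hypothetical, and only Base primitives (Threads.Condition and @async) are used. Cancelling marks the wrapper as abandoned and notifies every waiter, so a blocked fetch returns immediately with an error, while the underlying Julia Task is left to finish on its own and its result is discarded. In Dagger itself, the entry point is Dagger.cancel! applied to a DTask.

```julia
# Toy model (hypothetical names, NOT Dagger's implementation) of cancelling a
# task by abandoning it: waiters are released, the underlying Task is not killed.

struct CancelledError <: Exception end

mutable struct AbandonableTask
    cond::Threads.Condition  # guards the fields below and wakes waiters
    done::Bool               # true once a result has been recorded
    cancelled::Bool          # true once cancel! has abandoned the task
    outcome::Any             # (ok::Bool, value_or_exception), set when done
    task::Task               # the underlying work; never forcibly stopped
    AbandonableTask() = new(Threads.Condition(), false, false, nothing)
end

# Run `f` in a new Julia Task, wrapped so that it can be abandoned.
function spawn_abandonable(f)
    t = AbandonableTask()
    t.task = @async begin
        outcome = try
            (true, f())
        catch err
            (false, err)
        end
        lock(t.cond)
        try
            if !t.cancelled          # results of abandoned work are dropped
                t.outcome = outcome
                t.done = true
                notify(t.cond)       # wake anyone blocked in fetch
            end
        finally
            unlock(t.cond)
        end
    end
    return t
end

# Abandon the task: mark it cancelled and release every pending fetch.
function cancel!(t::AbandonableTask)
    lock(t.cond)
    try
        if !t.done
            t.cancelled = true
            notify(t.cond)
        end
    finally
        unlock(t.cond)
    end
    return nothing
end

# Block until the task finishes or is cancelled.
function Base.fetch(t::AbandonableTask)
    lock(t.cond)
    try
        while !(t.done || t.cancelled)
            wait(t.cond)             # releases the lock while blocked
        end
        t.cancelled && throw(CancelledError())
        ok, value = t.outcome
        ok || throw(value)           # re-raise an exception thrown by f
        return value
    finally
        unlock(t.cond)
    end
end

# Usage (illustrative): abandon a slow job while a fast one completes.
slow = spawn_abandonable(() -> (sleep(2); :never_seen))
fast = spawn_abandonable(() -> 1 + 1)
cancel!(slow)
fetch(fast)    # returns 2
```

In this sketch, fetch(slow) throws CancelledError right away instead of waiting for the sleep, and other jobs keep running normally. The trade-off is visible too: a job that never returns simply keeps running in the background, which is the price of not interrupting it forcibly.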

It also provides a way to halt the scheduler and have it restart automatically, which can be useful for automated testing and for recovering from certain kinds of hangs within the scheduler.
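The halt-and-restart behaviour can be sketched in the same spirit. Again, this is a hypothetical toy model rather than Dagger's scheduler: a single worker task pulls jobs from an inbox Channel, halt! closes and forgets that inbox, and the next submit! lazily starts a fresh worker. A worker that is stuck (simulated here with sleep) is abandoned rather than killed, so it no longer blocks newly submitted work. The names RestartableScheduler, submit!, halt! and worker_loop are illustrative assumptions.

```julia
# Toy model (hypothetical names, NOT Dagger's scheduler): a single worker task
# that can be halted and is restarted lazily by the next submission.

mutable struct RestartableScheduler
    lock::ReentrantLock
    inbox::Union{Channel{Any},Nothing}  # `nothing` means "halted"
    starts::Int                         # number of workers launched so far
end
RestartableScheduler() = RestartableScheduler(ReentrantLock(), nothing, 0)

# Worker loop: run each job and send its result back, until the inbox is
# closed and empty. (Error handling is omitted for brevity.)
function worker_loop(inbox::Channel{Any})
    for (job, reply) in inbox
        put!(reply, job())
    end
end

# Submit a zero-argument function; returns a Channel that will hold its result.
function submit!(s::RestartableScheduler, job)
    reply = Channel{Any}(1)
    lock(s.lock)
    try
        if s.inbox === nothing           # halted or never started: restart
            inbox = Channel{Any}(Inf)
            s.inbox = inbox
            s.starts += 1
            @async worker_loop(inbox)
        end
        put!(s.inbox, (job, reply))
    finally
        unlock(s.lock)
    end
    return reply
end

# Halt: stop feeding the current worker and forget it. A stuck worker keeps
# running in the background, but it no longer blocks newly submitted jobs.
function halt!(s::RestartableScheduler)
    lock(s.lock)
    try
        if s.inbox !== nothing
            close(s.inbox)
            s.inbox = nothing
        end
    finally
        unlock(s.lock)
    end
    return nothing
end

# Usage (illustrative): a stuck job no longer blocks work submitted after halt!.
s = RestartableScheduler()
stuck = submit!(s, () -> (sleep(2); :late))   # stands in for a hung worker
halt!(s)
take!(submit!(s, () -> 40 + 2))              # returns 42 from a fresh worker
```

Note that jobs already queued on a halted worker are still drained by that old worker if it ever becomes responsive again, because iterating a closed Channel yields its remaining buffered items; a real scheduler would also need to fail or re-queue such jobs.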

It's expected that this functionality will eventually be wired up to a smarter Ctrl+C, so that users can regain control of a seemingly unresponsive system, or prototype algorithms in the REPL that may run for a really long time.