Open HenrikBengtsson opened 4 years ago
This is the last obstacle for me to implement efficient cancellable promises and truly asynchronous Shiny reactives on top of future.
For now, I have resorted to not interrupting cancelled futures; I just mark them as cancelled, ignore their return values / conditions, and increase the number of workers.
I also have an implementation on top of later/callr::r_bg, spawning one process per computation, but that is very slow even though I can properly interrupt expired computations.
PS: I replaced my homemade later/callr::r_bg implementation with future.callr. Much simpler, but still as slow.
FWIW, note that in the next version of future.callr, future.callr::callr will join multicore in automatically releasing the worker slot if, and only if, the framework identifies that the worker has terminated/crashed. Those two backends were low-hanging fruit, mainly because the worker processes are transient. It might be possible to do something like this for other future backends as well, but I will move forward on those slowly and with great care, as explained in https://www.jottr.org/2023/07/01/parallelly-managing-workers/.
Note that this issue focuses on protecting against user interrupts occurring in the main R session. Hopefully, there is little need for protecting against user interrupts signaled to the worker processes.
Actually, with the future.callr backend and px$interrupt(), I get a load of these:
Unhandled promise error: CallrFuture (<none>) failed. The reason reported was ‘! callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed’. Post-mortem diagnostic: The parallel worker (PID 65690) started at 2023-08-02T10:04:12+0000 finished with exit code 1. The total size of the 13 globals exported is 439.84 KiB. The three largest globals are ‘read_csv’ (170.23 KiB of class ‘function’), ‘read_delimited’ (146.92 KiB of class ‘function’) and ‘req’ (23.21 KiB of class ‘function’)
With px$kill(), the exit code is -9:
Unhandled promise error: CallrFuture (<none>) failed. The reason reported was ‘! callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed’. Post-mortem diagnostic: The parallel worker (PID 68588) started at 2023-08-02T10:20:05+0000 finished with exit code -9. The total size of the 8 globals exported is 72.60 KiB. The three largest globals are ‘req’ (23.21 KiB of class ‘function’), ‘d’ (20.33 KiB of class ‘list’) and ‘dotloop’ (12.77 KiB of class ‘function’)
And future::nbrOfFreeWorkers() never goes back up. I think I was tricked into thinking this was working because promises has issues with duplicated promise errors (https://github.com/rstudio/promises/issues/86) and I had set workers = 100, which hid the problem.
Indeed, the release of the workers works if one uses the latest commit of the develop branch of future.callr:
renv::install("https://github.com/HenrikBengtsson/future.callr/archive/a0db4c055629504049b4612b5e42cd5488fbd111.tar.gz")
~Is a new release planned for soonish?~ DONE, see https://github.com/HenrikBengtsson/future/discussions/695
Background
In interactive R sessions, the user can signal a user interrupt by hitting Ctrl-C in the terminal. If this happens while R evaluates a set of R expressions that must either all complete or not run at all, there is a risk of breaking the state of a future. In some cases we can recover from it, whereas in others the only solution is to restart R.
Suggestion
In R (>= 3.5.0), we have suspendInterrupts(expr), which suspends user interrupts while evaluating the expression expr.
The first task is to identify places where we can safely protect against user interrupts without risking ending up in a situation where R completely blocks. We can always signal a SIGQUIT (Ctrl-\ in the terminal).
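To make the idea concrete, here is a minimal sketch of wrapping a multi-step update in suspendInterrupts() so that a Ctrl-C is not acted on between the steps. The helper update_state() and the fields it writes are hypothetical and only stand in for whatever internal bookkeeping would need protecting; this is not how the future package does it.

```r
# Hypothetical helper: the three assignments below stand in for a set of
# steps that must either all complete or not run at all.
update_state <- function(env) {
  # suspendInterrupts() is in base R (>= 3.5.0); user interrupts are not
  # acted on while the braces are being evaluated.
  suspendInterrupts({
    env$status  <- "sending"
    env$payload <- "sent"
    env$status  <- "done"
  })
  invisible(env)
}

state <- new.env()
update_state(state)
print(state$status)
```

Keeping the protected block short matters here, because anything slow or blocking inside it cannot be cancelled with Ctrl-C.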
One obvious candidate is for cluster futures in main-worker communication. There should be no need to protect against user-interrupts on the worker's end.
Due to the risk of breaking something, we should probably introduce an R option future.onInterrupts and a corresponding environment variable R_FUTURE_ONINTERRUPTS to allow users/sysadmins to enable or disable this feature. To minimize the overhead of checking these all the time, it's probably better to check them just once when the package is loaded, i.e. during .onLoad().
We could start off by enabling these user-interrupt protections only for interactive R sessions.
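A rough sketch of how the setting could be resolved once at load time follows. The option and environment variable names are the ones proposed above; treating them as TRUE/FALSE switches, the precedence (option first, then environment variable, then interactive()), and the helper resolve_on_interrupts() are all assumptions made for illustration.

```r
# Hypothetical helper: decide once whether interrupt protection is enabled.
resolve_on_interrupts <- function() {
  # An R option that is already set takes precedence (assumed order)
  value <- getOption("future.onInterrupts")
  if (is.null(value)) {
    value <- Sys.getenv("R_FUTURE_ONINTERRUPTS", unset = NA_character_)
  }
  value <- as.logical(value)
  # Unset or unparsable: enable only for interactive sessions
  if (length(value) != 1L || is.na(value)) value <- interactive()
  value
}

# In the package, this would run once when the namespace is loaded
.onLoad <- function(libname, pkgname) {
  options(future.onInterrupts = resolve_on_interrupts())
}
```

Code that needs the setting can then read getOption("future.onInterrupts") without re-parsing the environment variable on every call.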
See also