HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

Remove multicore worker if process crashed #677

Closed HenrikBengtsson closed 1 year ago

HenrikBengtsson commented 1 year ago

Issue

If a multicore future terminates its underlying forked R process, the dead future keeps occupying one of the worker slots, even after the failure has been detected.

Example:

library(future)

plan(multicore, workers = 4)
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 4)

f <- future({ Sys.sleep(2) })
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 3)

v <- value(f)
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 4)

f <- future({ tools::pskill(Sys.getpid()) })
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 3)

res <- tryCatch({
  v <- value(f)
}, error = identity)
stopifnot(inherits(res, "FutureError"))
conditionMessage(res)

## [1] "Failed to retrieve the result of MulticoreFuture (<none>) 
## from the forked worker (on localhost; PID 1632517). Post-mortem
## diagnostic: No process exists with this PID, i.e. the forked 
## localhost worker is no longer alive"

stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 4)  ## FAIL; here we're stuck at 3

Suggestion

Detect when the forked process has terminated (cf. the post-mortem analysis), and remove the corresponding MulticoreFuture from the internal FutureRegistry to free up the slot.

This should be safe to do for multicore futures, because each one runs in its own transient forked R process.
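
The idea can be sketched without touching the package's internals. What follows is a minimal, hypothetical illustration only: `registry`, `pid_alive()` and `release_dead_workers()` are made-up names, not part of future's internal FutureRegistry API, and the liveness probe assumes a Unix-like system where signal 0 only tests whether a process exists.

```r
## Hypothetical sketch; none of these names are part of the future package.

## Probe whether a process exists by sending the "null signal" (0).
## Assumption: Unix-like OS. Do not use this on Windows, where
## tools::pskill() terminates the process regardless of 'signal'.
pid_alive <- function(pid) {
  isTRUE(tools::pskill(pid, signal = 0L))
}

## Drop every registry entry whose worker process is gone.
## 'registry' is a named integer vector: slot name -> worker PID.
release_dead_workers <- function(registry, is_alive = pid_alive) {
  alive <- vapply(registry, is_alive, logical(1L))
  registry[alive]
}

## Demo with a stubbed liveness check, so no real process is signaled:
## pretend that the worker with PID 1002 has crashed.
registry <- c(slot1 = 1001L, slot2 = 1002L, slot3 = 1003L)
registry <- release_dead_workers(registry, is_alive = function(pid) pid != 1002L)
names(registry)  # "slot1" "slot3"
```

The liveness check is passed in as an argument so that the pruning logic can be exercised with a stub. One caveat with the signal-0 probe: a forked child that has exited but has not yet been reaped by its parent may still be reported as existing, so a real implementation would likely need more than this check.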

HenrikBengtsson commented 1 year ago

Implemented; a "crashed" multicore future is now fully released, making its "slot" available again.