HenrikBengtsson / parallelly

R package: parallelly - Enhancing the 'parallel' Package
https://parallelly.futureverse.org

WISH: killNode() for 'cluster' #33

Closed by HenrikBengtsson 1 year ago

HenrikBengtsson commented 3 years ago

Add an S3 method psKill() for 'cluster' objects that sends a signal to the R process of each cluster node, e.g.

> cl <- parallelly::makeClusterPSOCK(4L)
> isAlive(cl)
[1] TRUE TRUE TRUE TRUE
> psKill(cl[2:3], signal = tools::SIGINT)
> Sys.sleep(10)
> isAlive(cl)
[1]  TRUE FALSE FALSE  TRUE

For localhost workers, we can use tools::pskill(pid, signal = ...). For remote workers, we need to run the equivalent of kill -SIG $PID on the remote machine over a remote connection.

This will allow us to terminate stalled workers and launch replacements for them, e.g.

> alive <- isAlive(cl)
> if (!all(alive)) cl <- c(cl[alive], makeClusterPSOCK(sum(!alive)))
> isAlive(cl)
[1] TRUE TRUE TRUE TRUE
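
The local versus remote distinction described above could be sketched roughly as follows. This is only an illustration, not how parallelly implements it: signal_worker() is a hypothetical helper, the host name is made up, and the remote branch assumes that ssh is on the PATH, that password-less login works, and that the remote machine has a POSIX kill command. None of those assumptions come from this issue.

```r
# Hypothetical helper, not part of parallelly: signal one worker process.
signal_worker <- function(pid, host = "localhost",
                          signal = tools::SIGTERM, dryrun = FALSE) {
  stopifnot(length(pid) == 1L, is.numeric(pid), pid > 0,
            length(host) == 1L, is.character(host))
  pid <- as.integer(pid)
  signal <- as.integer(signal)

  if (host %in% c("localhost", "127.0.0.1")) {
    # Local worker: base R can deliver the signal directly
    if (dryrun) return(sprintf("tools::pskill(%d, signal = %d)", pid, signal))
    return(isTRUE(tools::pskill(pid, signal = signal)))
  }

  # Remote worker. Assumptions: 'ssh' is available, login is password-less,
  # and the remote host provides a POSIX 'kill' command.
  args <- c(host, "kill", sprintf("-%d", signal), as.character(pid))
  if (dryrun) return(paste(c("ssh", args), collapse = " "))
  status <- system2("ssh", args = args, stdout = FALSE, stderr = FALSE)
  identical(status, 0L)
}

# Show the remote command without running it (the host name is made up)
signal_worker(1234, host = "n1.example.org", signal = tools::SIGINT,
              dryrun = TRUE)
```

With dryrun = TRUE the helper only returns the command it would run, which makes it easy to inspect what would be sent to a remote machine before actually signalling anything.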

HenrikBengtsson commented 1 year ago

Implemented in the develop branch:

> library(parallelly)
> cl <- makeClusterPSOCK(4L)
> isNodeAlive(cl)
[1] TRUE TRUE TRUE TRUE
> killNode(cl[2:3])
[1] TRUE TRUE
> isNodeAlive(cl)
[1]  TRUE FALSE FALSE  TRUE
> killNode(cl)
[1]  TRUE FALSE FALSE  TRUE
> isNodeAlive(cl)
[1] FALSE FALSE FALSE FALSE

HenrikBengtsson commented 1 year ago

BTW, this requires that we have collected the process IDs of the workers, which is done by parallelly::makeClusterPSOCK() but not by parallel::makePSOCKcluster():

> cl <- parallel::makePSOCKcluster(4L)
> isNodeAlive(cl)
[1] NA NA NA NA
> killNode(cl)
[1] NA NA NA NA
Warning messages:
1: In killNode.default(X[[i]], ...) :
  killNode() is not supported for 'SOCKnode' objects. Signal 15 was not sent
Calls: killNode ... killNode.cluster -> vapply -> FUN -> killNode.default
2: In killNode.default(X[[i]], ...) :
  killNode() is not supported for 'SOCKnode' objects. Signal 15 was not sent
Calls: killNode ... killNode.cluster -> vapply -> FUN -> killNode.default
3: In killNode.default(X[[i]], ...) :
  killNode() is not supported for 'SOCKnode' objects. Signal 15 was not sent
Calls: killNode ... killNode.cluster -> vapply -> FUN -> killNode.default
4: In killNode.default(X[[i]], ...) :
  killNode() is not supported for 'SOCKnode' objects. Signal 15 was not sent
Calls: killNode ... killNode.cluster -> vapply -> FUN -> killNode.default
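
For clusters created by parallel, one possible workaround is to collect the process IDs yourself while the workers still respond, and then signal them with tools::pskill(). The following is a minimal sketch using only base R. It assumes that all workers run on the local machine and that they are still responsive when the PIDs are requested; it is not what killNode() does internally. Signal 15 in the warnings above corresponds to tools::SIGTERM.

```r
library(parallel)

cl <- makePSOCKcluster(2L)

# Ask each worker for its own process ID while it is still responsive
pids <- unlist(clusterEvalQ(cl, Sys.getpid()))

# Send SIGTERM to the second worker; tools::pskill() only reaches
# processes on the local machine
ok <- tools::pskill(pids[2], signal = tools::SIGTERM)

# Clean up. Stopping the cluster may fail for the terminated worker,
# so any error is deliberately swallowed here.
try(stopCluster(cl), silent = TRUE)
```

The PIDs must be collected before a worker stalls, because clusterEvalQ() itself blocks on an unresponsive worker.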