ericphanson opened 3 years ago
Thinking about this slightly more, I think a nice "inversion of control" here is that the ideal `pmap` could return workers to the pool (in fact, I think it already does), and the pool could decide to remove idle workers. (Perhaps the pool would wait a minute or two and then, if they are still idle, rm them.)
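A minimal sketch of that idea, assuming hypothetical names (`idle_workers`, `reap!`, and the grace period are all invented here, not an existing API in Distributed.jl or K8sClusterManagers.jl): the pool records when each worker last finished a task, and a reaper removes workers that have sat idle past the grace period.

```julia
using Distributed

# Pure helper: which worker ids have been idle longer than `grace` seconds?
# `last_busy` maps worker id => the time it last finished a task.
function idle_workers(last_busy::Dict{Int,Float64}, now::Float64, grace::Float64)
    sort([id for (id, t) in last_busy if now - t > grace])
end

# Hypothetical reaper the pool could run periodically: drop idle workers
# from its bookkeeping and remove them from the cluster.
function reap!(last_busy::Dict{Int,Float64}; grace::Float64 = 120.0)
    for id in idle_workers(last_busy, time(), grace)
        delete!(last_busy, id)
        rmprocs(id)  # on k8s this frees the pod so the cluster can scale in
    end
end
```

The key design point is that `pmap` only has to hand workers back to the pool; the removal policy (how long to wait, whether to remove at all) lives entirely in the pool.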
This isn't strictly a K8sClusterManagers.jl issue, but @omus pointed me here :).
I was running hyperparameter optimization on a model using `@phyperopt` from Hyperopt.jl with `pmap=Parallelism.robust_pmap` from Parallelism.jl. I would spin up the desired number of workers with `addprocs`, then essentially call `pmap` via these abstractions, and that's it. When the `pmap` is done, the manager writes out a summary and exits, and all the processors are released.

I wanted to train 20 models this way quickly, so I did this with 20 workers and left them to train. However, some finished much faster than others, and those processors were left idling. Since this is via k8s, if we had killed them, we could have scaled in and saved lots of resources.
It would be great to have something like `pmap` that could automatically remove processors when they are no longer needed.