automl / HpBandSter

a distributed Hyperband implementation on Steroids
BSD 3-Clause "New" or "Revised" License

How to stop a worker #104

Open totifra opened 3 years ago

totifra commented 3 years ago

Hey there,

In a multi-worker run, e.g. on a cluster using Slurm, is there a good way to free some resources, i.e. to stop or kill a worker without losing the result of the parameter combination it was evaluating? Currently, if I just kill a worker, the result for that parameter combination is lost, and the next free worker neither continues nor restarts it. Is there a way to kill a worker so that the next free worker restarts or continues the killed worker's job?
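To make the desired behavior concrete, here is a minimal sketch in plain Python. It does not use HpBandSter's API at all: `RequeueingScheduler` and its methods (`next_job`, `report`, `stop_worker`) are hypothetical names made up for illustration, and the configs and the loss value are dummy data. It only shows the bookkeeping I have in mind, namely that the configuration of a stopped worker goes back into the queue instead of being dropped.

```python
from collections import deque


class RequeueingScheduler:
    """Toy dispatcher (hypothetical, not HpBandSter): requeues the job of a stopped worker."""

    def __init__(self, configs):
        self.pending = deque(configs)  # configs not yet assigned to any worker
        self.in_flight = {}            # worker name -> config it is evaluating
        self.results = {}              # config id -> reported loss

    def next_job(self, worker):
        """Assign the next pending config to `worker`, or return None if none is left."""
        if not self.pending:
            return None
        config = self.pending.popleft()
        self.in_flight[worker] = config
        return config

    def report(self, worker, loss):
        """Store the loss for the config `worker` was evaluating."""
        config = self.in_flight.pop(worker)
        self.results[config["id"]] = loss

    def stop_worker(self, worker):
        """Release `worker`; its unfinished config goes back to the front of the queue."""
        config = self.in_flight.pop(worker, None)
        if config is not None:
            self.pending.appendleft(config)


# Usage: worker-a is stopped mid-evaluation, worker-b picks up the same config.
scheduler = RequeueingScheduler([{"id": 0, "lr": 0.1}, {"id": 1, "lr": 0.01}])
scheduler.next_job("worker-a")
scheduler.stop_worker("worker-a")
retried = scheduler.next_job("worker-b")
scheduler.report("worker-b", loss=0.42)  # dummy loss value
```

In this sketch the restarted configuration is evaluated from scratch; continuing from a checkpoint would additionally need the worker to save its state somewhere the next worker can read it.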

Thanks, Thomas