automl / HpBandSter

a distributed Hyperband implementation on Steroids
BSD 3-Clause "New" or "Revised" License

Saving the state? #56

Open netheril96 opened 5 years ago

netheril96 commented 5 years ago

I am surveying different packages for hyperparameter optimization, and HpBandSter seems promising, especially because of its support for distributed training. But one thing I haven't figured out is how the master handles interruption. Typically training a model takes a long time, so the master must stay alive even longer (it has to outlive all workers combined). What happens when the master crashes or is preempted?

sfalkner commented 5 years ago

Then the whole optimization run will crash. You will be able to resume it if you logged the intermediate results. Resuming here means that the master can rebuild the same model as before, but jobs that were running on any workers will not be recovered. If you are asking because you want to run everything on a cluster with a fairly strict time limit on jobs, I recommend running the master either on the login node or on some other machine that is reachable from the compute nodes. Usually, the master doesn't crash. We had runs lasting several days, up to two weeks I think, without any major problems.
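The log-and-resume pattern described above can be sketched in plain Python. This is only an illustration of the idea, not HpBandSter's actual API (HpBandSter ships its own JSON-based result logger for this purpose): each finished run is appended to a log that survives a master crash, and a restarted master reloads the log to rebuild its view of past results, while in-flight jobs are simply lost.

```python
import json
import os
import tempfile

def log_result(path, run_id, config, loss):
    """Append one finished run to a JSON-lines log so it survives a crash."""
    with open(path, "a") as f:
        f.write(json.dumps({"run_id": run_id, "config": config, "loss": loss}) + "\n")

def load_results(path):
    """Rebuild the master's view of completed runs after a restart."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Simulate a master that logs two finished runs, "crashes", then resumes.
log = os.path.join(tempfile.mkdtemp(), "results.jsonl")
log_result(log, 0, {"lr": 0.01}, 0.42)
log_result(log, 1, {"lr": 0.10}, 0.35)

# A new master process reloads the log and can rebuild its model from it;
# any jobs that were still running on workers at crash time are not recovered.
history = load_results(log)
print([r["loss"] for r in history])  # the two logged losses
```

The key design point is append-only persistence: the master never needs to rewrite the log, so a crash mid-write can corrupt at most the last line, which the loader can skip.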