BiBiServ / bibigrid

BiBiGrid is a tool for an easy cluster setup inside a cloud environment.
Apache License 2.0

Multiple workers with same name #511

Closed XaverStiensmeier closed 3 months ago

XaverStiensmeier commented 3 months ago

In the past, we occasionally encountered an issue where more than one worker with the same name was started and Slurm lost track of the newer worker.

The cause might be that Slurm does not handle an exceeded suspendTimeout well. When the timeout runs out, it simply assumes that the shutdown has succeeded and allows the node to be restarted from then on, even if the old instance is still running. This is just a suspicion for now.
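To check whether a cluster is affected, one could look for worker names that appear more than once among the running instances. The following is only a minimal sketch: the function `find_duplicate_worker_names` and the example names are hypothetical, and the list of names would have to come from whatever the cloud provider reports.

```python
from collections import Counter


def find_duplicate_worker_names(server_names):
    """Return the sorted names that occur more than once in server_names."""
    counts = Counter(server_names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical instance names, as they might be listed by the cloud provider.
running = ["bibigrid-worker-0", "bibigrid-worker-1", "bibigrid-worker-0"]
duplicates = find_duplicate_worker_names(running)
print(duplicates)
```

A non-empty result would indicate that a second worker was started under a name that is still in use.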

XaverStiensmeier commented 3 months ago

The dev branch now allows you to set the suspendTimeout as well.