In the past, we have rarely encountered an issue where more than one worker with the same name is started and Slurm loses track of the newer worker.
The cause might be that Slurm does not handle an exceeded suspendTimeout well: once the timeout has passed, it simply assumes that the shutdown has succeeded and allows the node to be restarted from then on, even if the old worker is still running. This is only a suspicion for now.
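The suspected race can be sketched as a small timing model. This is a hypothetical illustration, not how Slurm is implemented: it assumes that Slurm treats a node as powered down once `suspend_timeout` seconds have passed since the suspend request. It also assumes that the real shutdown of the old worker can take longer than that. All names (`SuspendEvent`, `duplicate_possible`, the field names) are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SuspendEvent:
    # All times are in seconds; the field names are hypothetical.
    suspend_issued: float     # when the suspend (shutdown) request is issued
    shutdown_duration: float  # how long the old worker actually takes to terminate
    suspend_timeout: float    # the suspendTimeout value in effect


def node_reusable_at(ev: SuspendEvent) -> float:
    """Time from which Slurm is assumed to consider the node shut down."""
    return ev.suspend_issued + ev.suspend_timeout


def old_worker_gone_at(ev: SuspendEvent) -> float:
    """Time at which the old worker has really terminated."""
    return ev.suspend_issued + ev.shutdown_duration


def duplicate_possible(ev: SuspendEvent, resume_at: float) -> bool:
    """True if a restart at `resume_at` could start a second worker with the
    same name while the old one is still running (the suspected race)."""
    return node_reusable_at(ev) <= resume_at < old_worker_gone_at(ev)
```

For example, with a timeout of 60 s and a shutdown that actually takes 90 s, a restart at 70 s falls inside the window where both workers would exist. If the shutdown finishes within the timeout, that window is empty.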