ECP-VeloC / VELOC

Very-Low Overhead Checkpointing System
http://veloc.rtfd.io
MIT License

SLURM restart-in-place script double counts down node #23

Closed (closed by CamStan 1 year ago)

CamStan commented 5 years ago

While testing veloc_srun on SLURM with back-to-back runs after a node had already been taken down, the second run counted the same downed node twice in down_nodes.

Unfortunately I don't have the output from this test as it was done on a different machine.
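
To illustrate the symptom, here is a minimal sketch of how a list of downed nodes could be merged across runs without counting a node twice. It is not taken from veloc_srun: the function name `merge_down_nodes` and the node names are hypothetical, and the actual script may track down_nodes differently.

```python
def merge_down_nodes(known_down, newly_detected):
    """Combine two lists of node names, keeping each node once.

    Hypothetical helper for illustration only; not part of VELOC.
    dict.fromkeys drops duplicates while preserving first-seen order.
    """
    return list(dict.fromkeys([*known_down, *newly_detected]))


# First run: node "n3" is detected as down.
down_nodes = merge_down_nodes([], ["n3"])

# Second run: "n3" is reported again, along with a new failure on "n7".
down_nodes = merge_down_nodes(down_nodes, ["n3", "n7"])

print(down_nodes)  # ['n3', 'n7']
```

With a plain list append instead of the merge, the second run would yield `['n3', 'n3', 'n7']`, which matches the double counting described above.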

bnicolae commented 1 year ago

This issue has been inactive for a long time. Please reopen it if it is still relevant.