smart restart via slurm

Acellera / htmd

HTMD: Programming Environment for Molecular Discovery

https://software.acellera.com/docs/latest/htmd/index.html


smart restart via slurm #1024

Closed: alejandrovr closed this issue 2 years ago

alejandrovr commented 2 years ago

ACEMD cannot restart a trajectory on a GPU different from the one on which the simulation first started. Can we figure out a way to automatically detect which GPU it started on and send the "restart job" to a suitable GPU via SLURM?
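
A rough sketch of the kind of workaround being asked about, not something HTMD or jobqueues provides: the job itself could record the node it landed on, and a later restart could be pinned to that node with `sbatch --nodelist`. The file name `slurm_origin.json`, the helper names, and the `run.sh` job script are hypothetical. It also assumes every GPU on the recorded node is the same model, so that pinning the node is enough to pin the GPU type.

```python
import json
import os
from pathlib import Path

RECORD_NAME = "slurm_origin.json"  # hypothetical file name, not an HTMD convention


def record_origin(sim_dir, env=None):
    """Call from inside the running job: save the node the job landed on.

    SLURMD_NODENAME and SLURM_JOB_ID are set by SLURM in the job environment.
    """
    env = os.environ if env is None else env
    record = {
        "node": env.get("SLURMD_NODENAME"),
        "job_id": env.get("SLURM_JOB_ID"),
    }
    (Path(sim_dir) / RECORD_NAME).write_text(json.dumps(record))
    return record


def restart_command(sim_dir, job_script="run.sh"):
    """Build an sbatch command that pins the restart to the recorded node.

    Falls back to an unpinned submission when no record exists.
    Assumption: all GPUs on the recorded node are the same model.
    """
    record_path = Path(sim_dir) / RECORD_NAME
    cmd = ["sbatch", "--gres=gpu:1"]
    if record_path.exists():
        node = json.loads(record_path.read_text()).get("node")
        if node:
            cmd.append(f"--nodelist={node}")
    cmd.append(str(Path(sim_dir) / job_script))
    return cmd
```

The returned list could be passed to `subprocess.run`. Pinning to one node trades scheduling flexibility for restart safety; on clusters that tag nodes with GPU-model features, `--constraint` may be a looser alternative.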

stefdoerr commented 2 years ago

The solution is for ACEMD3 to implement restarts that don't depend on the GPU the simulation ran on. I think there is a plan for that, @raimis?

stefdoerr commented 2 years ago

It's not related to HTMD. It's also not related to the jobqueues library, because jobqueues cannot determine which SLURM node the job previously ran on either. This must be fixed in ACEMD3.