alexnwang opened 1 year ago
It's also a long-standing pain point for me, but I need to think a bit more about this.
The job id thing is useful because it means that `sacct` and `squeue` information is directly relatable to the on-disk files.
But it means that restarting is a pain.
The nicest way would be to modify the sbatch file itself so that you can run `sbatch` on it several times.
One workaround would be to have a CLI, `submitit restart 102984`, that would restart a previous submitit job file.
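A minimal sketch of what such a command could do, assuming submitit's internal `DelayedSubmission` pickle (the `{job_id}_submitted.pkl` file in the log folder). Both that filename pattern and `DelayedSubmission.load` are implementation details rather than public API, and the `restart` helper itself is hypothetical:

```python
import sys
from pathlib import Path

import submitit
from submitit.core.utils import DelayedSubmission  # internal API, may change


def restart(folder: str, job_id: str) -> submitit.Job:
    """Re-submit the callable pickled for a previous job (sketch)."""
    # Assumption: submitit stores the submitted callable as
    # <folder>/<job_id>_submitted.pkl (an implementation detail).
    pickle_path = Path(folder) / f"{job_id}_submitted.pkl"
    delayed = DelayedSubmission.load(pickle_path)
    executor = submitit.AutoExecutor(folder=folder)
    # Set partition/timeout etc. as needed for your cluster:
    # executor.update_parameters(timeout_min=60, slurm_partition="dev")
    # Re-submitting gets a *new* job id, so a new set of .sh/.pkl/log
    # files is written alongside the old ones in the same folder.
    return executor.submit(delayed.function, *delayed.args, **delayed.kwargs)


if __name__ == "__main__":
    job = restart(sys.argv[1], sys.argv[2])
    print(f"resubmitted as {job.job_id}")
```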
Yeah, I just set up my directories such that if I submit a job with the exact same parameters again, it runs in the same dir. Running out of the same dir just has it pick up where it left off, with another set of submitit files corresponding to the re-run.
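For what it's worth, a minimal sketch of that pattern: the folder is derived from a hash of the parameters, so identical parameters always map to the same directory. The `run_in_param_keyed_dir` helper and the `runs/` layout are illustrative, not part of submitit:

```python
import hashlib
import json

import submitit


def run_in_param_keyed_dir(fn, params: dict) -> submitit.Job:
    # Hypothetical helper: derive the log folder from the parameters so
    # re-submitting with identical params reuses the same directory.
    key = hashlib.sha1(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
    executor = submitit.AutoExecutor(folder=f"runs/{key}")
    executor.update_parameters(timeout_min=60)  # illustrative settings
    # Each (re-)submission drops a fresh set of <job_id>_* files in the
    # same folder; fn is responsible for resuming from any checkpoint there.
    return executor.submit(fn, **params)
```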
I'm interested in re-using sbatch files to re-submit jobs that have crashed. However, the `.sh` SBATCH file and the `.pkl` file are all tied to a single `SLURM_JOBID`. This makes re-using the `.sh` file to re-launch a job infeasible.

It'd be appreciated if there could be some way to relax this requirement and not have it tied to the JOBID.
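For context, a minimal example of how the on-disk files end up keyed by the job id (this uses submitit's public API; the function and folder names are placeholders):

```python
import submitit


def add(a: int, b: int) -> int:
    return a + b


executor = submitit.AutoExecutor(folder="logs")
executor.update_parameters(timeout_min=10)  # plus partition etc. as needed
job = executor.submit(add, 1, 2)
# The generated batch script, the pickled callable, and the logs under
# logs/ are all named after this id, which is why the .sh file can't
# simply be fed back to sbatch for a second run.
print(job.job_id)
```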