Closed xman1979 closed 2 months ago
If we run "scontrol show job", the Command field points to the temporary submission file, which has already been removed, e.g.:
(jepa) [xiaodongma@rsccpu4035 xiaodongma]$ scontrol show job 4499193
JobId=4499203 JobName=xiaodongma
...
Command=/home/xiaodongma/jepa-internal/xiaodongma/submission_file_e9c4eef46a24436b81d5213875f19d6c.sh
...
This can cause confusion in the Slurm ecosystem and makes it hard to integrate with other tooling that relies on parsing or post-inspecting the sbatch script.
This diff creates the temporary submission file as a symlink to the moved submission file.
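As a rough illustration of the idea (not the actual diff), the move-then-link step could look like the sketch below. The function name `move_and_link` and its arguments are hypothetical, and it assumes the submission script is moved on a filesystem that supports symlinks.

```python
import os
import shutil
from pathlib import Path


def move_and_link(tmp_script: Path, final_script: Path) -> None:
    """Move tmp_script to final_script, then leave a symlink at the old path.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    final_script.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(tmp_script), str(final_script))
    # Recreate the temporary path as a symlink to the moved file, so the
    # path recorded at submission time still resolves afterwards.
    os.symlink(final_script.resolve(), tmp_script)
```

With this approach the path that Slurm recorded at submission time keeps resolving, while the real script lives in its final location.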
After the fix, the submission file is visible:
scontrol show job 4499203
JobId=4499203 JobName=xiaodongma
...
Command=/checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/.submission_file_bb581d4ec3954cd9a45aa7388ad6494e.sh
...
(jepa) xiaodongma@xiaodongma-login-0:/checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8$ ll /checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/.submission_file_bb581d4ec3954cd9a45aa7388ad6494e.sh
lrwxrwxrwx 1 xiaodongma fair_amaia_cw_video 101 Sep 17 17:48 /checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/.submission_file_bb581d4ec3954cd9a45aa7388ad6494e.sh -> /checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/job_1358522/1358522_submission.sh
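For tooling that post-inspects the sbatch script, a minimal sketch of reading the Command field from "scontrol show job" output and following the symlink might look like this. The function name `submission_script_from_scontrol` is hypothetical, and the sketch assumes the command path contains no whitespace.

```python
import os
import re
from typing import Optional


def submission_script_from_scontrol(output: str) -> Optional[str]:
    """Return the resolved path from the Command= field, or None if absent.

    Hypothetical helper; assumes the path contains no whitespace.
    """
    match = re.search(r"(?:^|\s)Command=(\S+)", output)
    if match is None:
        return None
    # realpath follows symlinks, so a linked temporary path resolves to
    # the moved submission script.
    return os.path.realpath(match.group(1))
```

The text passed in would be whatever "scontrol show job" printed for the job; the helper only parses text and does not call Slurm itself.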
I'd rather the submission file be hidden, as you had initially proposed, to avoid cluttering the folder (too much).