PySlurm / pyslurm

Python Interface to Slurm
https://pyslurm.github.io
GNU General Public License v2.0

Submission of a batch job fails when the argument "work_dir" contains a "_" #294

Open Baohua-Chen opened 1 year ago

Baohua-Chen commented 1 year ago

Details

UPDATE

This bug does not seem to be caused only by the values in the slurm_job dict. I got the same error after deleting "work_dir" from the dict. Maybe it has something to do with submitting the job from Jupyter Lab? I do not know.

Issue

When I submit a batch job with the job().submit_batch_job function and specify a "work_dir" key whose value contains underscores (_), the job is submitted but fails immediately. When I checked the submitted job with the job().find_id function, I found that the "work_dir" attribute had been turned into garbled text such as "wly�U". However, when I resubmitted the job with the underscores removed from the work_dir value, the issue did not recur. I suspect this might be caused by "_" being replaced with "-" when the SLURM interface is called.

An example that reproduces this bug:

Job1 = {'wrap': 'echo a; sleep 15; echo b', 'job_name': 'test', 'partition': 'all', 'ntasks': 1, 'cpus_per_task': 1, 'work_dir': '/home/boo/slurm_jobs'}
job().submit_batch_job(Job1)

And an example that works well:

Job2 = {'wrap': 'echo a; sleep 15; echo b', 'job_name': 'test', 'partition': 'all', 'ntasks': 1, 'cpus_per_task': 1, 'work_dir': '/home/boo/slurmjobs'}
job().submit_batch_job(Job2)
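To make the check described above repeatable, here is a minimal sketch that compares the requested working directory with the value reported for each job record. It does not call pyslurm itself. The helper names (work_dir_is_intact, corrupted_jobs) are hypothetical, and the record shape (a list of dicts with "job_id" and "work_dir" keys) is an assumption based on the find_id output described in this report, so the sample records below are hand-written.

```python
def work_dir_is_intact(requested, reported):
    """Return True if the reported work_dir equals the requested path."""
    # Assumption: some versions may hand back bytes instead of str.
    if isinstance(reported, bytes):
        reported = reported.decode("utf-8", errors="replace")
    return reported == requested


def corrupted_jobs(requested, records):
    """Return the records whose 'work_dir' differs from the requested path."""
    return [
        rec for rec in records
        if not work_dir_is_intact(requested, rec.get("work_dir"))
    ]


# Hand-written records that mimic the symptom described in this issue.
sample_records = [
    {"job_id": 1, "work_dir": "wly\ufffdU"},
    {"job_id": 2, "work_dir": "/home/boo/slurm_jobs"},
]

bad = corrupted_jobs("/home/boo/slurm_jobs", sample_records)
print([rec["job_id"] for rec in bad])
```

On a real cluster, the records returned by job().find_id for the submitted job could be passed in place of sample_records, provided they have the shape assumed here.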

tazend commented 1 year ago

Hi

You are probably seeing an issue similar to the one mentioned in #260.

In newer versions of pyslurm (starting with 21.08), the Job-Submission API was substantially reworked (see the docs here), and the pyslurm.job class has been declared deprecated.

Since the new API is not yet available for 20.2, I can try to backport it. It may take some time, though, because of the changes introduced in newer Slurm versions over the years.