Closed Ovec8hkin closed 4 years ago
Thank you @Ovec8hkin. Yes, that probably won't work on Comet. I am separating the job submission schemes by platform; give me some time to fix it.
Ok. I figured it was just a different submission scheme that hadn't been tested yet. This is a low priority issue.
I think we can fix this problem for all possible schedulers with a simple regex to pull out the job number: \d+. Used with re.search, this matches the first run of digits in the output string.
I believe replacing this code in job_submission.submit_single_job():
if scheduler == "LSF":
    # works for 'Job <19490923> is submitted to queue <general>.\n'
    job_number = output.decode("utf-8").split("\n")[1].split("<")[1].split(">")[0]
elif scheduler == "PBS":
    # extracts number from '7319.eos\n'
    # job_number = output.decode("utf-8").split("\n")[0].split(".")[0]
    # uses '7319.eos\n'
    job_number = output.decode("utf-8").split("\n")[0]
elif scheduler == 'SLURM':
    try:
        job_number = str(output).split("\\n")[-2].split(' ')[-1]
    except:
        job_number = job_num
with this:
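import re
# output is bytes (see the .decode calls in the existing code), so decode before matching
job_number = re.search(r'\d+', output.decode("utf-8")).group(0)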
I can probably test this soon.
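A minimal sketch of the regex idea, run against the sample outputs quoted in the existing code comments. It assumes `output` arrives as bytes, as the existing `.decode("utf-8")` calls suggest. The SLURM string is an assumed example of the usual `sbatch` message, and `first_digit_run` is a hypothetical helper name, not something in minsar.

```python
import re


def first_digit_run(output: bytes) -> str:
    """Return the first run of digits found in the scheduler's output."""
    text = output.decode("utf-8")
    # re.search stops at the first match, so only the first digit run is returned.
    return re.search(r"\d+", text).group(0)


samples = {
    # Strings taken from the comments in the existing code.
    "LSF": b"Job <19490923> is submitted to queue <general>.\n",
    "PBS": b"7319.eos\n",
    # Assumed sbatch output; not quoted anywhere in this thread.
    "SLURM": b"Submitted batch job 12345\n",
}

job_numbers = {name: first_digit_run(out) for name, out in samples.items()}
print(job_numbers)
```

One behavior change to be aware of: for PBS the current code keeps the full '7319.eos' string, while the regex returns only '7319'. Whether that matters depends on how job_number is used downstream.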
I used a regex as @Ovec8hkin suggested. It works for almost all cases, except when you are on a compute node and call srun instead of sbatch. That command is very rarely used, and when it is, job_number = 99999 is used only for checking outputs.
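A rough sketch of how the 99999 placeholder could be combined with the regex, so that output without any digits does not raise an error. The function name `extract_job_number` and the constant `FALLBACK_JOB_NUMBER` are hypothetical, and this is not necessarily how the fix was implemented in minsar.

```python
import re

# Placeholder value mentioned in this thread; the constant name is hypothetical.
FALLBACK_JOB_NUMBER = "99999"


def extract_job_number(output: bytes, default: str = FALLBACK_JOB_NUMBER) -> str:
    """Return the first run of digits in the output, or the default if none exists."""
    text = output.decode("utf-8", errors="replace")
    match = re.search(r"\d+", text)
    # re.search returns None when nothing matches, so check before calling .group().
    return match.group(0) if match else default


print(extract_job_number(b"Submitted batch job 4242\n"))
print(extract_job_number(b""))
```

The fallback only triggers when the output contains no digits at all. Output that happens to contain unrelated digits would still be read as a job number, so this is a convenience for checking outputs and not a validation step.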
When attempting to run the test code for process_rsmas.py on the SDSC Comet HPC cluster, the initial process_rsmas job doesn't get submitted properly as far as I can tell (no jobs are shown when running squeue -u $USER). Below is what is printed to the console:
Obviously rsmas.job99999 is not the correct job number for the job. I believe this is tied to some code in minsar.job_submission; specifically, the submit_single_job function. The following block of code looks suspicious to me: