pyslurm was crashing when trying to get the value for gres_per_node from a job. I was able to track down the job, and it was created with this salloc command:
salloc -N 1 --gres=gpu
This works because Slurm treats both the GRES type and the count as optional. If the count is not specified, it defaults to 1.
However, the to_gres_dict function in cstr assumes that a non-null gres string always contains ":". As a result, when it indexes into the split string, this error happens:
Traceback (most recent call last):
  File "./slurm_jobs_to_graphite.py", line 283, in <module>
    get_data()
  File "./slurm_jobs_to_graphite.py", line 118, in get_data
    tres = job.gres_per_node
  File "pyslurm/core/job/job.pyx", line 1141, in pyslurm.core.job.job.Job.gres_per_node.get
  File "pyslurm/utils/cstr.pyx", line 229, in pyslurm.utils.cstr.to_gres_dict
IndexError: list index out of range
The line numbers might be a little off, as I added some code for debugging. The line in cstr is this:
name, typ, cnt = gres_splitted[0], gres_splitted[1], 0
I printed gres_splitted right before this line, and its value was ['gpu']. Since the list has only one element, gres_splitted[1] does not exist, hence the index out of range.
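To illustrate one way the parsing could tolerate a bare name, here is a minimal sketch in plain Python. It is not pyslurm's actual code: parse_gres_entry is a hypothetical helper, and it assumes entries follow the name[[:type]:count] form with plain integer counts. It does not handle any extra suffixes or index annotations Slurm may append.

```python
def parse_gres_entry(entry):
    """Split one GRES entry into (name, type, count).

    Hypothetical helper for illustration only, not part of pyslurm.
    Assumes the name[[:type]:count] form, with count defaulting to 1.
    """
    parts = entry.split(":")
    name = parts[0]
    typ = None
    count = 1  # default when no count is given

    if len(parts) == 2:
        # Two fields are ambiguous: either name:count or name:type.
        if parts[1].isdigit():
            count = int(parts[1])
        else:
            typ = parts[1]
    elif len(parts) >= 3:
        typ = parts[1]
        # Raises ValueError if the count is not a plain integer.
        count = int(parts[2])

    return name, typ, count


if __name__ == "__main__":
    for entry in ("gpu", "gpu:2", "gpu:tesla", "gpu:tesla:2"):
        print(entry, "->", parse_gres_entry(entry))
```

The key point is checking the length of the split list before indexing, so that a string like "gpu" yields a name with no type and a count of 1 instead of raising IndexError.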