PySlurm / pyslurm

Python Interface to Slurm
https://pyslurm.github.io
GNU General Public License v2.0

Particular value of gres is causing cstr to crash #333

Closed: robgics closed this issue 8 months ago

robgics commented 8 months ago

Details

Issue

pyslurm was crashing when trying to get the value for gres_per_node from a job. I was able to track down the job, and it was created with this salloc command:

salloc -N 1 --gres=gpu

This works because Slurm treats the type and count as optional. If the count is not specified, it defaults to 1.

However, the to_gres_dict function in cstr assumes that if the gres string isn't null, it will contain a ":". As a result, when it tries to index into the split string, this error happens:

Traceback (most recent call last):
  File "./slurm_jobs_to_graphite.py", line 283, in <module>
    get_data()
  File "./slurm_jobs_to_graphite.py", line 118, in get_data
    tres = job.gres_per_node
  File "pyslurm/core/job/job.pyx", line 1141, in pyslurm.core.job.job.Job.gres_per_node.get
  File "pyslurm/utils/cstr.pyx", line 229, in pyslurm.utils.cstr.to_gres_dict
IndexError: list index out of range

The line numbers might be a little off there, as I added some code for debugging. The line in cstr is this:

name, typ, cnt = gres_splitted[0], gres_splitted[1], 0

I printed out gres_splitted right before this line, and its value was ['gpu'], hence the index out of range.
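For illustration, here is a minimal sketch of a parser that tolerates a bare name such as "gpu". This is not the pyslurm code and not necessarily what the eventual fix looks like: parse_gres is a hypothetical helper, it only handles the plain forms name, name:count, name:type and name:type:count, and it assumes a missing count means 1, as described above.

```python
def parse_gres(gres):
    """Parse a comma-separated GRES string into a {key: count} dict.

    Hypothetical helper for illustration only (not part of pyslurm).
    Keys are "name" or "name:type"; a missing count is assumed to be 1.
    """
    result = {}
    if not gres:
        return result

    for item in gres.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma

        parts = item.split(":")
        name, typ, count = parts[0], None, 1

        if len(parts) == 2:
            # Two fields are ambiguous: "gpu:2" (count) or "gpu:tesla" (type).
            if parts[1].isdigit():
                count = int(parts[1])
            else:
                typ = parts[1]
        elif len(parts) >= 3:
            typ = parts[1]
            # Fall back to the assumed default if the count is not numeric.
            count = int(parts[2]) if parts[2].isdigit() else 1

        key = f"{name}:{typ}" if typ else name
        result[key] = count

    return result


# The value from this issue no longer raises IndexError:
print(parse_gres("gpu"))
print(parse_gres("gpu:tesla:4,shard:8"))
```

The key point is checking len(parts) before indexing, so a single-element list like ['gpu'] is handled instead of assumed away.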

tazend commented 8 months ago

Hi @robgics,

Thanks for reporting. It should be fixed now that #334 has been merged into main.