ThibaultGROUEIX closed this issue 3 years ago.
Hi @ThibaultGROUEIX, thanks for raising this!
I think the command that's failing is `squeue -O tres-per-node,nodelist,username,jobid --noheader`, called here (and it seems the `tres-per-node` option documented here is failing). Unfortunately, the current implementation uses this flag to count the number of GPUs in use, so it might be a little tricky to work around.
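To make that dependency concrete, here is a minimal sketch of how GPU usage could be tallied from the `squeue` output above. It is not the tool's actual code: `count_gpus_per_node` is a hypothetical helper, and it assumes whitespace-separated, untruncated columns, one node per job, and `tres-per-node` values such as `gpu:2`, `gpu:tesla:2` or `gres:gpu:2` (with `N/A` for jobs that hold no GPUs).

```python
from collections import Counter


def count_gpus_per_node(squeue_output: str) -> Counter:
    """Sum GPUs per node from the output of
    `squeue -O tres-per-node,nodelist,username,jobid --noheader`.

    Hypothetical helper. Assumes one node per job and a tres-per-node
    field like "gpu:2", "gpu:tesla:2" or "gres:gpu:2"; "N/A" means no GPUs.
    """
    counts = Counter()
    for line in squeue_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        tres, node = fields[0], fields[1]
        if "gpu" not in tres:
            continue  # e.g. "N/A": this job holds no GPUs
        last = tres.split(":")[-1]
        # Assumption: a GPU request without an explicit count means one GPU.
        counts[node] += int(last) if last.isdigit() else 1
    return counts
```

If a SLURM version rejects the `tres-per-node` field, `squeue` fails before any of this parsing runs, which is why the error cannot simply be caught and worked around inside the counting step.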
Would you mind posting the version of SLURM you are using, so I can update the README to warn others about this problem? For reference, the version I test things on is slurm 18.08.7.
Thanks!
Hello @albanie, I'm having the same problem here. My SLURM version is 17.02.11-Bull.1.1
Hi @albanie,
Sorry for the late reply: I didn't see your answer until I ran into this problem again, searched for a fix, and found my own issue ^^
My version is slurm 17.02.7
Best regards
I am getting the same message in 17.11.2. Could it be that GPU resources are not tracked, since there is nothing along the lines of `AccountingStorageTRES=gres/gpu,gres/gpu:tesla` in the slurm.conf?
See docs
Edit:
Realized it is probably just due to old slurm. Will try to update SLURM at some point.
Edit2: Seems to work on slurm 19.
Thanks both - I will update the README to reflect that there are issues on older versions.
I will close this for now, because I don't have a way to debug (I sadly don't have access to older SLURM versions for development) - but feel free to re-open if it's useful to discuss further.
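A rough sketch of the kind of check the README warning could translate into, built only from the versions reported in this thread (17.02.7, 17.02.11-Bull.1.1 and 17.11.2 failing; 18.08.7 and 19 working). The 18.08 cutoff and the helper names `parse_slurm_version` and `tres_per_node_likely_supported` are assumptions; the version string would typically come from `squeue --version`, which prints something like `slurm 18.08.7`.

```python
import re

# Assumed cutoff: the oldest version reported as working in this thread.
MIN_TESTED = (18, 8)


def parse_slurm_version(text: str) -> tuple:
    """Extract (major, minor, micro) from strings like 'slurm 17.02.11-Bull.1.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no SLURM version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def tres_per_node_likely_supported(text: str) -> bool:
    """Heuristic only: True for versions at or above the assumed cutoff."""
    return parse_slurm_version(text)[:2] >= MIN_TESTED
```

Vendor suffixes such as `-Bull.1.1` are ignored because only the first `X.Y.Z` triple is read.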
Thanks to the authors for sharing this amazing tool! We can make a simple change to support SLURM 17. See https://github.com/yuhui-zh15/slurm_gpustat/commit/b4814b31b4ee036a0548b7212307741d4b8b71a6. You can try `pip install git+https://github.com/yuhui-zh15/slurm_gpustat.git`.
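For illustration only, and not a description of what the linked commit changes: one way to support both old and new releases is to choose the `squeue -O` field by version. This sketch assumes (unverified here) that older SLURM accepts `gres` as an `-O` field and reports GPU requests in a similar `gpu:N` form; `gpu_field_for`, `build_squeue_cmd` and `run_squeue` are hypothetical helper names.

```python
import shutil
import subprocess


def gpu_field_for(version: tuple) -> str:
    # Assumption: SLURM 18+ understands "tres-per-node" (as reported in this
    # thread), while older releases are queried through the "gres" field.
    return "tres-per-node" if version[0] >= 18 else "gres"


def build_squeue_cmd(version: tuple) -> list:
    """Build the squeue argument list for a given (major, minor, micro) version."""
    field = gpu_field_for(version)
    return ["squeue", "-O", f"{field},nodelist,username,jobid", "--noheader"]


def run_squeue(version: tuple) -> str:
    """Run squeue and return its stdout; raises if squeue is missing or fails."""
    if shutil.which("squeue") is None:
        raise RuntimeError("squeue not found on PATH")
    result = subprocess.run(
        build_squeue_cmd(version), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Keeping command construction separate from execution makes the version logic easy to check on a machine without SLURM installed.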
Thanks for the useful tool! This shows up when `slurm_gpustat` is called. Cheers