PySlurm / pyslurm

Python Interface to Slurm
https://pyslurm.github.io
GNU General Public License v2.0

Cannot seem to get allocated gres info from Node #315

Closed: tazend closed this 11 months ago

tazend commented 11 months ago

@robgics

Discussed in https://github.com/orgs/PySlurm/discussions/314

Originally posted by **robgics** August 25, 2023

I'm using pyslurm.Nodes.load() to get a list of all nodes, and that works fine. I can print out a lot of things from them, but I cannot yet seem to get an accurate measure of used gres (gpus). I started 3 jobs, each requesting 2 gpus. I can see from squeue that the jobs are running, and if I use "scontrol show node" on the node that a job started on, I can see in the AllocTRES output that I am using the 2 gpus. I can also use "scontrol --details show node <nodename>" and that will give me GresUsed.

However, when I output all of the nodes from pyslurm, specifically node.allocated_gres, it shows an empty dict. I note that in the code itself, allocated_gres references "self.info.gres_used", so if that's the same gres_used that is in the scontrol output, then something is wrong with how pyslurm gets that value. I also notice that tres_configured and tres_alloc are commented out in the Node class def. Thanks for the help.
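
For anyone who wants to script the scontrol side of that comparison, here is a minimal sketch. It assumes the output of "scontrol --details show node" consists of whitespace-separated Key=Value tokens. The helper names parse_node_fields and gres_used_from_scontrol are made up for this example and are not part of pyslurm or Slurm.

```python
import subprocess


def parse_node_fields(scontrol_text):
    """Split `scontrol show node` output into a {key: value} dict.

    Assumption: fields are whitespace-separated Key=Value tokens. Values
    that contain spaces (a free-text Reason, for example) are not handled.
    """
    fields = {}
    for token in scontrol_text.split():
        # Split on the first "=" only, so values such as
        # "cpu=2,gres/gpu=2" stay intact.
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not Key=Value pairs
            fields[key] = value
    return fields


def gres_used_from_scontrol(node_name):
    """Return the raw GresUsed string scontrol reports for one node.

    Hypothetical helper: needs the scontrol binary on PATH.
    """
    out = subprocess.run(
        ["scontrol", "--details", "show", "node", node_name],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_node_fields(out).get("GresUsed", "")
```

The string returned by gres_used_from_scontrol can then be compared by eye with what node.allocated_gres reports for the same node.
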
tazend commented 11 months ago

@robgics

in the allocated_gres property, could you try and insert the following before the return statement:

print(cstr.to_unicode(self.info.gres_used))

then recompile and see what it says (and post it here)? That prints the literal gres_used string from node_info_t, which another function then turns into a proper dict. If there is something in the string, then the cstr.gres_to_dict function is probably faulty.
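
To illustrate the string-to-dict step being described, here is a rough standalone sketch. It is not pyslurm's actual cstr.gres_to_dict, and the function name gres_string_to_dict is invented for this example. It assumes the string is a comma-separated list such as "gpu:tesla:2(IDX:0-1),mps:0", where the last colon-separated field is a plain integer count.

```python
import re


def gres_string_to_dict(gres):
    """Turn a gres string into a {name: count} dict.

    Illustrative only; the assumed input looks like
    'gpu:tesla:2(IDX:0-1),mps:0'.
    """
    result = {}
    if not gres or gres == "(null)":
        return result
    # Remove the parenthesised detail first, because something like
    # "(IDX:0,2)" contains a comma of its own.
    cleaned = re.sub(r"\([^)]*\)", "", gres)
    for item in cleaned.split(","):
        if not item:
            continue
        name, sep, count = item.rpartition(":")
        if not sep or not count.isdigit():
            # No numeric count given (e.g. plain "gpu"): count it as one.
            name, count = item, "1"
        result[name] = int(count)
    return result
```

With a parser along these lines, an empty input string produces an empty dict, which matches the symptom in the report: if the dict is empty, either the string was empty to begin with or the parser dropped something.
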

tazend commented 11 months ago

Found the error. A flag was missing (slurm.SHOW_DETAIL) that is required for this gres_used string to contain any info. I'll make a fix for it tomorrow.

robgics commented 11 months ago

Thanks for finding it so quick. Woulda taken me days or more.

robgics commented 10 months ago

Allocated gres and configured gres are coming out now, thanks!