OleHolmNielsen / Slurm_tools

My tools for the Slurm HPC workload manager
GNU General Public License v3.0

total gpu usage with slurmacct #26

Open arnoldas500 opened 1 year ago

arnoldas500 commented 1 year ago

Hi,

I was wondering whether the slurmacct tool has a flag to get the total CPU and GPU usage. My goal is to get the total CPU and GPU hours per month per partition.

Thank you

OleHolmNielsen commented 1 year ago

Hi, I'm sorry that I don't have a good idea about getting GPU accounting information from Slurm :-( Best regards, Ole

mtds commented 1 year ago

What about the following command (at least for GPUs):

sreport -tminper cluster utilization --tres="gres/gpu" start=2023-03-01T00:00:00

Output shows something like:

--------------------------------------------------------------------------------
  Cluster      TRES Name          Allocated               Down         PLND Down               Idle           Planned            Reported 
--------- -------------- ------------------ ------------------ ----------------- ------------------ ----------------- ------------------- 
 myCluster      gres/gpu   14591077(57.06%)     2282656(8.93%)          0(0.00%)    8699467(34.02%)          0(0.00%)   25573200(100.00%)

Combining CPU and GPU usage in one report may be possible, but I am not sure whether the numbers would get 'mixed up' too much.
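Here is a quick arithmetic check on the row above. With `-tminper`, each figure is in GPU-minutes with a percentage of the Reported total. The Allocated, Down, PLND Down, Idle and Planned columns add up exactly to Reported (14591077 + 2282656 + 0 + 8699467 + 0 = 25573200), and dividing by 60 gives GPU-hours, which is the unit the original question asks for. The numbers are copied from that output. The function name `summarize` is made up for illustration.

```python
# Figures copied from the sreport row above; with -tminper they are GPU-minutes.
ROW = {
    "Allocated": 14591077,
    "Down": 2282656,
    "PLND Down": 0,
    "Idle": 8699467,
    "Planned": 0,
}
REPORTED = 25573200


def summarize(row, reported):
    """Check that the columns sum to Reported, then derive percentages and GPU-hours."""
    if sum(row.values()) != reported:
        raise ValueError("columns do not add up to the Reported total")
    # Percentages rounded to two decimals, the way sreport prints them.
    percent = {name: round(100 * minutes / reported, 2) for name, minutes in row.items()}
    # 60 GPU-minutes make one GPU-hour.
    gpu_hours = {name: minutes / 60 for name, minutes in row.items()}
    return percent, gpu_hours


if __name__ == "__main__":
    percent, gpu_hours = summarize(ROW, REPORTED)
    for name in ROW:
        print(f"{name:>10}: {gpu_hours[name]:12.1f} GPU-hours ({percent[name]:.2f}%)")
```

The recomputed percentages match the ones sreport printed (57.06%, 8.93%, 34.02%). Allocated corresponds to roughly 243184.6 GPU-hours. These are cluster-wide totals only.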

arnoldas500 commented 1 year ago

The issue with the above report is that I cannot separate the usage by partition or by node. I have written my own reporting tool to calculate GPU hours per node and per partition.
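Here is a minimal sketch of that kind of per-partition report. It is not the tool mentioned above. It assumes the raw data comes from a command along the lines of `sacct -a -X -n -P -S 2023-03-01 -E 2023-04-01 --format=Partition,Start,ElapsedRaw,AllocTRES`, which gives pipe-separated output with one line per job allocation and ElapsedRaw in seconds. CPU-hours are allocated CPUs multiplied by elapsed hours. GPU-hours are computed the same way from the `gres/gpu` count in AllocTRES. The function names and sample lines are invented for illustration. The sketch makes two simplifications: a job's whole runtime is charged to the month in which it started, and jobs whose Start is `Unknown` or `None` are skipped.

```python
import sys
from collections import defaultdict


def parse_tres(tres):
    """Split an AllocTRES string like 'cpu=8,gres/gpu=2,mem=32G' into a dict."""
    fields = {}
    for item in tres.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def usage_by_month_partition(lines):
    """Sum CPU-hours and GPU-hours per (YYYY-MM, partition).

    Each line is expected to be 'Partition|Start|ElapsedRaw|AllocTRES'.
    """
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "gpu_hours": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        partition, start, elapsed_raw, alloc_tres = line.split("|")
        if start in ("Unknown", "None", ""):
            continue  # the job never started, so it used nothing
        month = start[:7]  # '2023-03-02T10:00:00' -> '2023-03' (simplification: start month only)
        hours = int(elapsed_raw) / 3600
        tres = parse_tres(alloc_tres)
        # Only the untyped 'gres/gpu' key is used, so typed entries such as
        # 'gres/gpu:a100=2' (if present) are not counted twice.
        totals[(month, partition)]["cpu_hours"] += int(tres.get("cpu", 0)) * hours
        totals[(month, partition)]["gpu_hours"] += int(tres.get("gres/gpu", 0)) * hours
    return dict(totals)


if __name__ == "__main__":
    # Usage: sacct ... --format=Partition,Start,ElapsedRaw,AllocTRES | python3 this_script.py
    for (month, partition), use in sorted(usage_by_month_partition(sys.stdin).items()):
        print(f"{month} {partition:<12} CPU-h {use['cpu_hours']:10.1f}  GPU-h {use['gpu_hours']:10.1f}")
```

Grouping by node instead would need the NodeList field as well. That can name several nodes for a multi-node job, so it is left out of this sketch.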

docwebhead commented 11 months ago

OleHolmNielsen, you've written some great utilities and provided some excellent info to the slurm-users mailing list. Thanks! The one thing sreport does that slurmacct doesn't is allow itself to be run as a non-root user, as long as that user has the admin role in the Slurm database. Do you have any suggestions for running slurmacct as a non-root user?

OleHolmNielsen commented 11 months ago

Hi, thanks for your nice comments! The slurmacct script actually uses the Slurm commands sreport and sacct to generate reports. How did you find that non-root users aren't allowed to use slurmacct? Please first make sure that the sreport and sacct commands are permitted for your non-root user.
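Here is a small sketch for the OS side of that check. It assumes the problem is file permissions rather than Slurm database roles, which are managed with sacctmgr and are not covered here. For a non-root user, running a shell script such as slurmacct requires both read and execute permission on the file. A compiled binary such as sreport only needs execute permission. The helper names `locate` and `describe` are made up for illustration.

```python
import os
import stat
import sys


def locate(name):
    """Find name on PATH even if it is not executable (shutil.which would skip it)."""
    if os.sep in name:
        return name  # already a path, use it as given
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def describe(path):
    """Report mode bits and what the current (real) user may do with path."""
    if path is None or not os.path.exists(path):
        return {"path": path, "exists": False}
    mode = os.stat(path).st_mode  # follows symlinks to the real file
    return {
        "path": path,
        "exists": True,
        "mode": stat.filemode(mode),  # e.g. '-rwxr-x---'
        "readable": os.access(path, os.R_OK),
        "executable": os.access(path, os.X_OK),
    }


if __name__ == "__main__":
    names = sys.argv[1:] or ["sreport", "sacct", "slurmacct"]
    for name in names:
        print(name, describe(locate(name)))
```

Run it as the affected non-root user. If a script shows `"readable": False` or `"executable": False`, that alone would produce a "permission denied" error from the shell, whatever the user's Slurm role.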

docwebhead commented 11 months ago

Thanks for your reply! I saw that in the script and was puzzled, because these users could run sreport (and friends) without issues. The helpful message from the OS was "permission denied".

I never figured out why, but I got it working by adding the users to the slurm group and granting them the right to execute it via sudoers.

Got there the long way 'round, but at least I didn't (as one user suggested) resort to setuid! Thanks again for sharing your hard work and wisdom.