frenchwr / slurm-wrappers

Python scripts for parsing SLURM commands
MIT License

qSummary works, show limits doesn't. #2

Open spikebike opened 7 years ago

spikebike commented 7 years ago

Not sure if this is meant for public consumption. But it looks really useful and I'd consider contributing.

qSummary works well: it shows 13 groups and 40 or so users, and provides info on about 40k jobs in our queue.

./showLimits shows:

[root@nas-9-0 bin]# ./showLimits
ACCOUNT GROUP FAIRSHARE MAXCPUS MAXMEM(GB) MAXCPUTIME(HRS)
Joegrp 1 - - -
Traceback (most recent call last):
  File "./showLimits", line 91, in <module>
    main()
  File "./showLimits", line 88, in main
    print_summary(accounts,groups,mbs_and_mins)
  File "./showLimits", line 51, in print_summary
    while ( group_list[group_cnt][0] == account[0] ):
IndexError: list index out of range

I see that it calls:

[root@nas-9-0.agri bin]# sacctmgr show accounts format=organization%30,account%30,user,fairshare,GrpCPUs,GrpMem,GrpCPURunMins WithAssoc --noheader --parsable2 | wc -l
1655
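For anyone poking at that output by hand, here is a minimal sketch of how the `--parsable2 --noheader` lines could be split into records. It is not taken from showLimits: the `FIELDS` names, the `parse_parsable2` helper, and the sample rows are all hypothetical, and it assumes that fields are separated by `|` with no trailing delimiter and that account-level association rows have an empty user field.

```python
# Field order mirrors the format= list in the sacctmgr call above.
# The names themselves are illustrative, not taken from showLimits.
FIELDS = ["organization", "account", "user", "fairshare",
          "grpcpus", "grpmem", "grpcpurunmins"]


def parse_parsable2(text):
    """Split pipe-delimited sacctmgr output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        values = line.split("|")
        if len(values) != len(FIELDS):
            continue  # skip lines that do not match the expected layout
        rows.append(dict(zip(FIELDS, values)))
    return rows


# Hypothetical sample data, made up for illustration only.
SAMPLE = """\
science|chemistry||1|||
science|chemistry|alice|1|||
|joegrp||1|||
"""

rows = parse_parsable2(SAMPLE)
# Assumption: rows with an empty user field describe the account itself.
account_rows = [r for r in rows if not r["user"]]
print(len(rows), len(account_rows))
```

Unset limits simply come through as empty strings here, so a site with no GrpCPUs or GrpMem limits still parses cleanly.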

No errors or strangeness there that I can see. Ah, our Slurm setup has accounts and users, but no accounts under other accounts, which is what you call groups. I haven't seen that used before.

Ah, from what I can tell you call a "group" what Slurm considers a subaccount. From the docs: "Accounts may be arranged in a hierarchical fashion, for example accounts chemistry and physics may be children of the account science."

We don't have those, thus the crash. Would be nice if we could say something like "showLimits -g all" and get a report on the limits for all accounts. Ah, never mind, we don't use account/group limits (like grpcpu and grpmem) because they have to be the same across all partitions. We switched to QoS to allow limits for accounts to differ by partition.

Great scripts, just wanted you to know that your limit script dies if you don't have subaccounts.

frenchwr commented 7 years ago

Hi @spikebike, thanks for the feedback.

> Not sure if this is meant for public consumption. But it looks really useful and I'd consider contributing.

I just went ahead and added an MIT license to this repo. Feel free to use and modify the scripts as you wish. If you modify/enhance the scripts, I would greatly appreciate a pull request. 😄

I've thought about contributing and might do so in the future. qSummary would be useful for a lot of sites, showLimits I'm not so sure about. I think a lot of sites limit resources differently than ours (as you mention below), via either QOS's, partition limits, etc. Another useful enhancement would be to add support for more resource limits (not just grpcpus, grpmem, and grpcpurunmins) and allow users to request the limits they want output from the command line.

> qSummary works well: it shows 13 groups and 40 or so users, and provides info on about 40k jobs in our queue.

Great.

> ./showLimits shows:
>
> [root@nas-9-0 bin]# ./showLimits
> ACCOUNT GROUP FAIRSHARE MAXCPUS MAXMEM(GB) MAXCPUTIME(HRS)
> Joegrp 1 - - -
> Traceback (most recent call last):
>   File "./showLimits", line 91, in <module>
>     main()
>   File "./showLimits", line 88, in main
>     print_summary(accounts,groups,mbs_and_mins)
>   File "./showLimits", line 51, in print_summary
>     while ( group_list[group_cnt][0] == account[0] ):
> IndexError: list index out of range

> I see that it calls:
>
> [root@nas-9-0.agri bin]# sacctmgr show accounts format=organization%30,account%30,user,fairshare,GrpCPUs,GrpMem,GrpCPURunMins WithAssoc --noheader --parsable2 | wc -l
> 1655

> No errors or strangeness there that I can see. Ah, our Slurm setup has accounts and users, but no accounts under other accounts, which is what you call groups. I haven't seen that used before.

> Ah, from what I can tell you call a "group" what Slurm considers a subaccount. From the docs: "Accounts may be arranged in a hierarchical fashion, for example accounts chemistry and physics may be children of the account science."

Yeah, you've got it. What I call a "group" is actually a subaccount in SLURM. We use this language at our center (https://github.com/accre) for historical reasons that predate the use of SLURM. I realize the script dies if you don't define subaccounts, which probably limits its applicability to other sites. It would probably be relatively straightforward to enhance the script to overcome this. If you want to take a stab at it I would gladly consider a PR!
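As a starting point, here is a rough sketch of one way the failing loop could be guarded. It is not the actual showLimits code: the `groups_for_account` helper and the sample data are hypothetical, and the only thing taken from the traceback is the assumption that each `group_list` row stores its parent account name at index 0, matching `account[0]`.

```python
def groups_for_account(account, group_list, start):
    """Collect the consecutive rows in group_list that belong to account.

    Returns the matching rows and the index where scanning stopped.
    The bounds check comes first, so an empty or exhausted group_list
    ends the loop instead of raising IndexError.
    """
    idx = start
    matched = []
    while idx < len(group_list) and group_list[idx][0] == account[0]:
        matched.append(group_list[idx])
        idx += 1
    return matched, idx


# Hypothetical data: "joegrp" has no subaccounts, "science" has two.
accounts = [("joegrp", "1"), ("science", "1")]
group_list = [("science", "chemistry"), ("science", "physics")]

position = 0
summary = {}
for account in accounts:
    groups, position = groups_for_account(account, group_list, position)
    summary[account[0]] = [g[1] for g in groups]

print(summary)
```

Like the original loop, this relies on both lists being sorted the same way. Grouping the rows into a dict keyed by account name would remove that requirement.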

> We don't have those, thus the crash. Would be nice if we could say something like "showLimits -g all" and get a report on the limits for all accounts. Ah, never mind, we don't use account/group limits (like grpcpu and grpmem) because they have to be the same across all partitions. We switched to QoS to allow limits for accounts to differ by partition.

showLimits shows all account limits by default, if you don't pass it any CL args. And yeah, like I said above, this script would probably be of limited utility at sites other than ours since so many use partition-level QOS's and so on.

> Great scripts, just wanted you to know that your limit script dies if you don't have subaccounts.

Thanks again, I appreciate the feedback!