dib-lab / farm-notes

notes on the farm cluster

commands to examine ctbrowngrp queue #39

Open ctb opened 2 years ago

ctb commented 2 years ago

a colleague asked:

Cluster users should never (or only rarely, and after letting the group know) submit more than 32 jobs at a time per person, so that we are not monopolizing nodes (ctbrowngrp has 96, and no one person should use more than 1/3 of those). But when I run squeue, I see some names coming up what appears to be hundreds of times.

My response:

Right - those are the only rules we suggest for the ‘ctbrowngrp’ shared buy-in, which is what we’re all using.

However, the squeue command by default reports for all users, many of whom use their own queues, their own buy-ins, etc.

The command:

squeue -A ctbrowngrp

will show you all of the jobs being run by our shared group. You’ll see that tereiter and ntpierce have an awful lot of jobs, but most of those are on bml and low2, which shouldn’t interfere with hmm and med2 jobs. You can screen those out like so:

squeue -A ctbrowngrp | egrep -v 'bml|low2'

which shows just the jobs running on medium and high priority nodes within our reservation.
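
If you'd rather see a count per user than scroll through hundreds of lines, something like the sketch below might help. The `jobs_per_user` function is a hypothetical helper, not something installed on farm. It reads "USER PARTITION" lines on stdin and matches the partition name exactly, which is slightly stricter than the egrep above (that one drops any line containing `bml` or `low2` anywhere). The 32 in the usage comment is just the per-person guideline mentioned earlier.

```shell
# Summarize "USER PARTITION" lines read from stdin as per-user job counts,
# skipping jobs on the bml and low2 partitions.
# Optional first argument: only print users with more than that many jobs.
jobs_per_user() {
    awk -v limit="${1:-0}" '
        $2 != "bml" && $2 != "low2" { n[$1]++ }
        END { for (u in n) if (n[u] > limit) print n[u], u }
    ' | sort -k1,1nr -k2,2
}

# On the cluster, feed it squeue output (-h drops the header line,
# -o "%u %P" prints only the user name and partition):
#   squeue -A ctbrowngrp -h -o "%u %P" | jobs_per_user
#   squeue -A ctbrowngrp -h -o "%u %P" | jobs_per_user 32
```

The first form lists everyone with jobs outside bml and low2, busiest first. The second lists only people above the 32-job guideline. Note that squeue reports pending jobs as well as running ones, so the counts include both.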

The short version is this:

then there is another problem that we should look into.

If you run the squeue/egrep command above and see that there are LOTS of medium/high priority jobs queued, then we can go politely remonstrate with people who are breaking the ctbrowngrp-specific rules.