A colleague asked:
Cluster users should never (or only rarely, and after letting the group know) submit more than 32 jobs at a time per person, so that we are not monopolizing nodes (ctbrowngrp has 96, and no one person should use more than 1/3 of those). But when I run squeue, I see some names coming up what appear to be hundreds of times.
My response:
Right - those are the only rules we suggest for the ‘ctbrowngrp’ shared buy-in, which is what we’re all using.
However, the squeue command by default reports for all users, many of whom use their own queues, their own buy-ins, etc.
The command:
squeue -A ctbrowngrp
will show you all of the jobs being run by our shared group. You’ll see that tereiter and ntpierce have an awful lot of jobs, but most of those are on bml and low2, which shouldn’t interfere with hmm and med2 jobs. You can screen those out like so:
squeue -A ctbrowngrp | egrep -v 'bml|low2'
which shows just the jobs running on medium and high priority nodes within our reservation.
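To see at a glance who holds how many of those medium/high priority jobs, the filtered listing can be tallied per user. This is a rough sketch, not something from the original exchange: the helper name `count_jobs_per_user` is made up for illustration, and it assumes the input is one `user partition` pair per line, which squeue can produce with `-h` (no header) and `-o "%u %P"` (user and partition).

```shell
# Hypothetical helper: read "user partition" lines on stdin, skip jobs on
# the low-priority partitions (bml and low2), and print "count user",
# busiest user first.
count_jobs_per_user() {
    awk '$2 !~ /bml|low2/ { n[$1]++ }
         END { for (u in n) print n[u], u }' | sort -k1,1nr -k2,2
}

# On the cluster this would be fed from squeue, for example:
#   squeue -h -A ctbrowngrp -o "%u %P" | count_jobs_per_user
```

Unlike the plain egrep filter, this only tests the partition column, so a job whose name happens to contain bml or low2 is not dropped by accident. Any user whose count is well above 32 is a candidate for a polite conversation.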
The short version is this:
If you are having trouble getting medium or high priority jobs to run, and you then look at the output of the squeue/egrep command above and see that there are very few medium/high priority jobs queued and running in ctbrowngrp, then there is another problem that we should look into.
If you look at the output of the squeue/egrep command above and see that there are LOTS of medium/high priority jobs queued, then we can go politely remonstrate with people who are breaking the ctbrowngrp-specific rules.