1. Allocating Nodes in Specific Racks or Cabinets
Selecting Nodes in Specific Cabinets or Chassis:
To allocate nodes in a specific cabinet, use:
qsub -l select=tier0=x4407 pbs_submit_script.sh
To allocate nodes in a specific chassis, use:
qsub -l select=tier1=x4407c2 pbs_submit_script.sh
Requesting a Single Node per Specified Cabinet:
Concatenate select statements as follows (see the sketch below):
-l select=1:ncpus=208:tier0=x4001+1:ncpus=208:tier0=x4002+1:ncpus=208:tier0=x4003+...
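If you need one node in each of many cabinets, a small shell loop can build that select string for you; this is just a sketch, reusing the cabinet names and ncpus value from the example above:
# Sketch: build a one-node-per-cabinet select string and submit with it.
# The cabinet list is illustrative; substitute the cabinets you actually want.
sel=""
for cab in x4001 x4002 x4003; do
  sel+="1:ncpus=208:tier0=${cab}+"
done
sel=${sel%+}   # drop the trailing '+'
qsub -l select="${sel}" pbs_submit_script.sh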
Determining the Number of Available Nodes:
To find the number of free nodes for each cabinet on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^c]*' | sort | uniq -c
To find the number of up nodes (free and in use) for each cabinet on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free") or (.state == "job-exclusive")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^c]*' | sort | uniq -c
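The same count can be wrapped in a small shell function that takes the queue name as an argument; a sketch assuming the same at_queue resource used above (with jq -r there are no quotes to strip, so the sed step is unnecessary):
# Sketch: free-node count per cabinet for an arbitrary queue.
count_free_per_cabinet() {
  local queue=$1
  pbsnodes -avF JSON \
    | jq -r --arg q "$queue" '.nodes[]
        | select(.resources_available.at_queue == $q)
        | select(.state == "free")
        | .resources_available.host' \
    | grep -o '^[^c]*' | sort | uniq -c
}
count_free_per_cabinet LustreApps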
2. Advanced Node Selection Syntax
Example of Selecting a Specific Number of Cabinets:
Use a combination of select and place statements to group chunks of nodes.
Example command:
qsub -l select=60+60+60+60 -l place=group=tier0 pbs_submit_script.sh
This command requests groups of 60 nodes, grouped by the tier0 resource.
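The same resource request can also live inside the submit script itself as #PBS directives instead of on the qsub command line. A minimal sketch of what pbs_submit_script.sh might contain (the walltime and project name are placeholders):
#!/bin/bash
#PBS -l select=60+60+60+60
#PBS -l place=group=tier0
#PBS -l walltime=01:00:00
#PBS -A <your_project>
cd ${PBS_O_WORKDIR}
# PBS_NODEFILE lists the nodes assigned to the job; a quick sanity check:
sort -u ${PBS_NODEFILE} | wc -l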
3. Limitations and Considerations
Incomplete Cabinets: There may not be complete cabinets available, meaning requests for a full cabinet might never run.
Node Grouping: The group selection works only if the entire set of requested nodes fits the criteria.
System Interconnect Topology: When allocating more than a rack's worth of nodes, it is generally advisable to spread the job out, due to the characteristics of the interconnect topology (such as Dragonfly on Aurora).
Verifying Node Availability: Use the commands provided to check the availability of nodes in a chassis or cabinet before allocation.
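Putting the availability check and the submission together, a guarded-submit sketch; it assumes the LustreApps queue used above, a 60-node job in cabinet x4407 as the example, and that the select=60:tier0=x4407 form (a chunk count plus the tier0 resource) is accepted:
# Sketch: submit only if at least 60 nodes in cabinet x4407 are currently free.
free=$(pbsnodes -avF JSON \
  | jq -r '.nodes[]
      | select(.resources_available.at_queue == "LustreApps")
      | select(.state == "free")
      | .resources_available.host' \
  | grep -c '^x4407c')
if [ "$free" -ge 60 ]; then
  qsub -l select=60:tier0=x4407 pbs_submit_script.sh
else
  echo "only $free free nodes in x4407"
fi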
4. Additional Useful Commands
Checking Nodes in Chassis:
To find the number of free nodes in each chassis on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^s]*' | sort | uniq -c
To find the number of up nodes (free and in use) in each chassis on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free") or (.state == "job-exclusive")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^s]*' | sort | uniq -c
To list jobs on the lustre_scaling queue ordered by running and then waiting:
qstat -Twas1 lustre_scaling
To sort the same listing by number of nodes:
qstat -Twas1 lustre_scaling | column -t | sort -k 6 -n
qstat -fxw
Check the comment field; if run_count is increasing, PBS is trying to offline nodes and bring in new ones.
qstat -xwpau $USER
Shows a list of recently submitted jobs, where you can compare Elap Time vs Req'd Time.
Nodes can have more than one status (down,offline is pretty common, for instance); in a summary view PBS will only show the first one in the list. Node statuses matter, and these two commands help a lot with that:
pbsnodes -avSj
pbsnodes -l
The first shows the job ID associated with each node along with its status; the second lists nodes that are considered 'down' and are in an unusable state.
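To see the complete state strings (not just the first one) and how many nodes carry each, a short sketch using the same JSON output as the counts above:
# Sketch: tally nodes by their complete state string, e.g. "down,offline".
pbsnodes -avF JSON | jq -r '.nodes[] | .state' | sort | uniq -c | sort -rn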
For a specific queue such as workq,
qstat -was1 workq
will get you that information. Also,
qstat -Qf workq
will show full details on the queue, and the resources_assigned.nodect entry tells you how many nodes have jobs running on them.
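For example, to pull out just that one entry from the full queue listing:
qstat -Qf workq | grep resources_assigned.nodect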
To list nodes that are free (along with their state) from the summary view:
pbsnodes -avSj | awk '{ if ($2 == "free") print $1 "\t" $2 }'
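To reduce that to a simple count of free nodes (same column assumption as above):
pbsnodes -avSj | awk '$2 == "free"' | wc -l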
You can select specific nodes via -l select=host=x4703c2s3b0n0, for example:
qsub -l select=host=x1922c6s3b0n0+1:host=x1922c7s6b0n0 -q workq-route -l walltime=00:20:00 -l filesystems=gila -A Aurora_deployment -I
Use watch qstat -was1 workq to see what is queued up and about to begin in the queue.
To show full details for a specific job (including finished jobs):
qstat -fx 8997637.amn-0001
pbs_rstat - show reservations