1. Allocating Nodes in Specific Racks or Cabinets
Selecting Nodes in Specific Cabinets or Chassis:
To allocate nodes in a specific cabinet, use:
qsub -l select=tier0=x4407 pbs_submit_script.sh
To allocate nodes in a specific chassis, use:
qsub -l select=tier1=x4407c2 pbs_submit_script.sh
Requesting a Single Node per Specified Cabinet:
Concatenate select statements as follows (see the sketch below):
-l select=1:ncpus=208:tier0=x4001+1:ncpus=208:tier0=x4002+1:ncpus=208:tier0=x4003+...
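If you need one node in each of many cabinets, a small shell loop can build that select string for you; this is just a sketch, reusing the cabinet names and ncpus value from the example above:
# Sketch: build a one-node-per-cabinet select string and submit with it.
# The cabinet list is illustrative; substitute the cabinets you actually want.
sel=""
for cab in x4001 x4002 x4003; do
  sel+="1:ncpus=208:tier0=${cab}+"
done
sel=${sel%+}   # drop the trailing '+'
qsub -l select="${sel}" pbs_submit_script.sh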
Determining the Number of Available Nodes:
To find the number of free nodes for each cabinet on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^c]*' | sort | uniq -c
To find the number of up nodes (free and in use) for each cabinet on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free") or (.state == "job-exclusive")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^c]*' | sort | uniq -c
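The same count can be wrapped in a small shell function that takes the queue name as an argument; a sketch assuming the same at_queue resource used above (with jq -r there are no quotes to strip, so the sed step is unnecessary):
# Sketch: free-node count per cabinet for an arbitrary queue.
count_free_per_cabinet() {
  local queue=$1
  pbsnodes -avF JSON \
    | jq -r --arg q "$queue" '.nodes[]
        | select(.resources_available.at_queue == $q)
        | select(.state == "free")
        | .resources_available.host' \
    | grep -o '^[^c]*' | sort | uniq -c
}
count_free_per_cabinet LustreApps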
2. Advanced Node Selection Syntax
Example of Selecting a Specific Number of Cabinets:
Use a combination of select and place statements to group chunks of nodes.
Example command:
qsub -l select=60+60+60+60 -l place=group=tier0 pbs_submit_script.sh
This command requests groups of 60 nodes, grouped by the tier0 resource.
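The same resource request can also live inside the submit script itself as #PBS directives instead of on the qsub command line. A minimal sketch of what pbs_submit_script.sh might contain (the walltime and project name are placeholders):
#!/bin/bash
#PBS -l select=60+60+60+60
#PBS -l place=group=tier0
#PBS -l walltime=01:00:00
#PBS -A <your_project>
cd ${PBS_O_WORKDIR}
# PBS_NODEFILE lists the nodes assigned to the job; a quick sanity check:
sort -u ${PBS_NODEFILE} | wc -l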
3. Limitations and Considerations
Incomplete Cabinets: There may not be complete cabinets available, meaning requests for a full cabinet might never run.
Node Grouping: The group selection works only if the entire set of requested nodes fits the criteria.
System Interconnect Topology: When allocating more than a rack's worth of nodes, it is generally advisable to spread the job out, due to the characteristics of the interconnect topology (such as Dragonfly on Aurora).
Verifying Node Availability: Use the commands provided to check the availability of nodes in a chassis or cabinet before allocation.
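Putting the availability check and the submission together, a guarded-submit sketch; it assumes the LustreApps queue used above, a 60-node job in cabinet x4407 as the example, and that the select=60:tier0=x4407 form (a chunk count plus the tier0 resource) is accepted:
# Sketch: submit only if at least 60 nodes in cabinet x4407 are currently free.
free=$(pbsnodes -avF JSON \
  | jq -r '.nodes[]
      | select(.resources_available.at_queue == "LustreApps")
      | select(.state == "free")
      | .resources_available.host' \
  | grep -c '^x4407c')
if [ "$free" -ge 60 ]; then
  qsub -l select=60:tier0=x4407 pbs_submit_script.sh
else
  echo "only $free free nodes in x4407"
fi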
4. Additional Useful Commands
Checking Nodes in Chassis:
To find the number of free nodes in each chassis on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^s]*' | sort | uniq -c
To find the number of up nodes (free and in use) in each chassis on the LustreApps queue:
pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free") or (.state == "job-exclusive")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^s]*' | sort | uniq -c
To list jobs on the lustre_scaling queue ordered by running and then waiting:
qstat -Twas1 lustre_scaling
To sort the same listing by number of nodes:
qstat -Twas1 lustre_scaling | column -t | sort -k 6 -n
qstat -fxw
Check the comment field; if run_count is increasing, PBS is trying to offline nodes and bring in new ones.
qstat -xwpau $USER
Shows a list of recently submitted jobs, where you can compare Elap Time vs Req'd Time.
Nodes can have more than one status (down,offline is pretty common, for instance); in a summary view PBS will only show the first one in the list. Node statuses matter, and these two commands help a lot with that:
pbsnodes -avSj
pbsnodes -l
The first shows the job ID associated with each node along with its status; the second lists nodes that are considered 'down' and are in an unusable state.
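To see the complete state strings (not just the first one) and how many nodes carry each, a short sketch using the same JSON output as the counts above:
# Sketch: tally nodes by their complete state string, e.g. "down,offline".
pbsnodes -avF JSON | jq -r '.nodes[] | .state' | sort | uniq -c | sort -rn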
For a specific queue such as workq,
qstat -was1 workq
will get you that information. Also,
qstat -Qf workq
will show full details on the queue, and the resources_assigned.nodect entry tells you how many nodes have jobs running on them.
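For example, to pull out just that one entry from the full queue listing:
qstat -Qf workq | grep resources_assigned.nodect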
To list nodes that are free (along with their state) from the summary view:
pbsnodes -avSj | awk '{ if ($2 == "free") print $1 "\t" $2 }'
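To reduce that to a simple count of free nodes (same column assumption as above):
pbsnodes -avSj | awk '$2 == "free"' | wc -l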
You can select specific nodes via -l select=host=x4703c2s3b0n0, for example:
qsub -l select=host=x1922c6s3b0n0+1:host=x1922c7s6b0n0 -q workq-route -l walltime=00:20:00 -l filesystems=gila -A Aurora_deployment -I
Use watch qstat -was1 workq to see what is queued up and about to begin in the queue.
To show full details for a specific job (including finished jobs):
qstat -fx 8997637.amn-0001
pbs_rstat - show reservations