Closed ericfranz closed 4 years ago
@ericfranz What information are you trying to get out of Slurm? What does MoabShowqClient provide that you would like to get out of Slurm?
If my reading of the code is correct, you want general queue statistics as well as node info. The node info comes from sinfo -N. A few examples:
Allocated/Idle nodes:
$ sinfo -o '%A'
NODES(A/I)
110/207
Nodes Allocated/Idle/Other/Total:
$ sinfo -o '%F'
NODES(A/I/O/T)
97/220/13/330
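A minimal sketch of how a status app might ingest the `%F` output shown above. The function names `parse_node_counts` and `cluster_node_counts` are hypothetical, and the parser assumes sinfo prints a single summary row as in the example:

```python
import subprocess


def parse_node_counts(output: str) -> dict:
    """Parse `sinfo -o '%F'` output into integer node counts."""
    # Drop blank lines and the NODES(A/I/O/T) header if it is present.
    rows = [ln.strip() for ln in output.splitlines() if ln.strip()]
    rows = [ln for ln in rows if not ln.startswith("NODES")]
    # Assumption: one summary row, as in the example above.
    allocated, idle, other, total = (int(x) for x in rows[0].split("/"))
    return {"allocated": allocated, "idle": idle, "other": other, "total": total}


def cluster_node_counts() -> dict:
    """Run sinfo (must be on PATH) and return the parsed counts."""
    out = subprocess.run(
        ["sinfo", "--noheader", "-o", "%F"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_node_counts(out)
```

Keeping the parser separate from the subprocess call means it can be tested against captured output without a live cluster.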
Mainly look at sinfo. Job statistics could come from parsing squeue output, or possibly from sinfo partition info if you iterate over all partitions.
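As a rough illustration of the squeue route, the sketch below counts jobs per state. It assumes the input came from something like `squeue --noheader -o '%T'`, which prints one job state per line; `count_job_states` is a hypothetical helper name:

```python
from collections import Counter


def count_job_states(squeue_output: str) -> Counter:
    """Count jobs per state, given one state name per line."""
    return Counter(
        line.strip() for line in squeue_output.splitlines() if line.strip()
    )


# Sample text standing in for real squeue output.
sample = "RUNNING\nRUNNING\nPENDING\n"
counts = count_job_states(sample)
print(counts["RUNNING"], counts["PENDING"])
```

A `Counter` returns 0 for states that never appear, so the caller does not need to special-case an empty queue.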
See #43 for a discussion on the value of this feature.
The table below provides equivalent Slurm cmds for each respective Moab/Torque cmd that we use. (In systemstatus, as far as I know, we only use pbsnodes and showq.) I've added more cmd equivalences for the sake of completeness.

| Moab/Torque | Slurm |
|---|---|
| showq | squeue |
| pbsnodes | scontrol show node |
| qstat | squeue -j or scontrol show job |
| qhold | scontrol hold job |
| qrls | scontrol release job |
| qsub | sbatch/srun/salloc |
| xpbs | sview |
| qalter | scontrol update |
| qdel | scancel |
| showstart | squeue -o "%S" or squeue --start |
For qpeek, Slurm updates the provided out/err files in real time, so there's no need for an equivalent cmd.
If you want to mimic node status from pbsnodes, use sinfo. I use sinfo in a Prometheus exporter to collect GPU usage: https://github.com/treydock/prometheus-slurm-exporter/blob/osc/gpus.go#L114-L115
You might be able to get everything from sinfo if you are looking at node data. Don't use scontrol: you cannot control its output format, so it is a poor fit for scripts to ingest. Commands like sinfo let you control their formatting.
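To make the point about controllable formatting concrete, here is a sketch that parses per-node data. It assumes the text was produced by `sinfo -N --noheader -o '%N|%T|%C'` (node name, state, CPUs as allocated/idle/other/total); the `|` delimiter and the `parse_nodes` name are choices made for this example, not anything the existing app uses:

```python
def parse_nodes(sinfo_output: str) -> dict:
    """Map node name -> state and CPU counts from pipe-delimited sinfo output."""
    nodes = {}
    for line in sinfo_output.splitlines():
        if not line.strip():
            continue
        name, state, cpus = (field.strip() for field in line.split("|"))
        allocated, idle, other, total = (int(x) for x in cpus.split("/"))
        # A node that belongs to several partitions is listed once per
        # partition; keying on the name collapses those duplicates.
        nodes[name] = {
            "state": state,
            "cpus_allocated": allocated,
            "cpus_idle": idle,
            "cpus_other": other,
            "cpus_total": total,
        }
    return nodes


# Sample text standing in for real sinfo output.
sample_nodes = "n0001|allocated|28/0/0/28\nn0002|idle|0/28/0/28\nn0002|idle|0/28/0/28\n"
node_table = parse_nodes(sample_nodes)
```

Because the format string fixes both the fields and the delimiter, the parser stays a few lines long, which is the advantage over scraping scontrol output.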
For tracking information about running jobs, you only need squeue with format flags to look at the TRES fields. For example, tres-alloc would contain something like gres/gpu=2 for a job with GPUs, but it could also look like gres/gpu:v100=2 if a user asks for a specific type of GPU.
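A small sketch of extracting a GPU count from a tres-alloc string that handles both forms mentioned above. `gpus_in_tres` is a hypothetical helper. It also assumes a string may carry both the untyped and the typed entry at once, in which case the untyped `gres/gpu` value is taken as the total so the GPUs are not counted twice:

```python
def gpus_in_tres(tres: str) -> int:
    """Return the number of GPUs named in a comma-separated TRES string."""
    typed_total = 0
    for item in tres.split(","):
        name, sep, value = item.strip().partition("=")
        if not sep:
            # Not a key=value pair (for example an empty or placeholder field).
            continue
        if name == "gres/gpu":
            # Untyped entry: treat it as the authoritative total.
            return int(value)
        if name.startswith("gres/gpu:"):
            # Typed entry such as gres/gpu:v100=2.
            typed_total += int(value)
    return typed_total


print(gpus_in_tres("cpu=4,mem=16G,node=1,gres/gpu=2"))
print(gpus_in_tres("cpu=4,mem=16G,node=1,gres/gpu:v100=2"))
```

Matching on the exact name or the `gres/gpu:` prefix keeps other GRES entries from being counted as GPUs.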
Skim the documentation at https://slurm.schedmd.com/ and see if there are commands that might provide the same information that is provided by MoabShowqClient (https://github.com/AweSim-OSC/osc-systemstatus/blob/488b0656c6dd569d046cc681481641e7c1fade68/lib/moab_showq_client.rb). If so, we could probably spin up a simple Slurm version of the system status app.