OSC / ondemand

Supercomputing. Seamlessly. Open, Interactive HPC Via the Web
https://openondemand.org/
MIT License

gpu grafana panel #3622

Open johrstrom opened 5 months ago

johrstrom commented 5 months ago

Someone on Discourse is asking for GPU panel support in the Active Jobs Grafana integration: https://discourse.openondemand.org/t/grafana-in-ood-ability-to-embed-other-panels/3575

Given the rise of AI and the demand for GPUs, we should likely support this case, even if many or most jobs don't use GPUs.

treydock commented 5 months ago

I will note that the ability to tie a given GPU to a job is something very specific to OSC; it does not come from exporters like the one NVIDIA provides. We have a job prolog that writes out a static metric in such a way that we can use PromQL to tie a job to the GPUs assigned to it. The tools from the community and NVIDIA can only monitor the whole node's GPUs.

johrstrom commented 5 months ago

Thanks. I've commented on the Discourse thread asking if they can share how they do it. Maybe there's some documentation we could add for this.

johrstrom commented 5 months ago

I don't know if you're following the Discourse post, but it seems they built a custom exporter, https://github.com/plazonic/nvidia_gpu_prometheus_exporter/tree/master, with some additional prolog and epilog scripts.

treydock commented 5 months ago

Ah, a custom exporter. We just use the one from NVIDIA: https://github.com/NVIDIA/dcgm-exporter.

Our prolog script:

if [ "x${CUDA_VISIBLE_DEVICES}" != "x" ]; then
  GPU_INFO_PROM=${METRICS_DIR}/slurm_job_gpu_info-${SLURM_JOB_ID}.prom
  cat > $GPU_INFO_PROM.$$ <<EOF
# HELP slurm_job_gpu_info GPU Assigned to a SLURM job
# TYPE slurm_job_gpu_info gauge
EOF

  OIFS=$IFS
  IFS=','
  for gpu in $CUDA_VISIBLE_DEVICES ; do
    echo "slurm_job_gpu_info{jobid=\"${SLURM_JOB_ID}\",gpu=\"${gpu}\"} 1" >> $GPU_INFO_PROM.$$
  done
  IFS=$OIFS

  /bin/mv -f $GPU_INFO_PROM.$$ $GPU_INFO_PROM
fi

exit 0

The metrics are written to a location that is picked up by node_exporter's textfile collector.
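
For example, node_exporter can be pointed at that directory via its textfile collector; the paths below are illustrative, not our actual layout:

# Illustrative only: expose the .prom files the prolog writes (METRICS_DIR above)
/usr/local/bin/node_exporter \
  --collector.textfile.directory=/var/lib/node_exporter/textfile

# Quick check that a job's metric is being exported (9100 is node_exporter's default port)
curl -s localhost:9100/metrics | grep slurm_job_gpu_info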

Epilog:

# Remove this job's metric file so the GPU assignment stops being exported
GPU_INFO_PROM=${METRICS_DIR}/slurm_job_gpu_info-${SLURM_JOB_ID}.prom
rm -f $GPU_INFO_PROM

exit 0

treydock commented 5 months ago

PromQL from our dashboards that ties a job to its assigned GPUs (slurm_job_gpu_info is always 1, so the multiplication just filters the DCGM series down to the host/gpu pairs assigned to the job):

# GPU utilization
DCGM_FI_DEV_GPU_UTIL{cluster="$cluster",host=~"$host"} * ON(host,gpu) slurm_job_gpu_info{jobid="$jobid"}
# Memory copy utilization
DCGM_FI_DEV_MEM_COPY_UTIL{cluster="$cluster",host=~"$host"} * ON(host,gpu) slurm_job_gpu_info{jobid="$jobid"}
# Framebuffer memory used
DCGM_FI_DEV_FB_USED{cluster="$cluster",host=~"$host"} * ON(host,gpu) slurm_job_gpu_info{jobid="$jobid"}
# Total framebuffer memory (free + used) on the job's GPUs
max((DCGM_FI_DEV_FB_FREE{cluster="$cluster",host=~"$host"} + DCGM_FI_DEV_FB_USED{cluster="$cluster",host=~"$host"}) * ON(host,gpu) slurm_job_gpu_info{jobid="$jobid"}) by (cluster)
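
For example, suppose the prolog wrote the first series below for job 123456 and dcgm-exporter reports both GPUs on the node (values are made up; both metrics are assumed to carry matching host and gpu labels):

#   slurm_job_gpu_info{host="gpu0001",jobid="123456",gpu="0"}  1
#   DCGM_FI_DEV_GPU_UTIL{host="gpu0001",gpu="0"}  87
#   DCGM_FI_DEV_GPU_UTIL{host="gpu0001",gpu="1"}  12
#
# DCGM_FI_DEV_GPU_UTIL * ON(host,gpu) slurm_job_gpu_info{jobid="123456"}
# keeps only the GPU 0 series (value 87) and drops GPU 1, which is not assigned to the job.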

With how our Grafana integration works, I think we could integrate this now. The GPU panels are already part of the "OnDemand Clusters" dashboard we use for CPU and memory.

[Screenshot: "OnDemand Clusters" Grafana dashboard, 2024-06-22]

treydock commented 5 months ago

I think we'd just need some mechanism to only show the GPU panels when the job is a GPU job. The cluster YAML schema in OnDemand would just need to handle one or two more keys, maybe like:

cpu: 20
memory: 24
gpu-util: <num for panel>
gpu-mem: <num for panel>
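
For context, those keys would slot into the existing custom.grafana section of a cluster config, roughly like this (host, uid, and panel IDs are placeholders; gpu-util and gpu-mem are only the proposed keys, not something OnDemand handles yet):

# /etc/ood/config/clusters.d/my_cluster.yml (illustrative sketch)
v2:
  metadata:
    title: "My Cluster"
  custom:
    grafana:
      host: "https://grafana.example.edu"   # placeholder
      orgId: 1
      dashboard:
        name: "ondemand-clusters"
        uid: "aaaaaaaa"                      # placeholder dashboard UID
        panels:
          cpu: 20
          memory: 24
          gpu-util: 56                       # proposed: panel ID for GPU utilization
          gpu-mem: 58                        # proposed: panel ID for GPU memory
        labels:
          cluster: "cluster"
          host: "host"
          jobid: "jobid"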