dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
The metrics API and `dstack stats`, implemented in #1827, only collect metrics for NVIDIA GPUs. Metrics for AMD GPUs should also be collected out of the box. Unlike `nvidia-smi`, which is always present in NVIDIA-supported Docker images, `amd-smi` may not be present. Still, it seems to be present in most production images; for example, it is available in the TGI ROCm image.
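
Since `amd-smi` may be missing from an image, the collector would need to probe for it and degrade gracefully rather than fail. Below is a minimal sketch of that idea, not the implementation from #1827. The names `SMI_COMMANDS`, `detect_smi_tool`, and `collect_raw_metrics` are hypothetical, and the argument lists (in particular `amd-smi metric --json`) are assumptions that should be checked against the tool versions actually installed.

```python
import shutil
import subprocess
from typing import Callable, Optional

# Hypothetical mapping from vendor tool to the command used to query it.
# The argument lists are assumptions; verify them against the installed tools.
SMI_COMMANDS = {
    "nvidia-smi": [
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used",
        "--format=csv,noheader,nounits",
    ],
    "amd-smi": ["amd-smi", "metric", "--json"],
}


def detect_smi_tool(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first vendor tool found on PATH, or None if neither exists."""
    for tool in SMI_COMMANDS:
        if which(tool) is not None:
            return tool
    return None


def collect_raw_metrics(
    which: Callable[[str], Optional[str]] = shutil.which,
    timeout: float = 10.0,
) -> Optional[str]:
    """Run the detected tool and return its raw stdout.

    Returns None when no tool is present or the command fails, so the
    caller can report "no GPU metrics" instead of raising an error.
    """
    tool = detect_smi_tool(which)
    if tool is None:
        return None
    try:
        result = subprocess.run(
            SMI_COMMANDS[tool],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout
```

The `which` parameter is injectable only so that detection can be tested without the tools installed. Parsing the output is left out on purpose, because the output format depends on the tool version.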