NVIDIA / gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0

Ubuntu 24.04 Image Missing For nvidia-driver-daemonset #722

Open isugimpy opened 1 month ago

isugimpy commented 1 month ago

1. Quick Debug Information

2. Issue or feature description

Installed gpu-operator on a cluster hosting Ubuntu 24.04 nodes. The drivers cannot be installed because the image nvcr.io/nvidia/driver:550.54.15-ubuntu24.04 does not exist.
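
The failing tag appears to follow the pattern <driver version>-<OS ID><OS version>, so a 24.04 node makes the operator look for a "-ubuntu24.04" image. The sketch below only mirrors that naming, assuming the pattern inferred from the tag above. driverImage is a hypothetical helper, not gpu-operator's actual code.

```go
package main

import "fmt"

// driverImage is a hypothetical helper (not gpu-operator's real code).
// It composes <repo>/<image>:<version>-<osID><osVersion>, the shape of
// the tag reported in this issue.
func driverImage(repo, image, version, osID, osVersion string) string {
	return fmt.Sprintf("%s/%s:%s-%s%s", repo, image, version, osID, osVersion)
}

func main() {
	// Values from the report: driver 550.54.15 on an Ubuntu 24.04 node.
	fmt.Println(driverImage("nvcr.io/nvidia", "driver", "550.54.15", "ubuntu", "24.04"))
}
```

This prints nvcr.io/nvidia/driver:550.54.15-ubuntu24.04, the tag that is missing from the registry.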

3. Steps to reproduce the issue

  1. Launch an Ubuntu 24.04 node with a GPU in a cluster.
  2. Install gpu-operator.
  3. Observe ImagePullBackOff because the image does not exist.
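
Step 3 can be checked programmatically by looking for containers whose waiting reason is ImagePullBackOff or ErrImagePull (both are standard Kubernetes waiting reasons). This is a minimal sketch: the structs below are hypothetical stand-ins for the relevant fields of Kubernetes' ContainerStatus, and the container name "driver" is illustrative. A real tool would read pod status through client-go instead.

```go
package main

import "fmt"

// Hypothetical stand-ins for ContainerStatus.State.Waiting.Reason in the
// Kubernetes core/v1 API; only the fields needed here are modeled.
type waitingState struct{ Reason string }
type containerState struct{ Waiting *waitingState }
type containerStatus struct {
	Name  string
	Image string
	State containerState
}

// stuckPulling returns the images of containers that are waiting because
// their image could not be pulled.
func stuckPulling(statuses []containerStatus) []string {
	var images []string
	for _, s := range statuses {
		w := s.State.Waiting
		if w != nil && (w.Reason == "ImagePullBackOff" || w.Reason == "ErrImagePull") {
			images = append(images, s.Image)
		}
	}
	return images
}

func main() {
	statuses := []containerStatus{{
		Name:  "driver", // illustrative container name
		Image: "nvcr.io/nvidia/driver:550.54.15-ubuntu24.04",
		State: containerState{Waiting: &waitingState{Reason: "ImagePullBackOff"}},
	}}
	fmt.Println(stuckPulling(statuses))
}
```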

frittentheke commented 2 weeks ago

I just observed the same issue.

While Ubuntu 24.04 is not yet listed as a supported OS for the Operator (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html#supported-operating-systems-and-kubernetes-platforms), support for Ubuntu 24.04 arrived with GRID 17.2 (https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-ubuntu/index.html#abstract).

But that apparently is NOT the case for AI Enterprise 5.0 (https://docs.nvidia.com/ai-enterprise/5.0/release-notes/index.html // https://docs.nvidia.com/ai-enterprise/5.0/product-support-matrix/index.html#support-matrix__ubuntu), even though its documentation refers to GRID 17.x.

Also, there seem to be no driver containers being built for ubuntu24.04 yet: https://gitlab.com/nvidia/container-images/driver

@shivamerla could you clarify which level of support there currently is (should be) for Ubuntu 24.04?

isugimpy commented 2 weeks ago

Here is the corresponding issue I opened on the container-images repo: https://gitlab.com/nvidia/container-images/cuda/-/issues/226

masterkain commented 1 day ago

From the linked issue: "IMHO do not expect ubuntu:24.04-based images before 2025."

insert sadface here