NVIDIA / k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
Apache License 2.0

Fix regression with supporting operator managed drivers #196

Closed klueska closed 2 weeks ago

klueska commented 2 weeks ago

Tested on a DGX A100 node with an operator-managed driver:

export NVIDIA_CTK_PATH=/usr/local/nvidia/toolkit/nvidia-ctk
export NVIDIA_DRIVER_ROOT=/run/nvidia/driver
helm upgrade -i --create-namespace --namespace nvidia nvidia-dra-driver deployments/helm/k8s-dra-driver \
    ${NVIDIA_CTK_PATH:+--set nvidiaCtkPath=${NVIDIA_CTK_PATH}} \
    ${NVIDIA_DRIVER_ROOT:+--set nvidiaDriverRoot=${NVIDIA_DRIVER_ROOT}} \
    --wait
kubectl apply -f demo/specs/quickstart/gpu-test6.yaml
$ kubectl get pod -n gpu-test6
NAME                  READY   STATUS    RESTARTS   AGE
pod-9cc5685d7-2xd9j   1/1     Running   0          33s
pod-9cc5685d7-grn5p   1/1     Running   0          33s
pod-9cc5685d7-q958c   1/1     Running   0          33s
pod-9cc5685d7-zbs74   1/1     Running   0          33s
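
The `${VAR:+word}` expansions in the helm command above pass each `--set` flag only when the corresponding variable is set and non-empty; otherwise the expansion produces nothing and the chart default applies. A minimal sketch of that behavior follows, using a hypothetical `build_flags` helper that is not part of this PR and that only prints the flags instead of calling helm:

```shell
# Hypothetical helper (not from the PR): prints the optional flags that the
# helm command would receive for the current environment.
build_flags() {
    # ${VAR:+word} expands to word only if VAR is set and non-empty.
    # The expansions are deliberately unquoted so that each one splits
    # into two words: "--set" and "key=value".
    echo ${NVIDIA_CTK_PATH:+--set nvidiaCtkPath=${NVIDIA_CTK_PATH}} \
         ${NVIDIA_DRIVER_ROOT:+--set nvidiaDriverRoot=${NVIDIA_DRIVER_ROOT}}
}
```

With both variables unset or empty, `build_flags` prints an empty line, which corresponds to running the helm command with no overrides. Because the expansions are unquoted, this pattern assumes the paths contain no whitespace.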