NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

Access NVIDIA GPUs in K8s in a non-privileged container #605

Open pintohutch opened 3 months ago

pintohutch commented 3 months ago

Hello - I'm trying to see if it's possible to deploy NVIDIA DCGM on K8s with the securityContext.privileged field set to false for security reasons.

I was able to get this working by setting the container's resources and securityContext as follows:

          resources:
            requests:
              nvidia.com/gpu: "1"
            limits:
              nvidia.com/gpu: "1"
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
              drop:
                - ALL

However, this is not ideal for a few reasons:

  1. We sacrifice an entire GPU just for monitoring, which is an over-allocation as DCGM does not need the full GPU compute capacity.
  2. This prevents other workloads from using an expensive resource.
  3. The Kubernetes scheduler will only place the pod on nodes that have unallocated GPU capacity.
  4. The container only seems to have access to one GPU device instead of all of the devices available on the node.
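
For point 4, one rough way to check how many GPUs the container can actually see is to count the per-GPU device nodes under /dev. This is only a minimal sketch: it assumes the runtime exposes GPUs as /dev/nvidia0, /dev/nvidia1, and so on (alongside control nodes such as nvidiactl, which are skipped), and the function name `visible_gpu_devices` is just for illustration.

```python
import os
import re


def visible_gpu_devices(dev_dir="/dev"):
    """Return the per-GPU device nodes (nvidia0, nvidia1, ...) found in dev_dir.

    Assumption: GPUs show up as nvidia<N>; control nodes such as nvidiactl
    or nvidia-uvm do not match the pattern and are ignored.
    """
    pattern = re.compile(r"^nvidia(\d+)$")
    try:
        names = os.listdir(dev_dir)
    except FileNotFoundError:
        # No such directory, so no devices are visible.
        return []
    matches = [name for name in names if pattern.match(name)]
    # Sort numerically so nvidia10 comes after nvidia2.
    return sorted(matches, key=lambda name: int(pattern.match(name).group(1)))


if __name__ == "__main__":
    devices = visible_gpu_devices()
    print(f"{len(devices)} GPU device node(s) visible: {devices}")
```

Running this inside the container and comparing the count with the number of GPUs on the node would show whether only the single requested device is exposed.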

Is there any way to give the container access to the devices without reserving a GPU through an nvidia.com/gpu resource request?

Thanks for any help you can provide.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

pintohutch commented 2 weeks ago

Hey @elezar - I see that you're assigned to this. Is this feasible in any way that you know of?