NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

Running device plugin with mixed mode MIG without SYS_ADMIN #916

Open vishnukarthikl opened 4 weeks ago

vishnukarthikl commented 4 weeks ago

Hello all, I am evaluating whether the device plugin can run without the SYS_ADMIN capability in mixed-mode MIG. The capability is currently needed to query the memory info of MIG slices, but it also increases the Pod's attack surface, and I would like to reduce that if possible.

Based on @klueska's comment, it seems possible to pass the MIG capabilities directly into the container without explicitly adding SYS_ADMIN. I tried to bind-mount the host's /proc/driver/nvidia/capabilities/mig/monitor into the container, but I am running into a pod error. I am using a build from release-0.13 with the following DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-gpu-dp-daemonset
  namespace: $namespace
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      gpu-device-plugin: nvidia
  template:
    metadata:
      labels:
        gpu-device-plugin: nvidia
    spec:
      containers:
      - args:
        - --fail-on-init-error=false
        - --mig-strategy=mixed
        - --pass-device-specs=true
        env:
        - name: CUDA_DEVICE_ORDER
          value: PCI_BUS_ID
        - name: NVIDIA_MIG_MONITOR_DEVICES
          value: all
        image: gcr.io/$project/nvidia/k8s-device-plugin:v0.13.0
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-ctr
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kubelet/device-plugins
          name: device-plugin
        - name: nvidia-mig-monitor
          mountPath: /proc/driver/nvidia/capabilities/mig/monitor
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/lib/kubelet/device-plugins
          type: ""
        name: device-plugin
      - name: nvidia-mig-monitor
        hostPath:
          path: /proc/driver/nvidia/capabilities/mig/monitor

The pod fails to start with this event:

  Warning  Failed     13s (x5 over 110s)   kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/proc/driver/nvidia/capabilities/mig/monitor" to rootfs at "/proc/driver/nvidia/capabilities/mig/monitor": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/proc/driver/nvidia/capabilities/mig/monitor" cannot be mounted because it is inside /proc: unknown

Has anyone gotten this working? Any examples would help.

Thanks

klueska commented 2 weeks ago

You would have to inject both /proc/driver/nvidia/capabilities/mig/monitor and /dev/nvidia-caps/nvidia-cap1.

klueska commented 2 weeks ago

Though you may not be able to mount the /proc stuff directly (which shouldn't strictly be necessary). Try it with just the device node.
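
For anyone trying the device-node-only suggestion, here is a minimal, untested sketch of how the DaemonSet from the original post might be changed. It drops the /proc hostPath mount, which runc rejects, and mounts only /dev/nvidia-caps/nvidia-cap1. The volume name `nvidia-mig-monitor-cap` is made up for illustration. Two assumptions to check first: that nvidia-cap1 is the node backing mig/monitor on your host, and that a hostPath mount is enough to let the container open the device. An unprivileged container may still be blocked by the device cgroup, in which case the node has to be injected by the container runtime instead.

```yaml
# Sketch only: fragment of the pod template from the DaemonSet above.
# The /proc/driver/nvidia/capabilities/mig/monitor mount is removed;
# only the capability device node is mounted from the host.
spec:
  template:
    spec:
      containers:
      - name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false   # unchanged; no SYS_ADMIN added
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: nvidia-mig-monitor-cap      # hypothetical volume name
          mountPath: /dev/nvidia-caps/nvidia-cap1
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: nvidia-mig-monitor-cap
        hostPath:
          path: /dev/nvidia-caps/nvidia-cap1
          # CharDevice makes the kubelet refuse to start the pod if the
          # path is missing or is not a character device on the host.
          type: CharDevice
```

If the pod starts but the plugin still cannot query MIG memory info, that points to device access being denied rather than to the mount itself.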