NVIDIA / gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0

nvidia-settings and nvidia-xconfig not mounted to Pods #516

Open elgalu opened 1 year ago

elgalu commented 1 year ago

How can I configure the GPU Operator so that it automatically mounts the nvidia-settings and nvidia-xconfig binaries?

The GPU Operator automatically mounts the following binaries from the host path /run/nvidia/driver/usr/bin/* into the Pod as overlays:

/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-server
/usr/bin/nvidia-cuda-mps-control

Example mount:

overlay on /usr/bin/nvidia-smi type overlay (ro,nosuid,nodev,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1198/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1197/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1196/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1195/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1194/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1193/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1192/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1191/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1190/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1189/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1188/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1187/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1186/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1185/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1184/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1183/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1182/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1181/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1180/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/241/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/9964/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/9964/work)

However, the following binaries are not mounted, even when display and graphics are added to NVIDIA_DRIVER_CAPABILITIES:

nvidia-settings
nvidia-xconfig
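
To reproduce the observation above from inside a Pod, a small check along these lines may help. It is only a sketch: it assumes Python is available in the container image, and the helper names `find_binaries` and `report` are hypothetical, not part of the GPU Operator or the NVIDIA Container Toolkit. It prints the current value of NVIDIA_DRIVER_CAPABILITIES and reports which of the binaries listed in this issue resolve on PATH.

```python
import os
import shutil

# Binaries named in this issue: the five reported as mounted,
# followed by the two reported as missing.
EXPECTED = [
    "nvidia-smi",
    "nvidia-debugdump",
    "nvidia-persistenced",
    "nvidia-cuda-mps-server",
    "nvidia-cuda-mps-control",
    "nvidia-settings",
    "nvidia-xconfig",
]


def find_binaries(names, path=None):
    """Map each name to its resolved path, or None if it is not found.

    `path` is an optional search path string; by default the PATH
    environment variable is used.
    """
    return {name: shutil.which(name, path=path) for name in names}


def report(names=EXPECTED, path=None):
    """Print the capabilities variable and the lookup result per binary."""
    caps = os.environ.get("NVIDIA_DRIVER_CAPABILITIES", "<unset>")
    print(f"NVIDIA_DRIVER_CAPABILITIES={caps}")
    found = find_binaries(names, path=path)
    for name, location in found.items():
        print(f"{name}: {location or 'MISSING'}")
    return found


if __name__ == "__main__":
    report()
```

Running this in a Pod that shows the behaviour described here would be expected to list nvidia-settings and nvidia-xconfig as MISSING, while the other five resolve under /usr/bin.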


elezar commented 1 year ago

Hi @elgalu.

This is not currently supported, although improving the injection of display-specific executables and libraries into the container is on our roadmap.

Would you be in a position to test this before we make it generally available?

elgalu commented 1 year ago

Yes, we have a non-production GPU cluster that we can use to test this.



kasisnu commented 5 months ago

Hello @elezar, I'm also interested in this feature addition. +1. I can also help test this with some workloads.