Description of changes:
This PR exposes two new APIs that allow customers to configure the values of `accept-nvidia-visible-devices-as-volume-mounts` and `accept-nvidia-visible-devices-envvar-when-unprivileged` for the NVIDIA container runtime.
We introduced injecting NVIDIA GPUs via volume mounts as the default behavior in https://github.com/bottlerocket-os/bottlerocket/pull/3718. This PR lets users opt in to the previous behavior, which allows unprivileged pods to access all GPUs when `NVIDIA_VISIBLE_DEVICES=all` is set, and makes both behaviors configurable.
`settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts`
Allows changing the `accept-nvidia-visible-devices-as-volume-mounts` value for the k8s container toolkit.
`true` | `false`, default: `true`
Adjusting `visible-devices-as-volume-mounts` alters how GPUs are detected and made available inside container environments. Setting this parameter to `true` makes the NVIDIA runtime take the GPU devices listed in the `NVIDIA_VISIBLE_DEVICES` environment variable and mount them into the container as volumes, which permits applications within the container to interact with and use the GPUs as if they were local resources.
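As a reference, here is a minimal user data sketch for this setting (the setting path comes from this PR; the exact TOML layout is assumed to follow the usual Bottlerocket user data format):

```toml
# Sketch of Bottlerocket user data (TOML); layout assumed, setting name taken from this PR.
# Keeps the new default: GPUs listed in NVIDIA_VISIBLE_DEVICES are injected as volume mounts.
[settings.kubernetes.nvidia.container-runtime]
visible-devices-as-volume-mounts = true
```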
`settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged`
Allows setting the `accept-nvidia-visible-devices-envvar-when-unprivileged` value of the NVIDIA container runtime for the k8s variant.
`true` | `false`, default: `false`
When this setting is `false`, unprivileged containers are prevented from accessing all GPU devices on the host by default. If `NVIDIA_VISIBLE_DEVICES` is set to `all` in the container image and `visible-devices-envvar-when-unprivileged` is set to `true`, all GPUs on the host become accessible to the container, regardless of the limits set via `nvidia.com/gpu`. This can lead to more GPUs being allocated to a pod than intended, which affects resource scheduling and isolation.
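For users who want the previous behavior back, here is a hedged sketch of the opt-in (setting names come from this PR; the user data layout is an assumption):

```toml
# Sketch: opt in to the pre-#3718 behavior (assumed user data layout).
# With the env var honored and volume mounts disabled, an unprivileged pod that sets
# NVIDIA_VISIBLE_DEVICES=all can see every GPU on the host, bypassing nvidia.com/gpu limits.
[settings.kubernetes.nvidia.container-runtime]
visible-devices-as-volume-mounts = false
visible-devices-envvar-when-unprivileged = true
```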
Testing done:
- [x] Functional test
  - Built an AMI for the nvidia variant and verified the settings are picked up with their default values.
  - Opted in to the previous behavior to allow unprivileged NVIDIA device access and verified the `nvidia-container-runtime` config exists with the expected `[nvidia-container-cli]` section (see the config sketch after this list).
- [x] Migration test
  - Tested migration from 1.20.1 to the new version.
  - Tested migration back to 1.20.1.
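For reference, a sketch of the rendered NVIDIA container runtime config checked in the functional test, based on the `[nvidia-container-cli]` section observed during testing (the placement of the `accept-nvidia-*` keys is an assumption):

```toml
# Sketch of the rendered nvidia-container-runtime config with the default settings.
# The accept-nvidia-* key placement is an assumption; the [nvidia-container-cli] values
# are the ones observed during testing.
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false

[nvidia-container-cli]
root = "/"
path = "/usr/bin/nvidia-container-cli"
environment = []
ldconfig = "@/sbin/ldconfig"
```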
**Terms of contribution:**
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.
Co-authored-by: Monirul Islam
Revives: https://github.com/bottlerocket-os/bottlerocket/pull/3994