bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

Nvidia container-runtime API for GPU allocation #4052

Closed: ytsssun closed this 2 months ago

ytsssun commented 3 months ago

Co-authored-by: Monirul Islam

Revives: https://github.com/bottlerocket-os/bottlerocket/pull/3994

Description of changes: This PR exposes two new APIs that allow customers to configure the values of `accept-nvidia-visible-devices-as-volume-mounts` and `accept-nvidia-visible-devices-envvar-when-unprivileged` for the NVIDIA container runtime.

We previously made volume mounts the default mechanism for injecting NVIDIA GPUs (https://github.com/bottlerocket-os/bottlerocket/pull/3718). This PR lets users opt in to the previous behavior, in which unprivileged pods get access to all GPUs when NVIDIA_VISIBLE_DEVICES=all is set, and makes both behaviors configurable.
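For operators who prefer to configure this at boot rather than through the API, the same opt-in can presumably be expressed in Bottlerocket TOML user data (a minimal sketch using the setting paths below; this exact snippet is illustrative and not taken from the PR):

    [settings.kubernetes.nvidia.container-runtime]
    visible-devices-as-volume-mounts = false
    visible-devices-envvar-when-unprivileged = true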

| Bottlerocket setting | Impact | Value | What it means |
| --- | --- | --- | --- |
| `settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts` | Changes the `accept-nvidia-visible-devices-as-volume-mounts` value for the k8s container toolkit | `true` / `false`, default: `true` | Adjusting `visible-devices-as-volume-mounts` alters how GPUs are detected and integrated within container environments. Setting it to `true` makes the NVIDIA runtime recognize GPU devices listed in the `NVIDIA_VISIBLE_DEVICES` environment variable and mount them as volumes, so applications in the container can use the GPUs as if they were local resources. |
| `settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged` | Sets the `accept-nvidia-visible-devices-envvar-when-unprivileged` value of the NVIDIA container runtime for the k8s variant | `true` / `false`, default: `false` | When set to `false`, unprivileged containers are prevented from accessing all GPU devices on the host by default. If `NVIDIA_VISIBLE_DEVICES=all` is set in the container image and `visible-devices-envvar-when-unprivileged` is `true`, all GPUs on the host become accessible to the container regardless of the limits set via `nvidia.com/gpu`. That can allocate more GPUs to a pod than intended, affecting resource scheduling and isolation. |
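To illustrate the risk described above, a hypothetical unprivileged pod could request every GPU through the environment variable alone (a sketch; the pod name and image tag are assumptions, not from the PR):

    $ kubectl run gpu-all --restart=Never \
        --image=nvidia/cuda:12.3.1-base-ubuntu22.04 \
        --env="NVIDIA_VISIBLE_DEVICES=all" -- nvidia-smi

With `visible-devices-envvar-when-unprivileged=true`, `nvidia-smi` in this pod would report every GPU on the host even though no `nvidia.com/gpu` resources were requested; with the default of `false`, the environment variable is ignored for unprivileged pods.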

Testing done:

  1. Built an AMI for the NVIDIA variant. Verified the settings get picked up with their default values.

    $ apiclient get settings.kubernetes.nvidia.container-runtime
    {
      "settings": {
        "kubernetes": {
          "nvidia": {
            "container-runtime": {
              "visible-devices-as-volume-mounts": true,
              "visible-devices-envvar-when-unprivileged": false
            }
          }
        }
      }
    }
  2. Opted in to the previous behavior to allow unprivileged NVIDIA device access.

    $ apiclient set settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts=false
    $ apiclient set settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged=true
    $ apiclient get settings.kubernetes.nvidia.container-runtime
    {
      "settings": {
        "kubernetes": {
          "nvidia": {
            "container-runtime": {
              "visible-devices-as-volume-mounts": false,
              "visible-devices-envvar-when-unprivileged": true
            }
          }
        }
      }
    }
  3. Verified that the nvidia-container-runtime config exists.

    
    $ cat /etc/nvidia-container-runtime/config.toml
    accept-nvidia-visible-devices-as-volume-mounts = true
    accept-nvidia-visible-devices-envvar-when-unprivileged = false

    [nvidia-container-cli]
    root = "/"
    path = "/usr/bin/nvidia-container-cli"
    environment = []
    ldconfig = "@/sbin/ldconfig"



- [x] Migration Test
Tested migration from 1.20.1 to the new version.
Tested migration back to 1.20.1.
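For reference, a round-trip like this is typically exercised with the standard apiclient update workflow (a sketch; the PR does not show the exact commands used for the test):

    $ apiclient update check
    $ apiclient update apply
    $ apiclient reboot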

**Terms of contribution:**

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.
ytsssun commented 2 months ago

Closing this PR since it conflicts with the core-kit migration. We will submit new PRs soon to accommodate the new core-kit setup.