containers / toolbox

Tool for interactive command line environments on Linux
https://containertoolbx.org/
Apache License 2.0

cmd, pkg/nvidia: Enable the proprietary NVIDIA driver #1497

Closed. debarshiray closed this 3 weeks ago.

debarshiray commented 1 month ago

This uses the NVIDIA Container Toolkit [1] to generate a Container Device Interface specification [2] on the host during enter and run commands. The specification is saved as JSON in the runtime directories at /run/toolbox or $XDG_RUNTIME_DIR/toolbox to make it available to the Toolbx container's entry point. The environment variables mentioned in the specification are directly passed to podman exec, while the hooks and mounts are handled by the entry point.
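As a rough illustration of the last point, the environment variables from the specification could be turned into podman exec arguments along these lines. This is only a sketch, not the code in this pull request: the containerEdits type and the podmanExecArgs helper are hypothetical names, and the variable used in the usage note is an illustrative value.

```go
package main

// containerEdits holds the part of a CDI specification used in this sketch.
// The type is hypothetical; only the list of KEY=VALUE strings matters here.
type containerEdits struct {
	Env []string
}

// podmanExecArgs builds the argument vector for "podman exec", passing each
// KEY=VALUE entry from the specification to the container through --env.
func podmanExecArgs(container string, edits containerEdits, command []string) []string {
	args := []string{"exec", "--interactive", "--tty"}

	for _, env := range edits.Env {
		args = append(args, "--env", env)
	}

	args = append(args, container)
	args = append(args, command...)
	return args
}
```

For example, a specification carrying NVIDIA_VISIBLE_DEVICES=void would add --env NVIDIA_VISIBLE_DEVICES=void ahead of the container name, leaving the command itself untouched.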

Toolbx containers already have access to all the devices in the host operating system's /dev, and containers share the kernel space driver with the host. So, this is only about making the user space driver available to the container. It's done by bind mounting the files mentioned in the generated CDI specification from the host to the container, and then updating the container's dynamic linker cache.
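The bind mount step can be pictured with the sketch below. It only works out which host file goes to which location in the container and shows how the linker cache could be refreshed afterwards; it does not perform the mounts. The mount and bindMount types, the planBindMounts and refreshLinkerCache helpers, and the hostRoot parameter (the prefix under which the host's file system is assumed to be visible inside the container) are all hypothetical and not taken from this pull request.

```go
package main

import (
	"os/exec"
	"path/filepath"
)

// mount is a simplified view of one mount entry in a CDI specification.
type mount struct {
	HostPath      string
	ContainerPath string
}

// bindMount describes one bind mount to perform inside the container.
type bindMount struct {
	Source string
	Target string
}

// planBindMounts maps every mount in the specification to a bind mount whose
// source is the host file as seen from inside the container.
// hostRoot is an assumption of this sketch, not something the PR states.
func planBindMounts(mounts []mount, hostRoot string) []bindMount {
	plan := make([]bindMount, 0, len(mounts))

	for _, m := range mounts {
		plan = append(plan, bindMount{
			Source: filepath.Join(hostRoot, m.HostPath),
			Target: m.ContainerPath,
		})
	}

	return plan
}

// refreshLinkerCache runs ldconfig so that the dynamic linker can find the
// newly mounted libraries. It needs enough privileges inside the container.
func refreshLinkerCache() error {
	return exec.Command("ldconfig").Run()
}
```

With a hostRoot of /run/host, a specification entry for /usr/lib64/libcuda.so.1 would be planned as a bind mount from /run/host/usr/lib64/libcuda.so.1 to /usr/lib64/libcuda.so.1.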

This neither depends on nvidia-ctk cdi generate to generate the Container Device Interface specification nor on podman create --device to consume it.

The main problem with nvidia-ctk and podman create is that the specification must be saved in /etc/cdi or /var/run/cdi, both of which require root access, for it to be visible to podman create --device. Toolbx containers are often used rootless, so requiring root privileges for hardware support, something that's not necessary on the host, will be a problem.

Secondly, updating the toolbox(1) binary won't let existing containers use the proprietary NVIDIA driver, because podman create only affects new containers.

Therefore, toolbox(1) uses the Go APIs used by nvidia-ctk cdi generate and podman create --device to generate, save, load and apply the CDI specification itself. This removes the need for root privileges due to /etc/cdi or /var/run/cdi, and makes the driver available to existing containers.

Based on an idea from Ievgen Popovych.

[1] https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
    https://github.com/NVIDIA/nvidia-container-toolkit

[2] https://github.com/cncf-tags/container-device-interface

https://github.com/containers/toolbox/issues/116

softwarefactory-project-zuul[bot] commented 1 month ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/1d461ea631cd4b67ab19f6a111cc4d73

:heavy_check_mark: unit-test SUCCESS in 6m 52s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 54s :heavy_check_mark: unit-test-restricted SUCCESS in 6m 02s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 39m 11s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 38s :heavy_check_mark: system-test-fedora-39 SUCCESS in 35m 21s :heavy_check_mark: system-test-fedora-38 SUCCESS in 34m 56s

softwarefactory-project-zuul[bot] commented 4 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/5c864fd8a81446d69e27e352742e2d5a

:heavy_check_mark: unit-test SUCCESS in 6m 14s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 14s :heavy_check_mark: unit-test-restricted SUCCESS in 4m 58s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 42m 47s :heavy_check_mark: system-test-fedora-40 SUCCESS in 41m 14s :heavy_check_mark: system-test-fedora-39 SUCCESS in 41m 49s :heavy_check_mark: system-test-fedora-38 SUCCESS in 40m 29s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/544425a9a24943ea954234c99a0c5c67

:heavy_check_mark: unit-test SUCCESS in 6m 54s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 06s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 41s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 37m 18s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 10s :heavy_check_mark: system-test-fedora-39 SUCCESS in 35m 22s :heavy_check_mark: system-test-fedora-38 SUCCESS in 34m 52s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/fd9dbbd9aa08413fa78e8dd3138d47a5

:heavy_check_mark: unit-test SUCCESS in 7m 14s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 07s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 46s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 39m 10s :heavy_check_mark: system-test-fedora-40 SUCCESS in 38m 39s :heavy_check_mark: system-test-fedora-39 SUCCESS in 38m 01s :heavy_check_mark: system-test-fedora-38 SUCCESS in 37m 48s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/ff83204c9bc74a6fbc5323b0dd5a1080

:heavy_check_mark: unit-test SUCCESS in 6m 58s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 13s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 45s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 38m 42s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 47s :heavy_check_mark: system-test-fedora-39 SUCCESS in 36m 36s :heavy_check_mark: system-test-fedora-38 SUCCESS in 35m 47s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/49a25f401133402ca51feea90601ab76

:heavy_check_mark: unit-test SUCCESS in 6m 47s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 05s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 54s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 40m 59s :heavy_check_mark: system-test-fedora-40 SUCCESS in 39m 24s :heavy_check_mark: system-test-fedora-39 SUCCESS in 39m 19s :heavy_check_mark: system-test-fedora-38 SUCCESS in 39m 26s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/257ef3548c854a5e89fe75ca2022bf2c

:heavy_check_mark: unit-test SUCCESS in 6m 58s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 13s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 46s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 39m 49s :heavy_check_mark: system-test-fedora-40 SUCCESS in 37m 31s :heavy_check_mark: system-test-fedora-39 SUCCESS in 37m 29s :heavy_check_mark: system-test-fedora-38 SUCCESS in 37m 03s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/b280128658e844c39c382ca73d82b28d

:heavy_check_mark: unit-test SUCCESS in 7m 54s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 4m 12s :heavy_check_mark: unit-test-restricted SUCCESS in 6m 26s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 41m 10s :heavy_check_mark: system-test-fedora-40 SUCCESS in 37m 16s :heavy_check_mark: system-test-fedora-39 SUCCESS in 36m 52s :heavy_check_mark: system-test-fedora-38 SUCCESS in 35m 10s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/5c989524bcc54ab6a0d5793779bd8bee

:heavy_check_mark: unit-test SUCCESS in 6m 40s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 4m 18s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 42s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 35m 59s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 09s :heavy_check_mark: system-test-fedora-39 SUCCESS in 34m 29s :heavy_check_mark: system-test-fedora-38 SUCCESS in 33m 55s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/abe42fce43ab40339ec76eb38db4dd27

:heavy_check_mark: unit-test SUCCESS in 6m 37s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 30s :heavy_check_mark: unit-test-restricted SUCCESS in 4m 34s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 37m 16s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 18s :heavy_check_mark: system-test-fedora-39 SUCCESS in 34m 54s :heavy_check_mark: system-test-fedora-38 SUCCESS in 34m 28s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/70f25909abf2496c915adcafc2a36d01

:heavy_check_mark: unit-test SUCCESS in 6m 57s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 38s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 48s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 38m 41s :heavy_check_mark: system-test-fedora-40 SUCCESS in 36m 58s :heavy_check_mark: system-test-fedora-39 SUCCESS in 37m 02s :heavy_check_mark: system-test-fedora-38 SUCCESS in 35m 54s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/0ecf85451512421f87380122a5d41a13

:heavy_check_mark: unit-test SUCCESS in 7m 10s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 09s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 50s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 37m 35s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 30s :heavy_check_mark: system-test-fedora-39 SUCCESS in 35m 38s :heavy_check_mark: system-test-fedora-38 SUCCESS in 35m 51s

debarshiray commented 3 weeks ago

recheck

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build failed. https://softwarefactory-project.io/zuul/t/local/buildset/820af67b2a9a45b4b7edf131e177b334

:x: unit-test RETRY_LIMIT in 32s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 09s :x: unit-test-restricted RETRY_LIMIT in 32s :x: system-test-fedora-rawhide RETRY_LIMIT in 31s :heavy_check_mark: system-test-fedora-40 SUCCESS in 36m 00s :heavy_check_mark: system-test-fedora-39 SUCCESS in 35m 36s :heavy_check_mark: system-test-fedora-38 SUCCESS in 35m 26s

debarshiray commented 3 weeks ago

TASK [Install RPM packages]
fedora-rawhide | ERROR
fedora-rawhide | {
fedora-rawhide |   "failures": [],
fedora-rawhide |   "msg": "Could not import the libdnf5 python module using /usr/bin/python3 (3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)]). Please install python3-libdnf5 package or ensure you have specified the correct ansible_python_interpreter. (attempted ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python2', '/usr/bin/python'])"
fedora-rawhide | }

I believe this is because the Zuul executor got updated from Ansible 2.13.7 to 2.15.10, which now has support for DNF5, and the DNF5 Change is now being aimed at Fedora 41 (and Rawhide). I am trying to fix the CI in https://github.com/containers/toolbox/pull/1509

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build failed. https://softwarefactory-project.io/zuul/t/local/buildset/dd04684764e441d1853a0145b23227e5

:x: unit-test RETRY_LIMIT in 35s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 28s :x: unit-test-restricted RETRY_LIMIT in 35s :x: system-test-fedora-rawhide RETRY_LIMIT in 35s :heavy_check_mark: system-test-fedora-40 SUCCESS in 36m 10s :heavy_check_mark: system-test-fedora-39 SUCCESS in 36m 48s :heavy_check_mark: system-test-fedora-38 SUCCESS in 34m 58s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/ea59a4e9208349c58d9fed5259ec2a85

:heavy_check_mark: unit-test SUCCESS in 6m 06s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 27s :heavy_check_mark: unit-test-restricted SUCCESS in 4m 50s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 37m 03s :heavy_check_mark: system-test-fedora-40 SUCCESS in 38m 03s :heavy_check_mark: system-test-fedora-39 SUCCESS in 38m 25s :heavy_check_mark: system-test-fedora-38 SUCCESS in 37m 30s

softwarefactory-project-zuul[bot] commented 3 weeks ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/20eb2584eab142dc94ce81333434db0b

:heavy_check_mark: unit-test SUCCESS in 6m 54s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 24s :heavy_check_mark: unit-test-restricted SUCCESS in 5m 40s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 38m 16s :heavy_check_mark: system-test-fedora-40 SUCCESS in 35m 59s :heavy_check_mark: system-test-fedora-39 SUCCESS in 36m 54s :heavy_check_mark: system-test-fedora-38 SUCCESS in 35m 26s