containers / toolbox

Tool for interactive command line environments on Linux
https://containertoolbx.org/
Apache License 2.0

cmd/create: Support passing --device option to podman-create #1407

Closed: Jmennius closed this pull request 1 month ago

Jmennius commented 7 months ago

This allows using the CDI (Container Device Interface) infrastructure, which often does more than just map devices in /dev: for NVIDIA, it additionally maps a set of libraries into the container (which are essential for using the device without hassle).
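CDI refers to devices by fully qualified names of the form vendor/class=name, such as nvidia.com/gpu=all, rather than by /dev paths. The Go sketch below shows how such a name splits into its parts. `parseCDIName` is a hypothetical helper, not code from toolbox, Podman or the CDI library, and it only checks the overall shape, not the character rules the CDI specification sets for each part.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCDIName splits a fully qualified CDI device name of the form
// "vendor/class=name" (for example "nvidia.com/gpu=all") into its parts.
// Hypothetical, simplified helper: it validates the shape only.
func parseCDIName(device string) (vendor, class, name string, err error) {
	kind, name, ok := strings.Cut(device, "=")
	if !ok || name == "" {
		return "", "", "", fmt.Errorf("%q: missing \"=name\" part", device)
	}
	vendor, class, ok = strings.Cut(kind, "/")
	if !ok || vendor == "" || class == "" || strings.Contains(class, "/") {
		return "", "", "", fmt.Errorf("%q: kind must be \"vendor/class\"", device)
	}
	return vendor, class, name, nil
}

func main() {
	// A plain /dev path is not a CDI name, so it is reported as such.
	for _, d := range []string{"nvidia.com/gpu=all", "/dev/dri"} {
		vendor, class, name, err := parseCDIName(d)
		if err != nil {
			fmt.Println("not a CDI name:", err)
			continue
		}
		fmt.Printf("vendor=%s class=%s name=%s\n", vendor, class, name)
	}
}
```

Podman's --device accepts both forms, so a check like this would only be needed if toolbox wanted to treat CDI names specially.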

Since this is only a pass-through, maybe instead of a --device-specific option we should have the ability to pass arbitrary options to podman-create? For example, everything after the -- separator in toolbox create -c my-container -- --device foo:bar --other-podman-option would be forwarded as is.
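A minimal Go sketch of the pass-through idea: split toolbox's own arguments at the first --, then build the podman create argument list with one --device flag per requested device followed by the forwarded options. `splitAtDash` and `buildCreateArgs` are hypothetical names, and the real podman create invocation in toolbox carries many more options than shown here.

```go
package main

import "fmt"

// splitAtDash separates toolbox's own arguments from those after a literal
// "--", which would be forwarded verbatim to podman create. Hypothetical helper.
func splitAtDash(args []string) (own, passthrough []string) {
	for i, arg := range args {
		if arg == "--" {
			return args[:i], args[i+1:]
		}
	}
	return args, nil
}

// buildCreateArgs assembles a podman create argument list: one --device flag
// per requested device, then any pass-through options, then the image,
// which must come last because podman treats what follows it as the command.
func buildCreateArgs(devices, passthrough []string, image string) []string {
	args := []string{"create"}
	for _, d := range devices {
		args = append(args, "--device", d)
	}
	args = append(args, passthrough...)
	return append(args, image)
}

func main() {
	own, extra := splitAtDash([]string{"-c", "my-container", "--", "--device", "foo:bar"})
	fmt.Println(own, extra)
	// "IMAGE" is a placeholder for whatever image toolbox resolved.
	fmt.Println(buildCreateArgs([]string{"nvidia.com/gpu=all"}, extra, "IMAGE"))
}
```

Toolbox's CLI is built on cobra, which already removes the -- itself and reports its position through Command.ArgsLenAtDash(), so in practice that would likely replace a hand-written split like `splitAtDash`.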

Fixes: #116 (although it is already possible to use NVIDIA devices inside toolboxes, this change significantly improves usability when using the NVIDIA Container Toolkit (CTK) with CDI in toolbox)

softwarefactory-project-zuul[bot] commented 7 months ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/9b19d48ad9834499acf777678f0d96b3

✔ unit-test SUCCESS in 8m 11s
✔ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 31s
✔ unit-test-restricted SUCCESS in 7m 10s
✔ system-test-fedora-rawhide SUCCESS in 28m 22s
✔ system-test-fedora-39 SUCCESS in 27m 39s
✔ system-test-fedora-38 SUCCESS in 27m 28s
✔ system-test-fedora-37 SUCCESS in 26m 59s

Jmennius commented 7 months ago

There was a related issue with nvidia-ctk, NVIDIA/nvidia-container-toolkit#143, and the fix has been merged. Until it is released, there is a workaround: simply remove the chmod hook from the NVIDIA CDI spec file.

UPDATE: The fix was released in v1.15.0; everything works out of the box now!

P.S. NVIDIA has recently changed their repositories, so if you have nvidia-container-toolkit* installed and it is not updating past v1.13.x, remove the existing repo file and add the new one.
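Whether an installed toolkit already contains the fix comes down to comparing its version with 1.15.0. A minimal Go sketch of that comparison follows; `atLeast` is a hypothetical helper, it handles only plain numeric dotted versions (an optional leading "v" is stripped, pre-release suffixes are rejected), and obtaining the installed version string, for example from your package manager, is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a dotted numeric version such as "1.15.0" is
// greater than or equal to min. Missing trailing components count as 0.
// Hypothetical helper; suffixes like "-rc.1" produce an error.
func atLeast(version, min string) (bool, error) {
	v := strings.Split(strings.TrimPrefix(version, "v"), ".")
	m := strings.Split(strings.TrimPrefix(min, "v"), ".")
	for i := 0; i < len(v) || i < len(m); i++ {
		a, b := 0, 0
		var err error
		if i < len(v) {
			if a, err = strconv.Atoi(v[i]); err != nil {
				return false, fmt.Errorf("bad version %q: %w", version, err)
			}
		}
		if i < len(m) {
			if b, err = strconv.Atoi(m[i]); err != nil {
				return false, fmt.Errorf("bad version %q: %w", min, err)
			}
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	for _, installed := range []string{"1.13.5", "1.15.0"} {
		ok, err := atLeast(installed, "1.15.0")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s includes the chmod hook fix: %v\n", installed, ok)
	}
}
```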

Jmennius commented 6 months ago

@debarshiray can I attract your attention, get some opinions on this? 😁

Jmennius commented 3 months ago

> Sorry for the delay, @Jmennius. I finally got myself some NVIDIA hardware to play with this.
>
> I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.
>
> However, as far as I can make out, the nvidia-container-toolkit or nvidia-container-toolkit-base packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and neither the RPMFusion free nor the non-free repositories have it, but they do have NVIDIA's proprietary driver.

Yeah, you are right: it's only available in NVIDIA's repositories for Fedora. It would be nice if it were repackaged somehow on RPMFusion... I saw that some distros package it.

> Is there anything else other than NVIDIA that uses the Container Device Interface?

I am not aware of other CDI implementations :(

> I would like to understand the situation a bit better. Ultimately I want to make it as smooth as possible for the user to enable the NVIDIA proprietary driver. That becomes a problem if one needs to enable multiple different unofficial repositories, at least on Fedora.

I guess this is the way to go for the best experience. The other option with NVIDIA is basically to reinstall the NVIDIA driver libraries inside the container after every upgrade of the kernel driver on the host.

AlvaroFS commented 2 months ago

Hi! This is great news!! Is it expected to be merged soon or should I grab the patch?

Jmennius commented 2 months ago

> Hi! This is great news!! Is it expected to be merged soon or should I grab the patch?

I'd go for a patch for sure 😉 I've rebased the change.

P.S. With the patch applied, you can also configure this in your ~/.config/containers/toolbox.conf:

```toml
[general]
devices = ["nvidia.com/gpu=all"]
```

Jmennius commented 2 months ago

For Silverblue, something that I've come up with to handle regenerating the CDI spec after an upgrade: /etc/systemd/system/nvidia-cdi-update.service:

```ini
[Unit]
Description=Update Nvidia CDI configuration
DefaultDependencies=no
Before=systemd-update-done.service
ConditionNeedsUpdate=/etc

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-ctk cdi generate --output /etc/cdi/nvidia.yaml

[Install]
WantedBy=multi-user.target
```

```sh
sudo systemctl enable --now nvidia-cdi-update.service
```

This will run on the next boot after an update (so not on every boot) and write an up-to-date CDI spec. Not sure if those systemd features work with regular Fedora Workstation though (too lazy to research 😄).

softwarefactory-project-zuul[bot] commented 2 months ago

Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/7814973e94aa4d73b0d68ab88a881112

✔ unit-test SUCCESS in 6m 37s
✔ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 29s
✔ unit-test-restricted SUCCESS in 5m 52s
✔ system-test-fedora-rawhide SUCCESS in 35m 24s
✔ system-test-fedora-40 SUCCESS in 34m 03s
✔ system-test-fedora-39 SUCCESS in 34m 26s
✔ system-test-fedora-38 SUCCESS in 34m 07s

debarshiray commented 1 month ago

> For Silverblue, something that I've come up with to handle regenerating the CDI spec after an upgrade: /etc/systemd/system/nvidia-cdi-update.service:

That's a really neat hack, indeed. :)

debarshiray commented 1 month ago

> > Is there anything else other than NVIDIA that uses the Container Device Interface?
>
> I am not aware of other CDI implementations :(

I see commits from Intel in github.com/cncf-tags/container-device-interface, which is great.

debarshiray commented 1 month ago

> > Sorry for the delay, @Jmennius. I finally got myself some NVIDIA hardware to play with this.
> >
> > I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.
> >
> > However, as far as I can make out, the nvidia-container-toolkit or nvidia-container-toolkit-base packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and neither the RPMFusion free nor the non-free repositories have it, but they do have NVIDIA's proprietary driver.
>
> Yeah, you are right: it's only available in NVIDIA's repositories for Fedora. It would be nice if it were repackaged somehow on RPMFusion... I saw that some distros package it.

The NVIDIA Container Toolkit code seems to be entirely free software. I wonder if we can get it into Fedora proper, instead of RPMFusion.

debarshiray commented 1 month ago

Closing in favour of https://github.com/containers/toolbox/pull/1497

Thanks again for pointing me in the right direction, @Jmennius