Closed Jmennius closed 1 month ago
Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/9b19d48ad9834499acf777678f0d96b3
:heavy_check_mark: unit-test SUCCESS in 8m 11s
:heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 31s
:heavy_check_mark: unit-test-restricted SUCCESS in 7m 10s
:heavy_check_mark: system-test-fedora-rawhide SUCCESS in 28m 22s
:heavy_check_mark: system-test-fedora-39 SUCCESS in 27m 39s
:heavy_check_mark: system-test-fedora-38 SUCCESS in 27m 28s
:heavy_check_mark: system-test-fedora-37 SUCCESS in 26m 59s
There was a related issue with `nvidia-ctk`: NVIDIA/nvidia-container-toolkit#143, and the fix was merged.
Until it is released there is a workaround: the chmod hook can simply be removed from the NVIDIA CDI spec file.
UPDATE: The fix was released in v1.15.0, everything works out of the box now!
P.S. NVIDIA has changed their repos recently, so if you had `nvidia-container-toolkit*` installed and it is not updating past v1.13.x, remove the existing repo file and add the new one.
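For anyone still on a toolkit older than v1.15.0, here is a rough sketch of the workaround above in Python. It assumes the spec has already been parsed into a dict (from YAML or JSON) and that the chmod hooks look like the ones `nvidia-ctk` generates, i.e. entries under `containerEdits.hooks` whose `args` contain `chmod`. The function name `strip_chmod_hooks` is made up for illustration; check your own spec file before relying on it.

```python
import copy


def strip_chmod_hooks(spec):
    """Return a copy of a parsed CDI spec with chmod hooks removed.

    Assumption: a chmod hook is any hook whose "args" list contains
    the word "chmod", e.g. ["nvidia-ctk", "hook", "chmod", ...].
    """

    def clean(edits):
        hooks = edits.get("hooks")
        if hooks:
            # Keep every hook that does not mention chmod in its args.
            edits["hooks"] = [
                h for h in hooks if "chmod" not in h.get("args", [])
            ]

    result = copy.deepcopy(spec)  # leave the caller's dict untouched
    # Hooks can appear at the top level and per device.
    clean(result.get("containerEdits", {}))
    for device in result.get("devices", []):
        clean(device.get("containerEdits", {}))
    return result
```

The cleaned dict would then have to be written back to the spec file with whatever YAML or JSON library was used to load it.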
@debarshiray can I attract your attention, get some opinions on this? 😁
Sorry for the delay, @Jmennius I finally got myself some NVIDIA hardware to play with this.
I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.
However, as far as I can make out, the `nvidia-container-toolkit` or `nvidia-container-toolkit-base` packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and neither the RPMFusion free nor the non-free repositories have it, but they do have NVIDIA's proprietary driver.
Yeah, you are right - it's only available in NVIDIA's repos for Fedora. It would be nice if it was repackaged somehow on RPMFusion... I saw that some distros package it.
Is there anything else other than NVIDIA that uses the Container Device Interface?
I am not aware of other CDI implementations :(
I would like to understand the situation a bit better. Ultimately I want to make it as smooth as possible for the user to enable the NVIDIA proprietary driver. That becomes a problem if one needs to enable multiple different unofficial repositories, at least on Fedora.
I guess this is a way to go for the best experience. Another option with Nvidia is to basically reinstall Nvidia driver libraries inside the container with each upgrade of the kernel driver on the host.
Hi! This is great news!! Is it expected to be merged soon or should I grab the patch?
I'd go for a patch for sure 😉 I've rebased the change.
P.S. You can do this in your `~/.config/containers/toolbox.conf`:
[general]
devices = ["nvidia.com/gpu=all"]
For Silverblue, something that I've come up with to handle regenerating the CDI spec after an upgrade, in `/etc/systemd/system/nvidia-cdi-update.service`:
[Unit]
Description=Update Nvidia CDI configuration
DefaultDependencies=no
Before=systemd-update-done.service
ConditionNeedsUpdate=/etc
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-ctk cdi generate --output /etc/cdi/nvidia.yaml
[Install]
WantedBy=multi-user.target
sudo systemctl enable --now nvidia-cdi-update.service
This will run on the next boot after an update (so not on every boot) and write an up-to-date CDI spec.
Not sure if those systemd features work with regular Fedora Workstation though (too lazy to research 😄).
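To check that a regenerated spec still offers the device named in `toolbox.conf`, something like the sketch below could list the fully qualified names a spec provides. It assumes the spec is already parsed into a dict and follows the CDI layout of a top-level `kind` plus a `devices` list with `name` fields; the helper name `qualified_device_names` is made up for illustration.

```python
def qualified_device_names(spec):
    """List the fully qualified CDI device names in a parsed spec.

    A fully qualified name is "<kind>=<device name>", for example
    "nvidia.com/gpu=all".
    """
    kind = spec["kind"]
    return [f"{kind}={device['name']}" for device in spec.get("devices", [])]


if __name__ == "__main__":
    # Hand-written stand-in for the contents of /etc/cdi/nvidia.yaml.
    sample = {
        "cdiVersion": "0.5.0",
        "kind": "nvidia.com/gpu",
        "devices": [{"name": "0"}, {"name": "all"}],
    }
    print(qualified_device_names(sample))
```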
Build succeeded. https://softwarefactory-project.io/zuul/t/local/buildset/7814973e94aa4d73b0d68ab88a881112
:heavy_check_mark: unit-test SUCCESS in 6m 37s
:heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 29s
:heavy_check_mark: unit-test-restricted SUCCESS in 5m 52s
:heavy_check_mark: system-test-fedora-rawhide SUCCESS in 35m 24s
:heavy_check_mark: system-test-fedora-40 SUCCESS in 34m 03s
:heavy_check_mark: system-test-fedora-39 SUCCESS in 34m 26s
:heavy_check_mark: system-test-fedora-38 SUCCESS in 34m 07s
That's a really neat hack, indeed. :)
I see commits from Intel in github.com/cncf-tags/container-device-interface, which is great.
The NVIDIA Container Toolkit code seems to be entirely free software. I wonder if we can get it into Fedora proper, instead of RPMFusion.
Closing in favour of https://github.com/containers/toolbox/pull/1497
Thanks again for pointing me in the right direction, @Jmennius
This allows using the CDI infrastructure, which often does more than just map devices in `/dev` - for NVIDIA this will additionally map a set of libraries into the container (which are essential to use the device without a hassle).
Since this is only a pass-through, maybe instead of having a `--device` specific option we should have the ability to pass arbitrary options to `podman-create`? Like passing them after `--`, as in `toolbox create -c my-container -- --device foo:bar --other-podman-option`?
Fixes: #116 (although it is possible to use NVIDIA devices inside toolboxes, this change significantly improves usability when using NVIDIA CTK+CDI with toolbox)
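A rough sketch of the pass-through idea floated above: split the argument list at the first `--` and hand everything after it to `podman create` untouched. This only illustrates the suggestion and is not how Toolbx parses its command line; `split_passthrough` is a hypothetical helper.

```python
def split_passthrough(argv):
    """Split arguments at the first "--".

    Returns (own_args, passthrough_args). Without a "--" separator,
    nothing is passed through.
    """
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return list(argv), []


if __name__ == "__main__":
    own, extra = split_passthrough(
        ["create", "-c", "my-container", "--",
         "--device", "foo:bar", "--other-podman-option"]
    )
    print(own)    # arguments the tool itself would handle
    print(extra)  # arguments forwarded to podman create as-is
```

The trade-off is that pass-through options bypass any validation the tool does, which is the usual cost of this kind of escape hatch.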