Open rany2 opened 2 months ago
For anyone facing this issue, the following workaround seems to work.
Define a new runtime at /etc/containers/containers.conf.d/50-nvidia-runtime.conf:
[engine.runtimes]
nvidia = ["/usr/bin/nvidia-container-runtime"]
Use runtime: nvidia in the compose service instead of the CDI device.
jellyfin:
  image: docker.io/jellyfin/jellyfin:latest
  container_name: jellyfin
  restart: always
  #user: 973:973 # media:media
  runtime: nvidia
  group_add:
    - video
  ports:
    - 127.0.0.1:8096:8096
  volumes:
    - ./jellyfin/config:/config
    - ./jellyfin/cache:/cache
    - /mnt/hdd/media:/data/media
  security_opt:
    - label=disable
  labels:
    - io.containers.autoupdate=registry
I haven't tested the generated Quadlet service, but it returns the following, which seems correct (ignore the volume paths; I didn't pass --absolute-host-paths):
# jellyfin.container
[Container]
AutoUpdate=registry
ContainerName=jellyfin
Image=docker.io/jellyfin/jellyfin:latest
PodmanArgs=--group-add video
PublishPort=127.0.0.1:8096:8096
SecurityLabelDisable=true
Volume=./jellyfin/config:/config
Volume=./jellyfin/cache:/cache
Volume=/mnt/hdd/media:/data/media
GlobalArgs=--runtime nvidia
[Service]
Restart=always
According to the Compose Specification, devices must be in the form HOST_PATH:CONTAINER_PATH[:CGROUP_PERMISSIONS].
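To make the mismatch concrete, here is a minimal Python sketch of a strict check for that form. The function name `parse_compose_device` and the permission rule (a non-empty combination of r, w, and m) are assumptions for illustration; this is not Podlet's or compose_spec's actual code.

```python
from typing import Optional, Tuple


def parse_compose_device(entry: str) -> Tuple[str, str, Optional[str]]:
    """Split a devices entry into (host_path, container_path, permissions).

    Hypothetical helper that strictly follows
    HOST_PATH:CONTAINER_PATH[:CGROUP_PERMISSIONS].
    """
    parts = entry.split(":")
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError(
            f"not HOST_PATH:CONTAINER_PATH[:CGROUP_PERMISSIONS]: {entry!r}"
        )

    permissions = parts[2] if len(parts) == 3 else None
    # Assumption: permissions are a non-empty combination of r, w, and m.
    if permissions is not None and (not permissions or set(permissions) - set("rwm")):
        raise ValueError(f"invalid cgroup permissions in {entry!r}")

    return parts[0], parts[1], permissions
```

A CDI name such as nvidia.com/gpu=all contains no colon, so a check written this way raises ValueError for it, which is the kind of validation failure this issue is about.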
Specifically for Podman, there is podman run --gpus (added in Podman v5.0.0), so you could add PodmanArgs=--gpus all to the generated .container Quadlet file.
> According to the Compose Specification, devices must be in the form HOST_PATH:CONTAINER_PATH[:CGROUP_PERMISSIONS].
Shouldn't the spec be corrected given that CDI devices exist? I think CDI is a relatively recent standard (no more than 5 years old), and Nvidia only very recently started recommending it for Podman users. It seems like a case of the spec being out of date.
Docker also supports CDI devices, but I'm not sure whether docker-compose does this same type of validation.
IMO it should be valid given that both podman run and docker run accept it as valid.
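One way a validator could tell the two forms apart is to check for a CDI-style name before falling back to path parsing. The Python sketch below is only a rough approximation: the regular expression captures the vendor/class=device shape of a fully qualified CDI device name (for example nvidia.com/gpu=all), and `is_cdi_device_name` is a hypothetical helper, not code from Podman, Docker, or the CDI reference implementation, whose real naming rules may be stricter.

```python
import re

# Approximate shape of a fully qualified CDI device name: vendor/class=device.
# Assumption: each part starts with an alphanumeric character; the exact
# character sets allowed by the CDI specification are not reproduced here.
_CDI_NAME = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9_.-]*"    # vendor, e.g. nvidia.com
    r"/[A-Za-z0-9][A-Za-z0-9_.-]*"   # class, e.g. gpu
    r"=[A-Za-z0-9][A-Za-z0-9_.:-]*"  # device, e.g. all or 0
)


def is_cdi_device_name(entry: str) -> bool:
    """Return True if entry looks like vendor/class=device (hypothetical check)."""
    return _CDI_NAME.fullmatch(entry) is not None
```

With a check like this, nvidia.com/gpu=all is recognized as a CDI name, while /dev/dri:/dev/dri is not (it starts with a slash) and would still go through the usual host path handling.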
> Specifically for Podman, there is podman run --gpus (added in Podman v5.0.0), so you could add PodmanArgs=--gpus all to the generated .container Quadlet file.
I actually preferred the runtime approach, as it doesn't require me to create some kind of package update hook or systemd service that keeps the CDI YAML file up to date. The issue with CDI is that the file needs to be updated every time CUDA or the Nvidia driver is updated.
Either way, this issue doesn't impact me anymore but I kept the issue open as it seems a simple issue to fix. Someone might need CDI devices for some other vendor and wouldn't be able to use the runtime workaround.
(Edit: --gpus=all just adds the Nvidia CDI devices behind the scenes. https://github.com/containers/podman/pull/21180)
Thanks for the information! I haven't tried to use a GPU in a container myself and hadn't heard of CDI before.
> Shouldn't the spec be corrected given that CDI devices exist?
Probably. You should create an issue in the compose-spec repo since you understand this better than I do.
> IMO it should be valid given that both podman run and docker run accept it as valid.
Is there documentation on this? I can't find anything about CDI in the docker-run(1) or podman-run(1) man pages.
> Is there documentation on this? I can't find anything about CDI in the docker-run(1) or podman-run(1) man pages.
In the podman-run man page, the reference to CDI devices is subtle:
--device=host-device[:container-device][:permissions]
With CDI devices, container-device and permissions need to be omitted. It is strange that it isn't mentioned more directly, though.
I made a ticket here: https://github.com/compose-spec/compose-spec/issues/532
> In the podman-run man page, the reference to CDI devices is subtle:
> --device=host-device[:container-device][:permissions]
> With CDI devices, container-device and permissions need to be omitted. It is strange that it isn't mentioned more directly, though.
Are you sure that's a reference to CDI devices? Leaving off the container-device instructs Podman to mount the device in the same place in the container as the host.
I get that Podman and Docker do support CDI devices. I'm just hesitant to add it to Podlet / compose_spec without clear documentation to reference.
It's actually not. I checked the man page's git history, and this predates CDI.
Consider the following service:
Ignore the fact that the user entry would fail with podlet due to https://github.com/containers/podlet/issues/106; another validation failure is triggered by the devices entry.