Open nomah98 opened 1 day ago
I don't have an NVIDIA device, so I can't try to reproduce this, but using the same version, if I add a runtime: nvidia
attribute to my compose file I get: Error response from daemon: unknown or invalid runtime name: nvidia
which seems to demonstrate that the container is correctly configured to run with the nvidia runtime.
Can you please capture the output of docker inspect MY_CONTAINER
for both compose versions running your application, so we can compare the container configurations and see what differs with the newer compose version?
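One way to do the comparison requested above is to save the docker inspect output from each setup to a file and diff the relevant parts of HostConfig. The sketch below is only an illustration: the function names, the file names in the usage note, and the choice of keys are assumptions, not something taken from the thread.

```python
import json

# HostConfig keys that seem most relevant to runtime/GPU selection.
# This selection is an assumption for illustration, not an exhaustive list.
KEYS = ("Runtime", "DeviceRequests", "Devices")


def load_inspect(path):
    """Load a file containing the JSON printed by `docker inspect`."""
    with open(path) as fh:
        return json.load(fh)


def host_config_diff(old_inspect, new_inspect, keys=KEYS):
    """Return {key: (old_value, new_value)} for HostConfig keys that differ.

    Each argument is the parsed JSON of `docker inspect`, which is a list
    with one object per inspected container; only the first is compared.
    """
    old_cfg = old_inspect[0].get("HostConfig", {})
    new_cfg = new_inspect[0].get("HostConfig", {})
    return {
        key: (old_cfg.get(key), new_cfg.get(key))
        for key in keys
        if old_cfg.get(key) != new_cfg.get(key)
    }
```

For example, after running docker inspect MY_CONTAINER > old.json on the 2.29.1 machine and docker inspect MY_CONTAINER > new.json on the 2.29.7 machine (hypothetical file names), host_config_diff(load_inspect("old.json"), load_inspect("new.json")) lists only the keys whose values differ.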
I think this may be related to a change contributed by NVIDIA;
I wonder though if there's a difference here between how cli options and compose options are handled, or if the same issue happens on the CLI ("explicitly set to 0" vs "not set")
Description
Going from docker-compose-plugin/2.29.1 to docker-compose-plugin/jammy 2.29.7, the runtime field of the docker compose file no longer enables the specified nvidia runtime in my container. However, running the same image with the argument --runtime nvidia does enable the Nvidia runtime in the container. I have other Nvidia devices running docker-compose-plugin/2.29.1 that do not have this issue.
Steps To Reproduce
On a Jetson Orin-NX with docker-compose-plugin/jammy 2.29.7, use docker compose to start a container from a compose file that has fields such as
then try to import something that uses an nvidia shared object
from .tensorrt import *
and see an error like
ImportError: /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so: file too short
Run the same image with
docker run --runtime nvidia -it MY_IMAGE bash
then try to import something that uses an nvidia shared object
from .tensorrt import *
No error.
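The dynamic loader reports "file too short" when a file is smaller than an ELF header, for example an empty placeholder. A minimal sketch for checking what the library path looks like inside each container, assuming Python is available there; the function name is hypothetical, and the result only describes the file, it does not establish why it is in that state.

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def describe_shared_object(path):
    """Classify a path as 'missing', 'empty', 'not-elf', or 'elf'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        magic = fh.read(4)
    return "elf" if magic == ELF_MAGIC else "not-elf"
```

Calling describe_shared_object("/usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so") in the compose-started container and in the docker run --runtime nvidia container shows whether the two see the same kind of file at that path.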
Compose Version
docker-compose-plugin/jammy 2.29.7
Docker Environment
Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.136-tegra
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 8
 Total Memory: 15.29GiB
 Name: rudi-nx
 ID: 3c5b7ecd-713f-4d6e-ac9a-f6cfe3c2112f
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  192.168.11.200:5000
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Anything else?
Francisco encountered the same issue here
Like Francisco, I was able to make this work by downgrading docker-ce and docker-compose-plugin