Closed LeroyOP0 closed 9 months ago
Hi @LeroyOP0,
So did I understand you correctly that you first installed nvidia-container-runtime, then uninstalled it and installed nvidia-container-toolkit instead? If so, what is the output of $ cat /etc/docker/daemon.json now? Did you reboot or restart the Docker daemon after upgrading?
I personally have not installed the nvidia-container-toolkit yet, but some PhD students of ours did so recently and as far as I remember they did not need to add anything to /etc/docker/daemon.json. Could you try to remove its contents, reboot the system, give it another try and send me the output of $ docker info?
Collecting the info and getting back to you. Thanks champ
For:
cat /etc/docker/daemon.json
I get:
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
and everything works perfectly.
We can see that in the output of docker info:
Client: Docker Engine - Community
Version: 25.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.12.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.24.5
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 12
Server Version: 25.0.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.15.0-97-generic
Operating System: Ubuntu 20.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 13.5GiB
Name: XXXXX
ID: e5c9c593-e6a4-4fe7-ae2b-d840d9af1af6
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
But considering that the package is deprecated and I want to switch to the recommended nvidia-container-toolkit, what steps should I follow so that the nvidia runtime is provided by the toolkit instead?
Among my installed packages (apt list --installed | grep nvidia-container) I see that both are installed:
libnvidia-container-tools/bionic,now 1.13.5-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.13.5-1 amd64 [installed,automatic]
nvidia-container-runtime/bionic,now 3.13.0-1 all [installed]
nvidia-container-toolkit-base/bionic,now 1.13.5-1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.13.5-1 amd64 [installed,automatic]
Should I simply uninstall "nvidia-container-runtime"?
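A quick way to answer "is the deprecated package still there?" is to filter the apt listing by package name. This is only a minimal sketch; the function names `nvidia_container_packages` and `has_deprecated_runtime_pkg` are made up for illustration and are not part of any NVIDIA or Docker tooling.

```shell
# Read `apt list --installed` output on stdin and print only the names of
# packages that belong to the NVIDIA container stack (text before the "/").
nvidia_container_packages() {
  grep 'nvidia-container' | cut -d/ -f1
}

# Succeed (exit 0) if the deprecated nvidia-container-runtime package
# appears in the listing read from stdin. -x matches the whole line, so
# nvidia-container-toolkit and friends do not count.
has_deprecated_runtime_pkg() {
  nvidia_container_packages | grep -qx 'nvidia-container-runtime'
}
```

It could be used as `apt list --installed 2>/dev/null | has_deprecated_runtime_pkg && echo "still installed"`.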
I would uninstall nvidia-container-runtime and delete the contents of /etc/docker/daemon.json, then restart the system and check the output of $ docker info to make sure that it still has nvidia under Runtimes.
If that does not work, uninstall both of them and reinstall the nvidia-container-toolkit only.
Let me know if that works...
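Before clearing /etc/docker/daemon.json it may be worth keeping a backup so the old runtime entry can be restored. The sketch below is an assumption-laden helper, not something from the guide: `reset_daemon_json` is a hypothetical name, and it writes an empty JSON object (`{}`) instead of a zero-byte file, on the assumption that valid JSON is the safer choice for the Docker daemon.

```shell
# Back up a Docker daemon config (if it exists) to <path>.bak and replace
# its contents with an empty JSON object. The path is a parameter so the
# function can be tried on a scratch copy first; it defaults to the usual
# location.
reset_daemon_json() {
  config="${1:-/etc/docker/daemon.json}"
  if [ -f "$config" ]; then
    cp "$config" "$config.bak" || return 1
  fi
  printf '{}\n' > "$config"
}
```

Editing the real file under /etc/docker needs root privileges, and the Docker daemon has to be restarted (or the system rebooted) afterwards for the change to take effect.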
After uninstalling with sudo apt remove nvidia-container-runtime and rechecking with apt list --installed | grep nvidia-container we get:
libnvidia-container-tools/bionic,now 1.13.5-1 amd64 [installed,auto-removable]
libnvidia-container1/bionic,now 1.13.5-1 amd64 [installed,auto-removable]
nvidia-container-toolkit-base/bionic,now 1.13.5-1 amd64 [installed,auto-removable]
nvidia-container-toolkit/bionic,now 1.13.5-1 amd64 [installed,auto-removable]
No nvidia-container-runtime.
Removed the content of the daemon.json as well.
Then, after restarting with sudo systemctl daemon-reload and sudo systemctl restart docker, checking docker info | grep run doesn't list the nvidia runtime any longer - not good I guess.
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
runc version: v1.1.12-0-g51d5e94
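Since grep run also matches the Default Runtime and runc version lines, a narrower check only looks at the Runtimes line. A minimal sketch, with `has_nvidia_runtime` being a hypothetical helper name:

```shell
# Read `docker info` output on stdin and succeed (exit 0) only if the
# "Runtimes:" line lists nvidia as a whole word. The "Default Runtime:"
# line is ignored because the pattern is anchored to the start of the line.
has_nvidia_runtime() {
  grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}
```

For example: `docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered" || echo "nvidia runtime missing"`.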
Now I'll try to uninstall nvidia-container-toolkit as well.
Reinstalling nvidia-container-toolkit, restarting docker, and checking docker info shows that there's no nvidia runtime. Only runc.
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
runc version: v1.1.12-0-g51d5e94
How is installing nvidia-container-toolkit supposed to create the required nvidia runtime?
Oops, I missed some steps... I think I didn't do the Docker configuration step from the installation guide.
Rechecking
BINGO!
I had just missed the step that configures Docker; once that is done, the nvidia runtime gets set up.
Thanks @2b-t you're awesome.
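The thread does not spell out the missed step. Assuming it is the "Configuring Docker" step from NVIDIA's Container Toolkit installation guide, it comes down to running nvidia-ctk runtime configure --runtime=docker and restarting Docker. The wrapper below is only a sketch: `run`, `configure_nvidia_runtime` and the `DRY_RUN` variable are names invented here so the commands can be previewed without touching the system.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Let the toolkit CLI register the nvidia runtime in Docker's daemon
# configuration, then restart Docker so the daemon picks the change up.
configure_nvidia_runtime() {
  run sudo nvidia-ctk runtime configure --runtime=docker || return 1
  run sudo systemctl restart docker
}
```

Setting `DRY_RUN=1` before calling `configure_nvidia_runtime` prints the two commands instead of executing them; afterwards docker info should list nvidia under Runtimes again.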
Excellent, I will add a corresponding comment inside my guide. You are welcome, @LeroyOP0!
nvidia-container-runtime works well, exactly as instructed by the guide.
But it is deprecated and users are advised to switch to [https://github.com/NVIDIA/nvidia-container-toolkit?tab=readme-ov-file]
Tried to simply change the daemon.json:
but got an error when using "docker compose ... up":