canonical / docker-snap

https://snapcraft.io/docker
MIT License

Nvidia runtime default #153

jocado closed this 11 months ago

jocado commented 1 year ago

Fix bug in runtime config

The runtime should point to the nvidia-container-runtime binary, not nvidia-ctk.

NVIDIA GPU support works without the full nvidia-container-runtime, but in some cases switching to nvidia-container-runtime entirely is beneficial (for example, it makes it possible to schedule multiple GPU containers simultaneously).

Example usage:

docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --name {container-name} {image-name}
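For illustration, the runtime config fix described above can be sketched for a conventional Docker setup, assuming extra runtimes are registered under the "runtimes" key of the daemon's daemon.json with a "path" to the runtime binary, and assuming nvidia-container-runtime is on the daemon's PATH. The sketch writes the entry to a temporary file rather than a live config; where the docker snap keeps its own daemon config is not covered here.

```shell
#!/bin/sh
# Sketch only: writes an example Docker daemon config to a temporary file.
# Assumptions: a conventional (non-snap) Docker setup, and that
# nvidia-container-runtime is resolvable on the daemon's PATH.
set -eu

cfg="$(mktemp)"

# Register a runtime named "nvidia" whose path is the
# nvidia-container-runtime binary (not nvidia-ctk).
cat > "$cfg" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime"
    }
  }
}
EOF

# Print the generated config so it can be reviewed before being installed.
cat "$cfg"
```

On such a setup, the daemon only accepts --runtime=nvidia once an entry like this is part of its config and dockerd has been restarted; without it, docker run fails with "unknown or invalid runtime name: nvidia".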
jocado commented 1 year ago

Hi @lucaskanashiro

Would it be possible to merge this change (which essentially fixes a bug) so it can be promoted to stable before the changes in https://github.com/docker-snap/docker-snap/pull/152 land?

Thanks!

jocado commented 12 months ago

Hi @lucaskanashiro, just wondering if you saw this request and had any feedback?

Thank you.

lucaskanashiro commented 11 months ago

OK. The changes now look good to me. Let's wait for the CI results to approve this.

jocado commented 11 months ago

Great, thanks for the review :+1:

Do you have a rough idea of the timescales for promotion through the channels?

lucaskanashiro commented 11 months ago

It will depend on internal testing, which might take a few weeks (based on the previous revision). BTW, the last PR I merged from you is already in the candidate channel; feel free to test it out.

jocado commented 11 months ago

Sure, I already tested it. Thank you :smile:

I will keep an eye on the revisions over the next few weeks.

lucaskanashiro commented 11 months ago

I just merged it, but I now notice we should have squashed some commits to keep the history clean. Let's try to do that next time.

jocado commented 11 months ago

I will try to remember that next time. Thanks for your review and help :+1:

YamiYukiSenpai commented 10 months ago
$ sudo docker run -d --name jellyfin --net=host \
    --volume /home/.jellyfin/docker/config:/config \
    --volume /home/.jellyfin/docker/cache:/cache \
    --mount type=bind,source=/media,destination=/media,ro=false \
    --user 1001:1001 \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --device /dev/nvidia-modeset:/dev/nvidia-modeset \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
    --runtime=nvidia --gpus all jellyfin/jellyfin
docker: Error response from daemon: unknown or invalid runtime name: nvidia.

Still no dice for me

channels:
  latest/stable:    20.10.24 2023-05-25 (2893) 135MB -
  latest/candidate: 20.10.24 2023-09-29 (2904) 135MB -
  latest/beta:      20.10.24 2023-10-02 (2910) 135MB -
  latest/edge:      24.0.5   2023-10-07 (2915) 136MB -
  core18/stable:    20.10.17 2023-03-13 (2746) 146MB -
  core18/candidate: ↑                                
  core18/beta:      ↑                                
  core18/edge:      ↑                                
installed:          24.0.5              (2915) 136MB -

I also tried removing --runtime=nvidia and got this:

$ sudo docker run -d --name jellyfin --net=host \
    --volume /home/.jellyfin/docker/config:/config \
    --volume /home/.jellyfin/docker/cache:/cache \
    --mount type=bind,source=/media,destination=/media,ro=false \
    --user 1001:1001 \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --device /dev/nvidia-modeset:/dev/nvidia-modeset \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
    --gpus all jellyfin/jellyfin
a1931bf82e62bc391ca595b42227c314aeba9d04e713a7b87f0558d70733e208
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
jocado commented 10 months ago

Hi @YamiYukiSenpai

A couple of things: