docker / docs

Source repo for Docker's Documentation
https://docs.docker.com
Apache License 2.0

docker run -it --rm --gpus all ubuntu nvidia-smi does not compute #19366

Closed SpangeJ closed 9 months ago

SpangeJ commented 9 months ago

Is this a docs issue?

Type of issue

Information is incorrect

Description

I am trying to follow the guide on generative AI, but it results in an error. So I went to the source on how to access an NVIDIA GPU (which I have). After successfully running apt-get install nvidia-container-runtime, I tried to run docker run -it --rm --gpus all ubuntu nvidia-smi, but I got the following error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

Additional info about my system: nvidia-container-cli -k -d /dev/tty info

I0209 09:10:03.373686 15399 nvc.c:376] initializing library context (version=1.14.5, build=870d7c5d957f5780b8afa57c4d5cc924d4d9ed26)
I0209 09:10:03.373842 15399 nvc.c:350] using root /
I0209 09:10:03.373903 15399 nvc.c:351] using ldcache /etc/ld.so.cache
I0209 09:10:03.373932 15399 nvc.c:352] using unprivileged user 1000:1000
I0209 09:10:03.374203 15399 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0209 09:10:03.375158 15399 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0209 09:10:03.376207 15399 nvc.c:258] failed to detect NVIDIA devices
W0209 09:10:03.377318 15400 nvc.c:273] failed to set inheritable capabilities
W0209 09:10:03.377420 15400 nvc.c:274] skipping kernel modules load due to failure
I0209 09:10:03.379049 15401 rpc.c:71] starting driver rpc service
I0209 09:10:03.407791 15402 rpc.c:71] starting nvcgo rpc service
I0209 09:10:03.416054 15399 nvc_info.c:798] requesting driver information with ''
I0209 09:10:03.421178 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.525.147.05
I0209 09:10:03.421776 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.525.147.05
I0209 09:10:03.422745 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.525.147.05
I0209 09:10:03.423572 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.147.05
I0209 09:10:03.424603 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.525.147.05
I0209 09:10:03.425386 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.147.05
I0209 09:10:03.426268 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.525.147.05
I0209 09:10:03.427394 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.525.147.05
I0209 09:10:03.427609 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.147.05
I0209 09:10:03.427721 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.525.147.05
I0209 09:10:03.427786 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.525.147.05
I0209 09:10:03.427847 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.525.147.05
I0209 09:10:03.428110 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.525.147.05
I0209 09:10:03.428468 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.525.147.05
I0209 09:10:03.428799 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.525.147.05
I0209 09:10:03.429504 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.147.05
I0209 09:10:03.429925 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.147.05
I0209 09:10:03.430967 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.147.05
I0209 09:10:03.431998 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.525.147.05
I0209 09:10:03.434504 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.147.05
I0209 09:10:03.434898 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.525.147.05
I0209 09:10:03.435548 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.525.147.05
I0209 09:10:03.436131 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.525.147.05
I0209 09:10:03.436666 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.525.147.05
I0209 09:10:03.436799 15399 nvc_info.c:176] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.525.147.05
I0209 09:10:03.437195 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.525.147.05
I0209 09:10:03.437766 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.525.147.05
I0209 09:10:03.438255 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.525.147.05
I0209 09:10:03.438765 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.525.147.05
I0209 09:10:03.439472 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-nvvm.so.525.147.05
I0209 09:10:03.440008 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.525.147.05
I0209 09:10:03.440459 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.525.147.05
I0209 09:10:03.440860 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.525.147.05
I0209 09:10:03.441261 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.525.147.05
I0209 09:10:03.441574 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.525.147.05
I0209 09:10:03.441903 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.525.147.05
I0209 09:10:03.442242 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.525.147.05
I0209 09:10:03.442673 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.525.147.05
I0209 09:10:03.442904 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.525.147.05
I0209 09:10:03.443235 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libcuda.so.525.147.05
I0209 09:10:03.443554 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.525.147.05
I0209 09:10:03.443929 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.525.147.05
I0209 09:10:03.444156 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.525.147.05
I0209 09:10:03.444393 15399 nvc_info.c:176] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.525.147.05
W0209 09:10:03.444435 15399 nvc_info.c:402] missing library libnvidia-nscq.so
W0209 09:10:03.444444 15399 nvc_info.c:402] missing library libnvidia-gpucomp.so
W0209 09:10:03.444453 15399 nvc_info.c:402] missing library libnvidia-fatbinaryloader.so
W0209 09:10:03.444467 15399 nvc_info.c:402] missing library libnvidia-pkcs11.so
W0209 09:10:03.444483 15399 nvc_info.c:402] missing library libnvidia-pkcs11-openssl3.so
W0209 09:10:03.444492 15399 nvc_info.c:402] missing library libvdpau_nvidia.so
W0209 09:10:03.444505 15399 nvc_info.c:402] missing library libnvidia-ifr.so
W0209 09:10:03.444513 15399 nvc_info.c:402] missing library libnvidia-cbl.so
W0209 09:10:03.444520 15399 nvc_info.c:406] missing compat32 library libnvidia-cfg.so
W0209 09:10:03.444531 15399 nvc_info.c:406] missing compat32 library libnvidia-nscq.so
W0209 09:10:03.444545 15399 nvc_info.c:406] missing compat32 library libcudadebugger.so
W0209 09:10:03.444553 15399 nvc_info.c:406] missing compat32 library libnvidia-gpucomp.so
W0209 09:10:03.444562 15399 nvc_info.c:406] missing compat32 library libnvidia-fatbinaryloader.so
W0209 09:10:03.444572 15399 nvc_info.c:406] missing compat32 library libnvidia-allocator.so
W0209 09:10:03.444581 15399 nvc_info.c:406] missing compat32 library libnvidia-pkcs11.so
W0209 09:10:03.444590 15399 nvc_info.c:406] missing compat32 library libnvidia-pkcs11-openssl3.so
W0209 09:10:03.444603 15399 nvc_info.c:406] missing compat32 library libnvidia-ngx.so
W0209 09:10:03.444612 15399 nvc_info.c:406] missing compat32 library libvdpau_nvidia.so
W0209 09:10:03.444622 15399 nvc_info.c:406] missing compat32 library libnvidia-ifr.so
W0209 09:10:03.444640 15399 nvc_info.c:406] missing compat32 library libnvidia-rtcore.so
W0209 09:10:03.444648 15399 nvc_info.c:406] missing compat32 library libnvoptix.so
W0209 09:10:03.444660 15399 nvc_info.c:406] missing compat32 library libnvidia-cbl.so
I0209 09:10:03.445212 15399 nvc_info.c:302] selecting /usr/bin/nvidia-smi
I0209 09:10:03.445240 15399 nvc_info.c:302] selecting /usr/bin/nvidia-debugdump
I0209 09:10:03.445267 15399 nvc_info.c:302] selecting /usr/bin/nvidia-persistenced
I0209 09:10:03.445320 15399 nvc_info.c:302] selecting /usr/bin/nvidia-cuda-mps-control
I0209 09:10:03.445346 15399 nvc_info.c:302] selecting /usr/bin/nvidia-cuda-mps-server
W0209 09:10:03.445467 15399 nvc_info.c:428] missing binary nv-fabricmanager
I0209 09:10:03.445791 15399 nvc_info.c:488] listing firmware path /lib/firmware/nvidia/525.147.05/gsp_ad10x.bin
I0209 09:10:03.445804 15399 nvc_info.c:488] listing firmware path /lib/firmware/nvidia/525.147.05/gsp_tu10x.bin
I0209 09:10:03.445849 15399 nvc_info.c:561] listing device /dev/nvidiactl
I0209 09:10:03.445858 15399 nvc_info.c:561] listing device /dev/nvidia-uvm
I0209 09:10:03.445868 15399 nvc_info.c:561] listing device /dev/nvidia-uvm-tools
I0209 09:10:03.445876 15399 nvc_info.c:561] listing device /dev/nvidia-modeset
I0209 09:10:03.445915 15399 nvc_info.c:346] listing ipc path /run/nvidia-persistenced/socket
W0209 09:10:03.445947 15399 nvc_info.c:352] missing ipc path /var/run/nvidia-fabricmanager/socket
W0209 09:10:03.445977 15399 nvc_info.c:352] missing ipc path /tmp/nvidia-mps
I0209 09:10:03.445986 15399 nvc_info.c:854] requesting device information with ''
I0209 09:10:03.452671 15399 nvc_info.c:745] listing device /dev/nvidia0 (GPU-a1446a5c-e0ae-c1e2-f63a-f0df46ae9096 at 00000000:01:00.0)
NVRM version:   525.147.05
CUDA version:   12.0

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 4070 Laptop GPU
Brand:          GeForce
GPU UUID:       GPU-a1446a5c-e0ae-c1e2-f63a-f0df46ae9096
Bus Location:   00000000:01:00.0
Architecture:   8.9
I0209 09:10:03.452795 15399 nvc.c:434] shutting down library context
I0209 09:10:03.452901 15402 rpc.c:95] terminating nvcgo rpc service
I0209 09:10:03.454122 15399 rpc.c:135] nvcgo rpc service terminated successfully
I0209 09:10:03.457252 15401 rpc.c:95] terminating driver rpc service
I0209 09:10:03.457569 15399 rpc.c:135] driver rpc service terminated successfully

uname -a

Linux kodeworks 6.5.0-17-generic #17~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jan 16 14:32:32 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi -a

==============NVSMI LOG==============

Timestamp                                 : Fri Feb  9 10:12:33 2024
Driver Version                            : 525.147.05
CUDA Version                              : 12.0

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : NVIDIA GeForce RTX 4070 Laptop GPU
    Product Brand                         : GeForce
    Product Architecture                  : Ada Lovelace
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-a1446a5c-e0ae-c1e2-f63a-f0df46ae9096
    Minor Number                          : 0
    VBIOS Version                         : 95.06.15.40.2E
    MultiGPU Board                        : No
    Board ID                              : 0x100
    Board Part Number                     : N/A
    GPU Part Number                       : 2820-775-A1
    Module ID                             : 1
    Inforom Version
        Image Version                     : G002.0000.00.03
        OEM Object                        : 2.0
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x282010DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x13C71462
        GPU Link Info
            PCIe Generation
                Max                       : 4
                Current                   : 1
                Device Current            : 1
                Device Max                : 4
                Host Max                  : 5
            Link Width
                Max                       : 8x
                Current                   : 8x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 119000 KB/s
        Rx Throughput                     : 894000 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 8188 MiB
        Reserved                          : 247 MiB
        Used                              : 78 MiB
        Free                              : 7861 MiB
    BAR1 Memory Usage
        Total                             : 8192 MiB
        Used                              : 3 MiB
        Free                              : 8189 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 34 %
        Memory                            : 26 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 64 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 46 C
        GPU T.Limit Temp                  : 41 C
        GPU Shutdown T.Limit Temp         : -5 C
        GPU Slowdown T.Limit Temp         : -2 C
        GPU Max Operating T.Limit Temp    : 0 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating T.Limit Temp : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 3.89 W
        Power Limit                       : 80.00 W
        Default Power Limit               : 80.00 W
        Enforced Power Limit              : 80.00 W
        Min Power Limit                   : 5.00 W
        Max Power Limit                   : 105.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 765 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 3105 MHz
        SM                                : 3105 MHz
        Memory                            : 8001 MHz
        Video                             : 2415 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 625.000 mV
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2618
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 77 MiB

docker version

Client: Docker Engine - Community
 Cloud integration: v1.0.35+desktop.10
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:07:41 2023
 OS/Arch:           linux/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.27.1 (136059)
 Engine:
  Version:          25.0.2
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       fce6e0c
  Built:            Thu Feb  1 00:23:17 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

dpkg -l '*nvidia*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                      Version                       Architecture Description
+++-=========================================-=============================-============-==========================================================
un  libgldispatch0-nvidia                     <none>                        <none>       (no description available)
ii  libnvidia-cfg1-525:amd64                  525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                        <none>                        <none>       (no description available)
un  libnvidia-common                          <none>                        <none>       (no description available)
ii  libnvidia-common-525                      525.147.05-0ubuntu0.22.04.1   all          Shared files used by the NVIDIA libraries
un  libnvidia-compute                         <none>                        <none>       (no description available)
ii  libnvidia-compute-525:amd64               525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA libcompute package
ii  libnvidia-compute-525:i386                525.147.05-0ubuntu0.22.04.1   i386         NVIDIA libcompute package
rc  libnvidia-compute-535:amd64               535.113.01-0ubuntu0.22.04.3   amd64        NVIDIA libcompute package
rc  libnvidia-compute-545:amd64               545.23.06-0ubuntu0~gpu22.04.3 amd64        NVIDIA libcompute package
ii  libnvidia-container-tools                 1.14.5-1                      amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                1.14.5-1                      amd64        NVIDIA container runtime library
un  libnvidia-decode                          <none>                        <none>       (no description available)
ii  libnvidia-decode-525:amd64                525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-525:i386                 525.147.05-0ubuntu0.22.04.1   i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64              1:1.1.9-1.1                   amd64        Wayland EGL External Platform library -- shared library
un  libnvidia-encode                          <none>                        <none>       (no description available)
ii  libnvidia-encode-525:amd64                525.147.05-0ubuntu0.22.04.1   amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-525:i386                 525.147.05-0ubuntu0.22.04.1   i386         NVENC Video Encoding runtime library
un  libnvidia-extra                           <none>                        <none>       (no description available)
ii  libnvidia-extra-525:amd64                 525.147.05-0ubuntu0.22.04.1   amd64        Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                            <none>                        <none>       (no description available)
ii  libnvidia-fbc1-525:amd64                  525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-525:i386                   525.147.05-0ubuntu0.22.04.1   i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                              <none>                        <none>       (no description available)
un  libnvidia-gl-390                          <none>                        <none>       (no description available)
un  libnvidia-gl-410                          <none>                        <none>       (no description available)
ii  libnvidia-gl-525:amd64                    525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-525:i386                     525.147.05-0ubuntu0.22.04.1   i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-legacy-390xx-egl-wayland1       <none>                        <none>       (no description available)
un  libnvidia-ml.so.1                         <none>                        <none>       (no description available)
rc  linux-modules-nvidia-525-6.2.0-35-generic 6.2.0-35.35~22.04.1           amd64        Linux kernel nvidia modules for version 6.2.0-35
rc  linux-modules-nvidia-535-6.2.0-35-generic 6.2.0-35.35~22.04.1           amd64        Linux kernel nvidia modules for version 6.2.0-35
rc  linux-objects-nvidia-525-6.2.0-35-generic 6.2.0-35.35~22.04.1           amd64        Linux kernel nvidia modules for version 6.2.0-35 (objects)
rc  linux-objects-nvidia-535-6.2.0-35-generic 6.2.0-35.35~22.04.1           amd64        Linux kernel nvidia modules for version 6.2.0-35 (objects)
un  linux-signatures-nvidia-6.2.0-35-generic  <none>                        <none>       (no description available)
un  nvidia-384                                <none>                        <none>       (no description available)
un  nvidia-390                                <none>                        <none>       (no description available)
un  nvidia-common                             <none>                        <none>       (no description available)
un  nvidia-compute-utils                      <none>                        <none>       (no description available)
ii  nvidia-compute-utils-525                  525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA compute utilities
rc  nvidia-compute-utils-535                  535.113.01-0ubuntu0.22.04.3   amd64        NVIDIA compute utilities
rc  nvidia-compute-utils-545                  545.23.06-0ubuntu0~gpu22.04.3 amd64        NVIDIA compute utilities
ii  nvidia-container-runtime                  3.14.0-1                      all          NVIDIA Container Toolkit meta-package
un  nvidia-container-runtime-hook             <none>                        <none>       (no description available)
ii  nvidia-container-toolkit                  1.14.5-1                      amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base             1.14.5-1                      amd64        NVIDIA Container Toolkit Base
ii  nvidia-dkms-525                           525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA DKMS package
rc  nvidia-dkms-535                           535.113.01-0ubuntu0.22.04.3   amd64        NVIDIA DKMS package
rc  nvidia-dkms-545                           545.23.06-0ubuntu0~gpu22.04.3 amd64        NVIDIA DKMS package
un  nvidia-dkms-kernel                        <none>                        <none>       (no description available)
un  nvidia-docker                             <none>                        <none>       (no description available)
ii  nvidia-docker2                            2.14.0-1                      all          NVIDIA Container Toolkit meta-package
ii  nvidia-driver-525                         525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA driver metapackage
un  nvidia-driver-binary                      <none>                        <none>       (no description available)
un  nvidia-driver-libs                        <none>                        <none>       (no description available)
un  nvidia-egl-wayland-common                 <none>                        <none>       (no description available)
un  nvidia-firmware-535-535.113.01            <none>                        <none>       (no description available)
un  nvidia-firmware-545-545.23.06             <none>                        <none>       (no description available)
un  nvidia-kernel-common                      <none>                        <none>       (no description available)
ii  nvidia-kernel-common-525                  525.147.05-0ubuntu0.22.04.1   amd64        Shared files used with the kernel module
rc  nvidia-kernel-common-535                  535.113.01-0ubuntu0.22.04.3   amd64        Shared files used with the kernel module
rc  nvidia-kernel-common-545                  545.23.06-0ubuntu0~gpu22.04.3 amd64        Shared files used with the kernel module
un  nvidia-kernel-source                      <none>                        <none>       (no description available)
ii  nvidia-kernel-source-525                  525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA kernel source package
un  nvidia-kernel-source-535                  <none>                        <none>       (no description available)
un  nvidia-kernel-source-545                  <none>                        <none>       (no description available)
un  nvidia-libopencl1-dev                     <none>                        <none>       (no description available)
un  nvidia-opencl-icd                         <none>                        <none>       (no description available)
un  nvidia-persistenced                       <none>                        <none>       (no description available)
ii  nvidia-prime                              0.8.17.1                      all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                           510.47.03-0ubuntu1            amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                    <none>                        <none>       (no description available)
un  nvidia-smi                                <none>                        <none>       (no description available)
un  nvidia-utils                              <none>                        <none>       (no description available)
ii  nvidia-utils-525                          525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA driver support binaries
un  nvidia-vulkan-icd                         <none>                        <none>       (no description available)
ii  xserver-xorg-video-nvidia-525             525.147.05-0ubuntu0.22.04.1   amd64        NVIDIA binary Xorg driver

nvidia-container-cli -V

cli-version: 1.14.5
lib-version: 1.14.5
build date: 2024-02-07T11:55+00:00
build revision: 870d7c5d957f5780b8afa57c4d5cc924d4d9ed26
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Location

https://docs.docker.com/config/containers/resource_constraints/#gpu

Suggestion

I hope that the Docker Docs can be updated so that the procedure works on my machine, which I think is quite a common setup (Ubuntu + NVIDIA GPU).

dvdksn commented 9 months ago

Hi @SpangeJ and thanks for your issue. I'm gonna cc in @p1-0tr who might be able to help.

Just a silly question from me, did you set up the nvidia container toolkit repository before installing? Otherwise I guess you might end up installing the deprecated container runtime. See here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt

(Either way, we'll need to update these docs)

p1-0tr commented 9 months ago

Hi, I'm pretty sure it will be down to the -runtime (deprecated) vs -toolkit issue. I'll try to set aside some time later today to test the examples on my system. But, yeah, we definitely will need to update the docs.

SpangeJ commented 9 months ago

> Hi @SpangeJ and thanks for your issue. I'm gonna cc in @p1-0tr who might be able to help.
>
> Just a silly question from me, did you set up the nvidia container toolkit repository before installing? Otherwise I guess you might end up installing the deprecated container runtime. See here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt
>
> (Either way, we'll need to update these docs)

@dvdksn I can at least run nvidia-container-toolkit

NVIDIA Container Runtime Hook version 1.14.5
commit: 9ea336070134e612145d342e495f2fc616aab063

But I did not fully understand the Configuration part, because running sudo systemctl restart docker fails with:

Failed to restart docker.service: Unit docker.service not found.

I use systemctl --user restart docker-desktop.service to restart Docker.

The same thing happens with the Rootless variant: systemctl --user restart docker returns Failed to restart docker.service: Unit docker.service not found.

Thanks for a swift reply.

p1-0tr commented 9 months ago

Oh, @SpangeJ, sorry I missed the fact that you are using Docker Desktop, when looking at your report at first. Unfortunately in Docker Desktop GPUs are currently only supported on Windows with the WSL2 backend.

You can use GPUs with a native Docker Engine installation (docker-ce) on Linux, though. To get Docker Engine installed, please follow https://docs.docker.com/engine/install/ubuntu/ and then the configure step (I'm assuming you've still got the NVIDIA toolkit installed) from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt.

Docker Engine and Docker Desktop can coexist if you want to keep using DD for other use cases. To switch between them, use docker context use <context name>. DD creates a desktop-linux context, and the native engine should be accessible under the default context (docker context ls lists the available contexts).
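The context switch described above can be sketched as a small shell helper. This is only an illustration: the function names context_for and switch_engine and the workload labels gpu and desktop are hypothetical, while the context names default and desktop-linux are the ones mentioned in this comment.

```shell
#!/bin/sh
# Map a workload label to a Docker context name.
# The labels "gpu" and "desktop" are made up for this sketch.
context_for() {
  case "$1" in
    gpu)     echo "default" ;;        # native Docker Engine (docker-ce)
    desktop) echo "desktop-linux" ;;  # engine managed by Docker Desktop
    *)       return 1 ;;
  esac
}

# Point the docker CLI at the context that matches the workload.
switch_engine() {
  ctx=$(context_for "$1") || { echo "unknown workload: $1" >&2; return 1; }
  docker context use "$ctx"
}

# Usage (not executed here):
#   docker context ls      # list the available contexts
#   switch_engine gpu      # same as: docker context use default
```

Running docker context ls afterwards should mark the selected context as current.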

Sorry for the initial confusion. And thanks for your questions :)

SpangeJ commented 9 months ago

@p1-0tr thank you for your reply. I do indeed use Docker Desktop, e.g. I start it by running systemctl --user start docker-desktop. What confuses me is that you say that Docker Engine and Docker Desktop are not compatible, while the docs say:

> You can install Docker Engine in different ways, depending on your needs:
> Docker Engine comes bundled with Docker Desktop for Linux. This is the easiest and quickest way to get started.

Regardless, I followed the Uninstall old versions step, then downloaded the newest DEB and installed it. I ran docker context use default to switch from Desktop to Engine, and now, when I run docker run -it --rm --gpus all ubuntu nvidia-smi, I get:

docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I tried to follow the NVIDIA Configuration section, although I do not know whether I should choose Configuring Docker or Rootless mode (I'm thinking maybe the latter, since I already use the --user option with Docker Desktop). Whichever option I choose, when I run sudo systemctl restart docker or systemctl --user restart docker I get Failed to restart docker.service: Unit docker.service not found.

Any thoughts? Why does docker.service not exist?

p1-0tr commented 9 months ago

> @p1-0tr thank you for your reply. I do indeed use Docker Desktop, e.g. I start it by running systemctl --user start docker-desktop. What confuses me is that you say that Docker Engine and Docker Desktop are not compatible, while the docs say:

Yeah, I can see how this can be confusing. In general, Docker Desktop should be the simpler option to use. However, because it runs the Docker Engine it manages inside a virtual machine, it has some constraints. One of them is that it has no access to host devices (GPUs among them).

> ... then downloaded the newest DEB and installed that. ...

Do you mean you manually downloaded the DEB file and installed that?

> ... Regardless of the options when I run sudo systemctl restart docker or systemctl --user restart docker I get Failed to restart docker.service: Unit docker.service not found.

This sounds like setting up the systemd integration failed (or some other error was encountered during the installation). Could you follow the instructions for installing with apt?

> ... although I do not know if I should choose Configuring Docker or Rootless mode (I'm thinking maybe the latter since I already use the --user option with Docker Desktop). ...

If you followed the default instructions (e.g. the ones I linked in the previous paragraph), sudo systemctl restart docker should work. The Rootless mode variant is there for cases where one can't install docker-ce via the package manager, e.g. on IT-managed systems where one does not have root access. In that case you can follow https://docs.docker.com/engine/security/rootless/ (but you may need to ask your admin to configure access to the GPU device nodes).
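As a rough sketch of the default (non-rootless) path, the configure step from the NVIDIA install guide followed by the restart and the test command from this issue could look like the script below. The run wrapper and the DRY_RUN variable are assumptions added for illustration so that the script only prints the commands unless DRY_RUN=0 is set. It also assumes docker-ce and the NVIDIA Container Toolkit are already installed via apt.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (hypothetical switch).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

configure_gpu_runtime() {
  # Register the NVIDIA runtime in the Docker daemon configuration.
  run sudo nvidia-ctk runtime configure --runtime=docker
  # Restart the system-wide daemon so it picks up the new configuration.
  run sudo systemctl restart docker
  # Test command from this issue; it should print the nvidia-smi table.
  run docker run -it --rm --gpus all ubuntu nvidia-smi
}

configure_gpu_runtime
```

Review the printed commands first, then rerun the script with DRY_RUN=0 to apply them.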

SpangeJ commented 9 months ago

@p1-0tr Yes, I installed the DEB file, following these instructions.

So I uninstalled again and followed the apt installation instructions. And finally! I can run docker run -it --rm --gpus all ubuntu nvidia-smi getting that sweet proper output.

After running docker context use desktop-linux followed by systemctl --user start docker-desktop, I am able to get my Docker Desktop GUI up and running.

Of course now, as expected, I get my old libnvidia-ml.so.1 error when I run docker run -it --rm --gpus all ubuntu nvidia-smi.

After switching back with docker context use default, it works again.
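The three error messages seen in this thread can be mapped to their likely causes, as in the sketch below. The diagnose function and its result strings are hypothetical, and the mapping only reflects what was observed here. For example, the libnvidia-ml.so.1 error can also have other causes, such as a missing host driver.

```shell
#!/bin/sh
# Map an error message from this thread to the cause that was found for it.
# The mapping is a heuristic based on this issue only.
diagnose() {
  case "$1" in
    *"libnvidia-ml.so.1"*)
      echo "likely Docker Desktop context: run docker context use default" ;;
    *"Cannot connect to the Docker daemon"*)
      echo "native Docker Engine is not installed or not running" ;;
    *"Unit docker.service not found"*)
      echo "docker-ce is not installed: follow the apt instructions" ;;
    *)
      echo "unrecognized error" ;;
  esac
}

# Usage:
#   diagnose "Failed to restart docker.service: Unit docker.service not found."
```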

Looping back to why this issue was opened (the GenAI application guide), I can now run docker compose up --build.

Thank you for helping me resolve this issue, @p1-0tr and @dvdksn.

With this comment I am closing this issue, although I recommend that Docker add a note for Linux users about using the --gpus flag when they have installed Docker using the DEB package.

p1-0tr commented 9 months ago

@SpangeJ - I'm glad things are working for you now :)

> @p1-0tr Yes, I installed the DEB file, following these instructions.

Those are for installing Docker Desktop, which unfortunately currently does not work with GPUs (or other devices). We've amended the GenAI guide to clarify which is the right choice - Engine or Desktop - depending on the OS one is using, and whether one wants to take advantage of GPU acceleration.

Thanks for bringing the issue to our attention! :)
