NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1 #1760

Closed jacksonsshen closed 1 year ago

jacksonsshen commented 1 year ago


1. Issue or feature description

On my old system, both regular Docker images and NVIDIA Docker images worked normally. After I switched to a new system, the old regular Docker images still work, but the old NVIDIA Docker image does not. (I have to download the NVIDIA Docker image again for it to work.)

2. Steps to reproduce the issue

For example:

docker run -it --gpus all nvidia/cuda:10.1-base
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1: unknown.
ERRO[0000] error waiting for container:
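The error names `/sbin/ldconfig`, which `nvidia-container-cli` runs while setting up the container. As a hedged diagnostic sketch (not something the reporter ran), the helper below prints the `ldconfig` setting from the NVIDIA container runtime configuration. It assumes the file lives at `/etc/nvidia-container-runtime/config.toml`, the usual location for the NVIDIA Container Toolkit; the function name `show_ldconfig_setting` is made up for illustration.

```shell
#!/bin/sh
# Print the ldconfig setting(s) from an NVIDIA container runtime config file.
# Assumption: the default path below matches this installation; pass another
# path as the first argument if it does not.
show_ldconfig_setting() {
    config="${1:-/etc/nvidia-container-runtime/config.toml}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    # Keep only lines that set ldconfig (commented out or not),
    # and strip leading whitespace for readability.
    grep -E '^[[:space:]]*#?[[:space:]]*ldconfig[[:space:]]*=' "$config" \
        | sed 's/^[[:space:]]*//'
}
```

Calling `show_ldconfig_setting` with no argument reads the assumed default path. A value that starts with `@` means the binary is taken from the host rather than from the container image.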

3. Information to attach (optional if deemed irrelevant)

Device Index: 0
Device Minor: 0
Model: NVIDIA GeForce RTX 2080 Ti
Brand: GeForce
GPU UUID: GPU-24dda873-8a7f-d02f-a9ad-52f56259f944
Bus Location: 00000000:17:00.0
Architecture: 7.5

Device Index: 1
Device Minor: 1
Model: NVIDIA GeForce RTX 2080 Ti
Brand: GeForce
GPU UUID: GPU-a4211eb1-75b7-df20-7a29-4bd4fd504774
Bus Location: 00000000:73:00.0
Architecture: 7.5
I0617 07:33:47.726793 22846 nvc.c:434] shutting down library context
I0617 07:33:47.726874 22849 rpc.c:95] terminating nvcgo rpc service
I0617 07:33:47.727439 22846 rpc.c:135] nvcgo rpc service terminated successfully
I0617 07:33:47.729964 22848 rpc.c:95] terminating driver rpc service
I0617 07:33:47.730098 22846 rpc.c:135] driver rpc service terminated successfully

==============NVSMI LOG==============

Timestamp : Sat Jun 17 15:35:58 2023
Driver Version : 510.73.05
CUDA Version : 11.6

Attached GPUs : 2 GPU 00000000:17:00.0 Product Name : NVIDIA GeForce RTX 2080 Ti Product Brand : GeForce Product Architecture : Turing Display Mode : Disabled Display Active : Disabled Persistence Mode : Disabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : N/A GPU UUID : GPU-24dda873-8a7f-d02f-a9ad-52f56259f944 Minor Number : 0 VBIOS Version : 90.02.17.40.9A MultiGPU Board : No Board ID : 0x1700 GPU Part Number : N/A Module ID : 0 Inforom Version Image Version : G001.0000.02.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x17 Device : 0x00 Domain : 0x0000 Device Id : 0x1E0410DE Bus Id : 00000000:17:00.0 Sub System Id : 0x1E0410DE GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : 35 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 11264 MiB Reserved : 244 MiB Used : 5 MiB Free : 11013 MiB BAR1 Memory Usage Total : 256 MiB Used : 4 MiB Free : 252 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile SRAM Correctable : N/A SRAM Uncorrectable : N/A DRAM 
Correctable : N/A DRAM Uncorrectable : N/A Aggregate SRAM Correctable : N/A SRAM Uncorrectable : N/A DRAM Correctable : N/A DRAM Uncorrectable : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 28 C GPU Shutdown Temp : 94 C GPU Slowdown Temp : 91 C GPU Max Operating Temp : 89 C GPU Target Temperature : 84 C Memory Current Temp : N/A Memory Max Operating Temp : N/A Power Readings Power Management : Supported Power Draw : 12.61 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 310.00 W Clocks Graphics : 300 MHz SM : 300 MHz Memory : 405 MHz Video : 540 MHz Applications Clocks Graphics : N/A Memory : N/A Default Applications Clocks Graphics : N/A Memory : N/A Max Clocks Graphics : 2100 MHz SM : 2100 MHz Memory : 7000 MHz Video : 1950 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : N/A Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 2544 Type : G Name : /usr/lib/xorg/Xorg Used GPU Memory : 4 MiB

GPU 00000000:73:00.0 Product Name : NVIDIA GeForce RTX 2080 Ti Product Brand : GeForce Product Architecture : Turing Display Mode : Disabled Display Active : Disabled Persistence Mode : Disabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : N/A GPU UUID : GPU-a4211eb1-75b7-df20-7a29-4bd4fd504774 Minor Number : 1 VBIOS Version : 90.02.30.40.90 MultiGPU Board : No Board ID : 0x7300 GPU Part Number : N/A Module ID : 0 Inforom Version Image Version : G001.0000.02.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x73 Device : 0x00 Domain : 0x0000 Device Id : 0x1E0710DE Bus Id : 00000000:73:00.0 Sub System Id : 0x37181028 GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : 18 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 11264 MiB Reserved : 246 MiB Used : 15 MiB Free : 11001 MiB BAR1 Memory Usage Total : 256 MiB Used : 4 MiB Free : 252 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile SRAM Correctable : N/A SRAM Uncorrectable : N/A DRAM Correctable : N/A DRAM 
Uncorrectable : N/A Aggregate SRAM Correctable : N/A SRAM Uncorrectable : N/A DRAM Correctable : N/A DRAM Uncorrectable : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 27 C GPU Shutdown Temp : 94 C GPU Slowdown Temp : 91 C GPU Max Operating Temp : 89 C GPU Target Temperature : 84 C Memory Current Temp : N/A Memory Max Operating Temp : N/A Power Readings Power Management : Supported Power Draw : 15.36 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 280.00 W Clocks Graphics : 300 MHz SM : 300 MHz Memory : 405 MHz Video : 540 MHz Applications Clocks Graphics : N/A Memory : N/A Default Applications Clocks Graphics : N/A Memory : N/A Max Clocks Graphics : 2100 MHz SM : 2100 MHz Memory : 7000 MHz Video : 1950 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : N/A Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 2544 Type : G Name : /usr/lib/xorg/Xorg Used GPU Memory : 9 MiB GPU instance ID : N/A Compute instance ID : N/A Process ID : 4182 Type : G Name : /usr/bin/gnome-shell Used GPU Memory : 4 MiB

Server: Docker Engine - Community
 Engine:
  Version: 24.0.2
  API version: 1.43 (minimum version 1.12)
  Go version: go1.20.4
  Git commit: 659604f
  Built: Thu May 25 21:52:22 2023
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.21
  GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version: 1.1.7
  GitCommit: v1.1.7-0-g860f061
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

jacksonsshen commented 1 year ago


shen@shen:~$ docker run -it --gpus all nvidia/cuda:10.1-base
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy': unknown.
ERRO[0000] error waiting for container:

shen@shen:~$ docker run -it nvidia/cuda:10.1-base
root@fd4fbecdbd68:/# ll
total 88
drwxr-xr-x   1 root root  4096 Jun 25 02:02 ./
drwxr-xr-x   1 root root  4096 Jun 25 02:02 ../
-rwxr-xr-x   1 root root     0 Jun 25 02:02 .dockerenv*
-rw-r--r--   1 1000 1000 16047 Jul  2  2021 NGC-DL-CONTAINER-LICENSE
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 bin/
drwxr-xr-x   2 1000 1000  4096 Apr 24  2018 boot/
drwxr-xr-x   5 root root   360 Jun 25 02:02 dev/
drwxr-xr-x   1 1000 1000  4096 Jun 25 02:02 etc/
drwxr-xr-x   2 1000 1000  4096 Apr 24  2018 home/
drwxr-xr-x   1 1000 1000  4096 May 23  2017 lib/
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 lib64/
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 media/
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 mnt/
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 opt/
dr-xr-xr-x 867 root root     0 Jun 25 02:02 proc/
drwx------   2 1000 1000  4096 Jun 15  2021 root/
drwxr-xr-x   5 1000 1000  4096 Jun 15  2021 run/
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 sbin/
drwxr-xr-x   2 1000 1000  4096 Jun 15  2021 srv/
dr-xr-xr-x  13 root root     0 Jun 25 02:02 sys/
drwxrwxrwt   2 1000 1000  4096 Jun 15  2021 tmp/
drwxr-xr-x   1 1000 1000  4096 Jun 15  2021 usr/
drwxr-xr-x   1 1000 1000  4096 Jun 15  2021 var/
root@fd4fbecdbd68:/# exit
exit
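The listing above shows most top-level entries owned by UID 1000 rather than root. The thread does not establish whether that is related to the `ldconfig` failure. As a sketch for comparing the old image against a freshly pulled one, the helper below filters `ls -ln` output for entries whose numeric owner is not 0. The function name `non_root_owned` is hypothetical, and names containing spaces would be truncated.

```shell
#!/bin/sh
# Read `ls -ln` output on stdin and print "UID NAME" for every entry
# whose numeric owner is not 0 (root).
non_root_owned() {
    # ls -ln columns: mode links uid gid size month day time-or-year name
    # The "total N" header has fewer than 9 fields and is skipped.
    awk 'NF >= 9 && $3 != 0 { print $3, $9 }'
}

# Possible usage (assumes the image runs the given command directly):
#   docker run --rm nvidia/cuda:10.1-base ls -ln / | non_root_owned
```

Running the same pipeline against both the old and the newly downloaded image would show whether file ownership differs between them.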

shen@shen:~$