NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

WSL2: nvidia-container-cli mount error, libnvidia-ml.so.1: file exists: unknown. #289

Open Mihawk2022 opened 3 years ago

Mihawk2022 commented 3 years ago

1. Issue or feature description

I prepared my environment following this guide:

When I run sudo docker run --gpus all --runtime=nvidia -it --rm <my image name>, I get the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/706b1d1b6de681b6daf1cab979336a9d465d9b333962cc17db663f2e334d5776/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

Although I hit this problem with my own image, the sample image works fine: docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

I also checked that no NVIDIA driver is installed in my image: after docker exec -it containerID /bin/bash, apt list --installed shows no nvidia or libnvidia packages, only a few CUDA-related packages (cuda-compat-10-2, cuda-cudart-10-2, cuda-license-10-2).
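For completeness, driver library files can also be baked into an image without showing up in the package list. Below is a minimal sketch of how that could be checked from the host, run without GPU injection; <my image name> is the same placeholder as above, and the paths are the ones nvidia-container-cli tries to populate.

# Look inside the image for driver userspace libraries that would collide with
# the files nvidia-container-cli mounts in from the Windows driver store.
docker run --rm --entrypoint /bin/sh <my image name> -c \
  'ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libcuda.so* 2>/dev/null'
# Any output here means the image (or its base image) already ships those
# libraries, which would match the "file exists" mount error.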

2. Information

nvidia-container information from nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I1004 13:41:19.446777 13740 nvc.c:372] initializing library context (version=1.5.1, build=4afad130c4c253abd3b2db563ffe9331594bda41) I1004 13:41:19.447100 13740 nvc.c:346] using root / I1004 13:41:19.447125 13740 nvc.c:347] using ldcache /etc/ld.so.cache I1004 13:41:19.447183 13740 nvc.c:348] using unprivileged user 1000:1000 I1004 13:41:19.447196 13740 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL) I1004 13:41:19.465867 13740 dxcore.c:227] Creating a new WDDM Adapter for hAdapter:40000000 luid:f95e09 I1004 13:41:19.478468 13740 dxcore.c:268] Adding new adapter via dxcore hAdapter:40000000 luid:f95e09 wddm version:3000 I1004 13:41:19.478495 13740 dxcore.c:326] dxcore layer initialized successfully W1004 13:41:19.478894 13740 nvc.c:397] skipping kernel modules load on WSL I1004 13:41:19.479135 13741 driver.c:101] starting driver service I1004 13:41:19.537408 13740 nvc_info.c:758] requesting driver information with '' I1004 13:41:19.551091 13740 nvc_info.c:197] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.91.03 I1004 13:41:19.552122 13740 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvidia-opticalflow.so.1 I1004 13:41:19.552152 13740 nvc_info.c:197] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.91.03 I1004 13:41:19.553231 13740 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvidia-ml.so.1 I1004 13:41:19.553264 13740 nvc_info.c:199] skipping /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03 I1004 13:41:19.553295 13740 nvc_info.c:199] skipping /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03 I1004 13:41:19.554246 13740 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvidia-encode.so.1 I1004 13:41:19.554278 13740 nvc_info.c:197] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.91.03 I1004 13:41:19.555174 13740 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvcuvid.so.1 I1004 13:41:19.555259 13740 nvc_info.c:197] selecting /usr/lib/wsl/lib/libdxcore.so I1004 13:41:19.556208 13740 nvc_info.c:197] selecting /usr/lib/wsl/lib/libcuda.so.1 I1004 13:41:19.556240 13740 nvc_info.c:199] skipping /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03 I1004 13:41:19.556348 13740 nvc_info.c:199] skipping /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03 W1004 13:41:19.556404 13740 nvc_info.c:397] missing library libnvidia-cfg.so W1004 13:41:19.556426 13740 nvc_info.c:397] missing library libnvidia-nscq.so W1004 13:41:19.556429 13740 nvc_info.c:397] missing library libnvidia-fatbinaryloader.so W1004 13:41:19.556431 13740 nvc_info.c:397] missing library libnvidia-allocator.so W1004 13:41:19.556433 13740 nvc_info.c:397] missing library libnvidia-ngx.so W1004 13:41:19.556434 13740 nvc_info.c:397] missing library libvdpau_nvidia.so W1004 13:41:19.556453 13740 nvc_info.c:397] missing library libnvidia-eglcore.so W1004 13:41:19.556456 13740 nvc_info.c:397] missing library libnvidia-glcore.so W1004 13:41:19.556457 13740 nvc_info.c:397] missing library libnvidia-tls.so W1004 13:41:19.556459 13740 nvc_info.c:397] missing library libnvidia-glsi.so W1004 13:41:19.556460 13740 nvc_info.c:397] missing library libnvidia-fbc.so W1004 13:41:19.556462 13740 nvc_info.c:397] missing library libnvidia-ifr.so W1004 13:41:19.556500 13740 nvc_info.c:397] missing library libnvidia-rtcore.so W1004 13:41:19.556506 13740 nvc_info.c:397] missing library libnvoptix.so W1004 13:41:19.556512 13740 nvc_info.c:397] missing library libGLX_nvidia.so W1004 13:41:19.556514 13740 nvc_info.c:397] missing library libEGL_nvidia.so W1004 13:41:19.556521 
13740 nvc_info.c:397] missing library libGLESv2_nvidia.so W1004 13:41:19.556524 13740 nvc_info.c:397] missing library libGLESv1_CM_nvidia.so W1004 13:41:19.556526 13740 nvc_info.c:397] missing library libnvidia-glvkspirv.so W1004 13:41:19.556527 13740 nvc_info.c:397] missing library libnvidia-cbl.so W1004 13:41:19.556547 13740 nvc_info.c:401] missing compat32 library libnvidia-ml.so W1004 13:41:19.556555 13740 nvc_info.c:401] missing compat32 library libnvidia-cfg.so W1004 13:41:19.556557 13740 nvc_info.c:401] missing compat32 library libnvidia-nscq.so W1004 13:41:19.556562 13740 nvc_info.c:401] missing compat32 library libcuda.so W1004 13:41:19.556564 13740 nvc_info.c:401] missing compat32 library libnvidia-opencl.so W1004 13:41:19.556583 13740 nvc_info.c:401] missing compat32 library libnvidia-ptxjitcompiler.so W1004 13:41:19.556586 13740 nvc_info.c:401] missing compat32 library libnvidia-fatbinaryloader.so W1004 13:41:19.556587 13740 nvc_info.c:401] missing compat32 library libnvidia-allocator.so W1004 13:41:19.556589 13740 nvc_info.c:401] missing compat32 library libnvidia-compiler.so W1004 13:41:19.556625 13740 nvc_info.c:401] missing compat32 library libnvidia-ngx.so W1004 13:41:19.556629 13740 nvc_info.c:401] missing compat32 library libvdpau_nvidia.so W1004 13:41:19.556632 13740 nvc_info.c:401] missing compat32 library libnvidia-encode.so W1004 13:41:19.556638 13740 nvc_info.c:401] missing compat32 library libnvidia-opticalflow.so W1004 13:41:19.556640 13740 nvc_info.c:401] missing compat32 library libnvcuvid.so W1004 13:41:19.556644 13740 nvc_info.c:401] missing compat32 library libnvidia-eglcore.so W1004 13:41:19.556667 13740 nvc_info.c:401] missing compat32 library libnvidia-glcore.so W1004 13:41:19.556670 13740 nvc_info.c:401] missing compat32 library libnvidia-tls.so W1004 13:41:19.556676 13740 nvc_info.c:401] missing compat32 library libnvidia-glsi.so W1004 13:41:19.556677 13740 nvc_info.c:401] missing compat32 library libnvidia-fbc.so W1004 13:41:19.556679 13740 nvc_info.c:401] missing compat32 library libnvidia-ifr.so W1004 13:41:19.556680 13740 nvc_info.c:401] missing compat32 library libnvidia-rtcore.so W1004 13:41:19.556682 13740 nvc_info.c:401] missing compat32 library libnvoptix.so W1004 13:41:19.556700 13740 nvc_info.c:401] missing compat32 library libGLX_nvidia.so W1004 13:41:19.556703 13740 nvc_info.c:401] missing compat32 library libEGL_nvidia.so W1004 13:41:19.556705 13740 nvc_info.c:401] missing compat32 library libGLESv2_nvidia.so W1004 13:41:19.556740 13740 nvc_info.c:401] missing compat32 library libGLESv1_CM_nvidia.so W1004 13:41:19.556745 13740 nvc_info.c:401] missing compat32 library libnvidia-glvkspirv.so W1004 13:41:19.556746 13740 nvc_info.c:401] missing compat32 library libnvidia-cbl.so W1004 13:41:19.556748 13740 nvc_info.c:401] missing compat32 library libdxcore.so I1004 13:41:19.558106 13740 nvc_info.c:277] selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_733101c735b9e264/nvidia-smi W1004 13:41:19.884566 13740 nvc_info.c:423] missing binary nvidia-debugdump W1004 13:41:19.884603 13740 nvc_info.c:423] missing binary nvidia-persistenced W1004 13:41:19.884606 13740 nvc_info.c:423] missing binary nv-fabricmanager W1004 13:41:19.884608 13740 nvc_info.c:423] missing binary nvidia-cuda-mps-control W1004 13:41:19.884609 13740 nvc_info.c:423] missing binary nvidia-cuda-mps-server I1004 13:41:19.884611 13740 nvc_info.c:437] skipping path lookup for dxcore I1004 13:41:19.884617 13740 nvc_info.c:520] listing device /dev/dxg W1004 13:41:19.884653 13740 
nvc_info.c:347] missing ipc path /var/run/nvidia-persistenced/socket W1004 13:41:19.884663 13740 nvc_info.c:347] missing ipc path /var/run/nvidia-fabricmanager/socket W1004 13:41:19.884768 13740 nvc_info.c:347] missing ipc path /tmp/nvidia-mps I1004 13:41:19.884791 13740 nvc_info.c:814] requesting device information with '' I1004 13:41:19.896593 13740 nvc_info.c:686] listing dxcore adapter 0 (GPU-4949b172-957c-5479-5dc3-12e0ea688389 at 00000000:2d:00.0) NVRM version: 510.06 CUDA version: 11.2

Device Index: 0
Device Minor: 0
Model: NVIDIA GeForce RTX 2080 Ti
Brand: GeForce
GPU UUID: GPU-4949b172-957c-5479-5dc3-12e0ea688389
Bus Location: 00000000:2d:00.0
Architecture: 7.5
I1004 13:41:19.896655 13740 nvc.c:423] shutting down library context
I1004 13:41:19.897661 13741 driver.c:163] terminating driver service
I1004 13:41:19.898674 13740 driver.c:203] driver service terminated successfully

Kernel version from uname -a

Linux DESKTOP 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Any relevant kernel output lines from dmesg

[ 0.000000] Linux version 5.10.16.3-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220) NVIDIA/nvidia-docker#1 SMP Fri Apr 2 22:23:49 UTC 2021 [ 0.000000] Command line: initrd=\initrd.img panic=-1 nr_cpus=16 swiotlb=force pty.legacy_count=0 [ 0.000000] KERNEL supported cpus: [ 0.000000] Intel GenuineIntel [ 0.000000] AMD AuthenticAMD [ 0.000000] Centaur CentaurHauls [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format. [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000e0fff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000001fffff] ACPI data [ 0.000000] BIOS-e820: [mem 0x0000000000200000-0x00000000f7ffffff] usable [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000004057fffff] usable [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] DMI not present or invalid. [ 0.000000] Hypervisor detected: Microsoft Hyper-V [ 0.000000] Hyper-V: features 0xae7f, privilege high: 0x3b8030, hints 0xc2c, misc 0xe0bed7b2 [ 0.000000] Hyper-V Host Build:22000-10.0-0-0.194 [ 0.000000] Hyper-V: LAPIC Timer Frequency: 0x1e8480 [ 0.000000] Hyper-V: Using hypercall for remote TLB flush [ 0.000000] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns [ 0.000001] tsc: Detected 3899.997 MHz processor [ 0.000007] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved [ 0.000008] e820: remove [mem 0x000a0000-0x000fffff] usable [ 0.000010] last_pfn = 0x405800 max_arch_pfn = 0x400000000 [ 0.000033] MTRR default type: uncachable [ 0.000033] MTRR fixed ranges enabled: [ 0.000034] 00000-3FFFF write-back [ 0.000034] 40000-7FFFF uncachable [ 0.000035] 80000-8FFFF write-back [ 0.000035] 90000-FFFFF uncachable [ 0.000035] MTRR variable ranges enabled: [ 0.000036] 0 base 000000000000 mask FFFF00000000 write-back [ 0.000037] 1 base 000100000000 mask FFF000000000 write-back [ 0.000037] 2 disabled [ 0.000037] 3 disabled [ 0.000038] 4 disabled [ 0.000038] 5 disabled [ 0.000038] 6 disabled [ 0.000038] 7 disabled [ 0.000047] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [ 0.000059] last_pfn = 0xf8000 max_arch_pfn = 0x400000000 [ 0.000071] Using GB pages for direct mapping [ 0.000322] RAMDISK: [mem 0x03035000-0x03043fff] [ 0.000326] ACPI: Early table checksum verification disabled [ 0.000332] ACPI: RSDP 0x00000000000E0000 000024 (v02 VRTUAL) [ 0.000334] ACPI: XSDT 0x0000000000100000 000044 (v01 VRTUAL MICROSFT 00000001 MSFT 00000001) [ 0.000338] ACPI: FACP 0x0000000000101000 000114 (v06 VRTUAL MICROSFT 00000001 MSFT 00000001) [ 0.000341] ACPI: DSDT 0x00000000001011B8 01E184 (v02 MSFTVM DSDT01 00000001 MSFT 05000000) [ 0.000343] ACPI: FACS 0x0000000000101114 000040 [ 0.000344] ACPI: OEM0 0x0000000000101154 000064 (v01 VRTUAL MICROSFT 00000001 MSFT 00000001) [ 0.000346] ACPI: SRAT 0x000000000011F33C 0003B0 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001) [ 0.000347] ACPI: APIC 0x000000000011F6EC 0000C8 (v04 VRTUAL MICROSFT 00000001 MSFT 00000001) [ 0.000351] ACPI: Local APIC address 0xfee00000 [ 0.000516] Zone ranges: [ 0.000517] DMA [mem 0x0000000000001000-0x0000000000ffffff] [ 0.000518] DMA32 [mem 0x0000000001000000-0x00000000ffffffff] [ 0.000519] Normal [mem 0x0000000100000000-0x00000004057fffff] [ 0.000519] Device empty [ 0.000520] Movable zone start for each node [ 0.000520] Early memory node ranges [ 0.000521] node 0: [mem 0x0000000000001000-0x000000000009ffff] [ 0.000522] node 0: [mem 0x0000000000200000-0x00000000f7ffffff] [ 0.000522] node 0: [mem 0x0000000100000000-0x00000004057fffff] [ 0.000857] Zeroed struct page in unavailable ranges: 10593 pages [ 0.000859] Initmem setup node 0 [mem 0x0000000000001000-0x00000004057fffff] [ 0.000860] On node 0 totalpages: 4183711 [ 0.000861] DMA zone: 59 pages used for 
memmap [ 0.000862] DMA zone: 22 pages reserved [ 0.000862] DMA zone: 3743 pages, LIFO batch:0 [ 0.000884] DMA32 zone: 16320 pages used for memmap [ 0.000884] DMA32 zone: 1011712 pages, LIFO batch:63 [ 0.010695] Normal zone: 49504 pages used for memmap [ 0.010698] Normal zone: 3168256 pages, LIFO batch:63 [ 0.011050] ACPI: Local APIC address 0xfee00000 [ 0.011055] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1]) [ 0.011340] IOAPIC[0]: apic_id 16, version 17, address 0xfec00000, GSI 0-23 [ 0.011344] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [ 0.011345] ACPI: IRQ9 used by override. [ 0.011346] Using ACPI (MADT) for SMP configuration information [ 0.011353] smpboot: Allowing 16 CPUs, 0 hotplug CPUs [ 0.011362] [mem 0xf8000000-0xffffffff] available for PCI devices [ 0.011363] Booting paravirtualized kernel on Hyper-V [ 0.011365] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns [ 0.015482] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:16 nr_node_ids:1 [ 0.016192] percpu: Embedded 52 pages/cpu s173272 r8192 d31528 u262144 [ 0.016196] pcpu-alloc: s173272 r8192 d31528 u262144 alloc=1*2097152 [ 0.016197] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15 [ 0.016212] Built 1 zonelists, mobility grouping on. Total pages: 4117806 [ 0.016214] Kernel command line: initrd=\initrd.img panic=-1 nr_cpus=16 swiotlb=force pty.legacy_count=0 [ 0.018810] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear) [ 0.019993] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [ 0.020038] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.036796] Memory: 4094128K/16734844K available (16403K kernel code, 2459K rwdata, 3464K rodata, 1444K init, 1164K bss, 388996K reserved, 0K cma-reserved) [ 0.036832] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1 [ 0.036840] ftrace: allocating 49613 entries in 194 pages [ 0.048726] ftrace: allocated 194 pages with 3 groups [ 0.048929] rcu: Hierarchical RCU implementation. [ 0.048930] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=16. [ 0.048931] Rude variant of Tasks RCU enabled. [ 0.048931] Tracing variant of Tasks RCU enabled. [ 0.048931] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies. [ 0.048932] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16 [ 0.051184] Using NULL legacy PIC [ 0.051186] NR_IRQS: 16640, nr_irqs: 552, preallocated irqs: 0 [ 0.051565] random: crng done (trusting CPU's manufacturer) [ 0.051585] Console: colour dummy device 80x25 [ 0.051591] printk: console [tty0] enabled [ 0.051595] ACPI: Core revision 20200925 [ 0.051693] Failed to register legacy timer interrupt [ 0.051694] APIC: Switch to symmetric I/O mode setup [ 0.051695] Switched APIC routing to physical flat. [ 0.051850] Hyper-V: Using IPI hypercalls [ 0.051851] Hyper-V: Using enlightened APIC (xapic mode) [ 0.051922] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x706eb0792cc, max_idle_ns: 881591209130 ns [ 0.051925] Calibrating delay loop (skipped), value calculated using timer frequency.. 
7799.99 BogoMIPS (lpj=38999970) [ 0.051926] pid_max: default: 32768 minimum: 301 [ 0.051936] LSM: Security Framework initializing [ 0.051958] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 0.051977] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 0.052150] x86/cpu: User Mode Instruction Prevention (UMIP) activated [ 0.052167] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 512 [ 0.052168] Last level dTLB entries: 4KB 2048, 2MB 2048, 4MB 1024, 1GB 0 [ 0.052170] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization [ 0.052170] Spectre V2 : Mitigation: Full AMD retpoline [ 0.052171] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch [ 0.052172] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier [ 0.052172] Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl [ 0.052173] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp [ 0.052292] Freeing SMP alternatives memory: 52K [ 0.052344] smpboot: CPU0: AMD Ryzen 7 3800X 8-Core Processor (family: 0x17, model: 0x71, stepping: 0x0) [ 0.052403] Performance Events: PMU not available due to virtualization, using software events only. [ 0.052423] rcu: Hierarchical SRCU implementation. [ 0.052753] smp: Bringing up secondary CPUs ... [ 0.052800] x86: Booting SMP configuration: [ 0.052801] .... node #0, CPUs: NVIDIA/nvidia-docker#1 NVIDIA/nvidia-docker#2 NVIDIA/nvidia-docker#3 NVIDIA/nvidia-docker#4 NVIDIA/nvidia-docker#5 NVIDIA/nvidia-docker#6 NVIDIA/nvidia-docker#7 NVIDIA/nvidia-docker#8 NVIDIA/nvidia-docker#9 NVIDIA/nvidia-docker#10 NVIDIA/nvidia-docker#11 NVIDIA/nvidia-docker#12 NVIDIA/nvidia-docker#13 NVIDIA/nvidia-docker#14 NVIDIA/nvidia-docker#15 [ 0.053300] smp: Brought up 1 node, 16 CPUs [ 0.053300] smpboot: Max logical packages: 1 [ 0.053300] smpboot: Total of 16 processors activated (124799.90 BogoMIPS) [ 0.073395] node 0 deferred pages initialised in 10ms [ 0.075402] devtmpfs: initialized [ 0.075402] x86/mm: Memory block size: 128MB [ 0.075402] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns [ 0.075402] futex hash table entries: 4096 (order: 6, 262144 bytes, linear) [ 0.075402] NET: Registered protocol family 16 [ 0.075402] thermal_sys: Registered thermal governor 'step_wise' [ 0.075402] cpuidle: using governor menu [ 0.075402] ACPI: bus type PCI registered [ 0.075402] PCI: Fatal: No config space access function found [ 0.075402] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 0.075402] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.082164] raid6: skip pq benchmark and using algorithm avx2x4 [ 0.082164] raid6: using avx2x2 recovery algorithm [ 0.082164] ACPI: Added _OSI(Module Device) [ 0.082164] ACPI: Added _OSI(Processor Device) [ 0.082164] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.082164] ACPI: Added _OSI(Processor Aggregator Device) [ 0.082164] ACPI: Added _OSI(Linux-Dell-Video) [ 0.082164] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio) [ 0.082164] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics) [ 0.085313] ACPI: 1 ACPI AML tables successfully acquired and loaded [ 0.086035] ACPI: Interpreter enabled [ 0.086038] ACPI: (supports S0 S5) [ 0.086039] ACPI: Using IOAPIC for interrupt routing [ 0.086046] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug [ 0.086138] ACPI: Enabled 1 GPEs in block 00 to 0F [ 0.086794] iommu: Default domain type: 
Translated [ 0.086851] SCSI subsystem initialized [ 0.086881] hv_vmbus: Vmbus version:5.2 [ 0.086881] PCI: Using ACPI for IRQ routing [ 0.086881] PCI: System does not support PCI [ 0.086881] hv_vmbus: Unknown GUID: c376c1c3-d276-48d2-90a9-c04748072c60 [ 0.086881] hv_vmbus: Unknown GUID: 6e382d18-3336-4f4b-acc4-2b7703d4df4a [ 0.086881] clocksource: Switched to clocksource tsc-early [ 0.086881] hv_vmbus: Unknown GUID: dde9cbc0-5060-4436-9448-ea1254a5d177 [ 0.170448] VFS: Disk quotas dquot_6.6.0 [ 0.170458] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.170473] FS-Cache: Loaded [ 0.170496] pnp: PnP ACPI init [ 0.170537] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active) [ 0.170571] pnp: PnP ACPI: found 1 devices [ 0.174903] NET: Registered protocol family 2 [ 0.175138] tcp_listen_portaddr_hash hash table entries: 8192 (order: 5, 131072 bytes, linear) [ 0.175316] TCP established hash table entries: 131072 (order: 8, 1048576 bytes, linear) [ 0.175416] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 0.175625] TCP: Hash tables configured (established 131072 bind 65536) [ 0.175649] UDP hash table entries: 8192 (order: 6, 262144 bytes, linear) [ 0.175671] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes, linear) [ 0.175712] NET: Registered protocol family 1 [ 0.176005] RPC: Registered named UNIX socket transport module. [ 0.176006] RPC: Registered udp transport module. [ 0.176007] RPC: Registered tcp transport module. [ 0.176007] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 0.176009] PCI: CLS 0 bytes, default 64 [ 0.176049] Trying to unpack rootfs image as initramfs... [ 0.176181] Freeing initrd memory: 60K [ 0.176183] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [ 0.176185] software IO TLB: mapped [mem 0x00000000f4000000-0x00000000f8000000] (64MB) [ 0.177614] kvm: no hardware support [ 0.178295] kvm: Nested Virtualization enabled [ 0.178301] SVM: kvm: Nested Paging enabled [ 0.178301] SVM: Virtual VMLOAD VMSAVE supported [ 0.181019] Initialise system trusted keyrings [ 0.181118] workingset: timestamp_bits=46 max_order=22 bucket_order=0 [ 0.181643] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 0.182012] NFS: Registering the id_resolver key type [ 0.182019] Key type id_resolver registered [ 0.182019] Key type id_legacy registered [ 0.182021] Installing knfsd (copyright (C) 1996 okir@monad.swb.de). 
[ 0.182442] Key type cifs.idmap registered [ 0.182496] fuse: init (API version 7.32) [ 0.182618] SGI XFS with ACLs, security attributes, realtime, scrub, repair, quota, no debug enabled [ 0.182874] 9p: Installing v9fs 9p2000 file system support [ 0.182880] FS-Cache: Netfs '9p' registered for caching [ 0.182908] FS-Cache: Netfs 'ceph' registered for caching [ 0.182910] ceph: loaded (mds proto 32) [ 0.185420] NET: Registered protocol family 38 [ 0.185422] xor: automatically using best checksumming function avx [ 0.185423] Key type asymmetric registered [ 0.185424] Asymmetric key parser 'x509' registered [ 0.185429] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250) [ 0.186121] hv_vmbus: registering driver hv_pci [ 0.186439] hv_pci b85a1f33-3b6d-4a2b-982d-0ce62be71656: PCI VMBus probing: Using version 0x10003 [ 0.187115] hv_pci b85a1f33-3b6d-4a2b-982d-0ce62be71656: PCI host bridge to bus 3b6d:00 [ 0.187471] pci 3b6d:00:00.0: [1414:008e] type 00 class 0x030200 [ 0.191995] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.192317] Non-volatile memory driver v1.3 [ 0.194890] brd: module loaded [ 0.195604] loop: module loaded [ 0.195630] hv_vmbus: registering driver hv_storvsc [ 0.195949] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information. [ 0.195950] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved. [ 0.195962] tun: Universal TUN/TAP device driver, 1.6 [ 0.196041] PPP generic driver version 2.4.2 [ 0.196142] PPP BSD Compression module registered [ 0.196143] PPP Deflate Compression module registered [ 0.196144] PPP MPPE Compression module registered [ 0.196145] NET: Registered protocol family 24 [ 0.196149] hv_vmbus: registering driver hv_netvsc [ 0.196242] VFIO - User Level meta-driver version: 0.3 [ 0.196361] hv_vmbus: registering driver hyperv_keyboard [ 0.196496] rtc_cmos 00:00: RTC can wake from S4 [ 0.196809] scsi host0: storvsc_host_t [ 0.197753] rtc_cmos 00:00: registered as rtc0 [ 0.198038] rtc_cmos 00:00: setting system clock to 2021-10-03T15:03:26 UTC (1633273406) [ 0.198046] rtc_cmos 00:00: alarms up to one month, 114 bytes nvram [ 0.198221] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com [ 0.198335] device-mapper: raid: Loading target version 1.15.1 [ 0.198404] hv_utils: Registering HyperV Utility Driver [ 0.198405] hv_vmbus: registering driver hv_utils [ 0.198429] hv_vmbus: registering driver hv_balloon [ 0.198437] hv_vmbus: registering driver dxgkrnl [ 0.198452] (NULL device ): dxgk: dxg_drv_init Version: 2103 [ 0.198453] hv_utils: cannot register PTP clock: 0 [ 0.198736] hv_balloon: Using Dynamic Memory protocol version 2.0 [ 0.198827] hv_utils: TimeSync IC version 4.0 [ 0.199020] drop_monitor: Initializing network drop monitor service [ 0.199043] Mirror/redirect action on [ 0.199390] Free page reporting enabled [ 0.199392] hv_balloon: Cold memory discard hint enabled [ 0.199630] (NULL device ): dxgk: mmio allocated 9ffe00000 200000000 9ffe00000 bffdfffff [ 0.199802] IPVS: Registered protocols (TCP, UDP) [ 0.199813] IPVS: Connection hash table configured (size=4096, memory=64Kbytes) [ 0.199835] IPVS: ipvs loaded. [ 0.199836] IPVS: [rr] scheduler registered. [ 0.199836] IPVS: [wrr] scheduler registered. [ 0.199836] IPVS: [sh] scheduler registered. 
[ 0.199864] ipip: IPv4 and MPLS over IPv4 tunneling driver [ 0.201991] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully [ 0.202382] Initializing XFRM netlink socket [ 0.202426] NET: Registered protocol family 10 [ 0.202648] Segment Routing with IPv6 [ 0.203692] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver [ 0.203777] NET: Registered protocol family 17 [ 0.203790] Bridge firewalling registered [ 0.203796] 8021q: 802.1Q VLAN Support v1.8 [ 0.203808] sctp: Hash tables configured (bind 256/256) [ 0.203842] 9pnet: Installing 9P2000 support [ 0.203855] Key type dns_resolver registered [ 0.203863] Key type ceph registered [ 0.203976] libceph: loaded (mon/osd proto 15/24) [ 0.204044] NET: Registered protocol family 40 [ 0.204045] hv_vmbus: registering driver hv_sock [ 0.204071] IPI shorthand broadcast: enabled [ 0.204077] sched_clock: Marking stable (203581151, 453300)->(215942200, -11907749) [ 0.204331] registered taskstats version 1 [ 0.204338] Loading compiled-in X.509 certificates [ 0.204648] Btrfs loaded, crc32c=crc32c-generic [ 0.206255] Freeing unused kernel image (initmem) memory: 1444K [ 0.271961] Write protecting the kernel read-only data: 22528k [ 0.272551] Freeing unused kernel image (text/rodata gap) memory: 2028K [ 0.273043] Freeing unused kernel image (rodata/data gap) memory: 632K [ 0.273048] Run /init as init process [ 0.273048] with arguments: [ 0.273048] /init [ 0.273049] with environment: [ 0.273049] HOME=/ [ 0.273049] TERM=linux [ 0.829032] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0 PQ: 0 ANSI: 5 [ 0.829421] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 0.830236] sd 0:0:0:0: [sda] 536870912 512-byte logical blocks: (275 GB/256 GiB) [ 0.830238] sd 0:0:0:0: [sda] 4096-byte physical blocks [ 0.830362] sd 0:0:0:0: [sda] Write Protect is off [ 0.830364] sd 0:0:0:0: [sda] Mode Sense: 0f 00 00 00 [ 0.830557] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 0.874243] hv_pci bb4321df-980a-4d21-afdb-589c18527bf9: PCI VMBus probing: Using version 0x10003 [ 0.915773] hv_pci bb4321df-980a-4d21-afdb-589c18527bf9: PCI host bridge to bus 980a:00 [ 0.915775] pci_bus 980a:00: root bus resource [mem 0xbffe00000-0xbffe02fff window] [ 0.916751] pci 980a:00:00.0: [1af4:1049] type 00 class 0x010000 [ 0.917716] pci 980a:00:00.0: reg 0x10: [mem 0xbffe00000-0xbffe00fff 64bit] [ 0.918396] pci 980a:00:00.0: reg 0x18: [mem 0xbffe01000-0xbffe01fff 64bit] [ 0.919017] pci 980a:00:00.0: reg 0x20: [mem 0xbffe02000-0xbffe02fff 64bit] [ 0.922797] pci 980a:00:00.0: BAR 0: assigned [mem 0xbffe00000-0xbffe00fff 64bit] [ 0.923220] pci 980a:00:00.0: BAR 2: assigned [mem 0xbffe01000-0xbffe01fff 64bit] [ 0.923644] pci 980a:00:00.0: BAR 4: assigned [mem 0xbffe02000-0xbffe02fff 64bit] [ 1.116874] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null) [ 1.202006] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 1.202180] sd 0:0:0:0: [sda] Attached SCSI disk [ 1.251980] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x706eb0792cc, max_idle_ns: 881591209130 ns [ 1.252943] clocksource: Switched to clocksource tsc [ 1.881960] Adding 4194304k swap on /swap/file. 
Priority:-2 extents:3 across:4210688k [ 3.152119] scsi 0:0:0:1: Direct-Access Msft Virtual Disk 1.0 PQ: 0 ANSI: 5 [ 3.152455] sd 0:0:0:1: Attached scsi generic sg1 type 0 [ 3.152998] sd 0:0:0:1: [sdb] 536870912 512-byte logical blocks: (275 GB/256 GiB) [ 3.152999] sd 0:0:0:1: [sdb] 4096-byte physical blocks [ 3.153082] sd 0:0:0:1: [sdb] Write Protect is off [ 3.153083] sd 0:0:0:1: [sdb] Mode Sense: 0f 00 00 00 [ 3.153213] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 3.154369] sd 0:0:0:1: [sdb] Attached SCSI disk [ 3.160357] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered [ 3.215983] FS-Cache: Duplicate cookie detected [ 3.215986] FS-Cache: O-cookie c=00000000aa466783 [p=000000006f69fc41 fl=222 nc=0 na=1] [ 3.215987] FS-Cache: O-cookie d=0000000077b88f2e n=00000000cab53c7d [ 3.215987] FS-Cache: O-key=[10] '34323934393337363132' [ 3.215991] FS-Cache: N-cookie c=0000000061e3e253 [p=000000006f69fc41 fl=2 nc=0 na=1] [ 3.215991] FS-Cache: N-cookie d=0000000077b88f2e n=00000000485d5ccb [ 3.215992] FS-Cache: N-key=[10] '34323934393337363132' [ 3.285697] hv_pci d5ce7240-e76a-439c-ad60-bb77c783e7c5: PCI VMBus probing: Using version 0x10003 [ 3.286638] 9pnet_virtio: no channels available for device drvfs [ 3.286641] WARNING: mount: waiting for virtio device... [ 3.325716] hv_pci d5ce7240-e76a-439c-ad60-bb77c783e7c5: PCI host bridge to bus e76a:00 [ 3.325718] pci_bus e76a:00: root bus resource [mem 0xbffe04000-0xbffe06fff window] [ 3.326672] pci e76a:00:00.0: [1af4:1049] type 00 class 0x010000 [ 3.327614] pci e76a:00:00.0: reg 0x10: [mem 0xbffe04000-0xbffe04fff 64bit] [ 3.328222] pci e76a:00:00.0: reg 0x18: [mem 0xbffe05000-0xbffe05fff 64bit] [ 3.328821] pci e76a:00:00.0: reg 0x20: [mem 0xbffe06000-0xbffe06fff 64bit] [ 3.332517] pci e76a:00:00.0: BAR 0: assigned [mem 0xbffe04000-0xbffe04fff 64bit] [ 3.333024] pci e76a:00:00.0: BAR 2: assigned [mem 0xbffe05000-0xbffe05fff 64bit] [ 3.333449] pci e76a:00:00.0: BAR 4: assigned [mem 0xbffe06000-0xbffe06fff 64bit] [ 3.390415] hv_pci 3f8e3335-82c2-499f-8995-e1c33b9178df: PCI VMBus probing: Using version 0x10003 [ 3.391719] 9pnet_virtio: no channels available for device drvfs [ 3.391721] WARNING: mount: waiting for virtio device... 
[ 3.430257] hv_pci 3f8e3335-82c2-499f-8995-e1c33b9178df: PCI host bridge to bus 82c2:00 [ 3.430259] pci_bus 82c2:00: root bus resource [mem 0xbffe08000-0xbffe0afff window] [ 3.431241] pci 82c2:00:00.0: [1af4:1049] type 00 class 0x010000 [ 3.432187] pci 82c2:00:00.0: reg 0x10: [mem 0xbffe08000-0xbffe08fff 64bit] [ 3.432796] pci 82c2:00:00.0: reg 0x18: [mem 0xbffe09000-0xbffe09fff 64bit] [ 3.433396] pci 82c2:00:00.0: reg 0x20: [mem 0xbffe0a000-0xbffe0afff 64bit] [ 3.437087] pci 82c2:00:00.0: BAR 0: assigned [mem 0xbffe08000-0xbffe08fff 64bit] [ 3.437505] pci 82c2:00:00.0: BAR 2: assigned [mem 0xbffe09000-0xbffe09fff 64bit] [ 3.437940] pci 82c2:00:00.0: BAR 4: assigned [mem 0xbffe0a000-0xbffe0afff 64bit] [ 3.495623] hv_pci 1b1a11d5-ded9-4bdc-b728-16a6ce447102: PCI VMBus probing: Using version 0x10003 [ 3.536074] hv_pci 1b1a11d5-ded9-4bdc-b728-16a6ce447102: PCI host bridge to bus ded9:00 [ 3.536076] pci_bus ded9:00: root bus resource [mem 0xbffe0c000-0xbffe0efff window] [ 3.537089] pci ded9:00:00.0: [1af4:1049] type 00 class 0x010000 [ 3.537996] pci ded9:00:00.0: reg 0x10: [mem 0xbffe0c000-0xbffe0cfff 64bit] [ 3.538600] pci ded9:00:00.0: reg 0x18: [mem 0xbffe0d000-0xbffe0dfff 64bit] [ 3.539322] pci ded9:00:00.0: reg 0x20: [mem 0xbffe0e000-0xbffe0efff 64bit] [ 3.543300] pci ded9:00:00.0: BAR 0: assigned [mem 0xbffe0c000-0xbffe0cfff 64bit] [ 3.543740] pci ded9:00:00.0: BAR 2: assigned [mem 0xbffe0d000-0xbffe0dfff 64bit] [ 3.544177] pci ded9:00:00.0: BAR 4: assigned [mem 0xbffe0e000-0xbffe0efff 64bit] [ 49.061594] hv_balloon: Max. dynamic memory size: 16344 MB [ 71.292198] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. [ 8678.849099] docker0: port 1(veth07ad0a7) entered blocking state [ 8678.849101] docker0: port 1(veth07ad0a7) entered disabled state [ 8678.849121] device veth07ad0a7 entered promiscuous mode [ 8678.849150] docker0: port 1(veth07ad0a7) entered blocking state [ 8678.849151] docker0: port 1(veth07ad0a7) entered forwarding state [ 8678.849472] docker0: port 1(veth07ad0a7) entered disabled state [ 8678.990265] cgroup: runc (5415) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future. 
[ 8678.990266] cgroup: "memory" requires setting use_hierarchy to 1 on the root [ 8678.990549] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation [ 8679.419693] eth0: renamed from veth984197c [ 8679.459677] IPv6: ADDRCONF(NETDEV_CHANGE): veth07ad0a7: link becomes ready [ 8679.459697] docker0: port 1(veth07ad0a7) entered blocking state [ 8679.459697] docker0: port 1(veth07ad0a7) entered forwarding state [ 8679.459722] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready [ 8680.288430] veth984197c: renamed from eth0 [ 8680.349650] docker0: port 1(veth07ad0a7) entered disabled state [ 8680.445249] docker0: port 1(veth07ad0a7) entered disabled state [ 8680.445930] device veth07ad0a7 left promiscuous mode [ 8680.445948] docker0: port 1(veth07ad0a7) entered disabled state [ 8871.582213] docker0: port 1(veth2124c65) entered blocking state [ 8871.582215] docker0: port 1(veth2124c65) entered disabled state [ 8871.582233] device veth2124c65 entered promiscuous mode [ 8872.129587] eth0: renamed from veth99f60f2 [ 8872.189745] IPv6: ADDRCONF(NETDEV_CHANGE): veth2124c65: link becomes ready [ 8872.189767] docker0: port 1(veth2124c65) entered blocking state [ 8872.189768] docker0: port 1(veth2124c65) entered forwarding state [ 9039.653247] process 'local/cuda-10.2/bin/ptxas' started with executable stack [ 9387.169252] docker0: port 2(veth0ab1b19) entered blocking state [ 9387.169254] docker0: port 2(veth0ab1b19) entered disabled state [ 9387.169274] device veth0ab1b19 entered promiscuous mode [ 9387.169302] docker0: port 2(veth0ab1b19) entered blocking state [ 9387.169302] docker0: port 2(veth0ab1b19) entered forwarding state [ 9387.169669] docker0: port 2(veth0ab1b19) entered disabled state [ 9387.657707] docker0: port 2(veth0ab1b19) entered disabled state [ 9387.657920] device veth0ab1b19 left promiscuous mode [ 9387.657937] docker0: port 2(veth0ab1b19) entered disabled state [ 9417.075476] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instead. 
[40931.406310] docker0: port 2(veth8968728) entered blocking state [40931.406311] docker0: port 2(veth8968728) entered disabled state [40931.406330] device veth8968728 entered promiscuous mode [40931.780035] eth0: renamed from veth8b0ae09 [40931.840207] IPv6: ADDRCONF(NETDEV_CHANGE): veth8968728: link becomes ready [40931.840231] docker0: port 2(veth8968728) entered blocking state [40931.840232] docker0: port 2(veth8968728) entered forwarding state [41888.847459] docker0: port 1(veth2124c65) entered disabled state [41888.847547] veth99f60f2: renamed from eth0 [41888.994901] docker0: port 1(veth2124c65) entered disabled state [41888.995012] device veth2124c65 left promiscuous mode [41888.995014] docker0: port 1(veth2124c65) entered disabled state [41899.075265] docker0: port 2(veth8968728) entered disabled state [41899.075320] veth8b0ae09: renamed from eth0 [41899.195126] docker0: port 2(veth8968728) entered disabled state [41899.195201] device veth8968728 left promiscuous mode [41899.195202] docker0: port 2(veth8968728) entered disabled state [44983.095711] docker0: port 1(veth579ec1c) entered blocking state [44983.095713] docker0: port 1(veth579ec1c) entered disabled state [44983.095767] device veth579ec1c entered promiscuous mode [44983.095802] docker0: port 1(veth579ec1c) entered blocking state [44983.095803] docker0: port 1(veth579ec1c) entered forwarding state [44983.096169] docker0: port 1(veth579ec1c) entered disabled state [44983.558932] eth0: renamed from vethe31675b [44983.609007] IPv6: ADDRCONF(NETDEV_CHANGE): veth579ec1c: link becomes ready [44983.609031] docker0: port 1(veth579ec1c) entered blocking state [44983.609032] docker0: port 1(veth579ec1c) entered forwarding state [48140.938717] docker0: port 2(vethe31a522) entered blocking state [48140.938720] docker0: port 2(vethe31a522) entered disabled state [48140.938783] device vethe31a522 entered promiscuous mode [48140.938815] docker0: port 2(vethe31a522) entered blocking state [48140.938815] docker0: port 2(vethe31a522) entered forwarding state [48140.939141] docker0: port 2(vethe31a522) entered disabled state [48140.953626] docker0: port 2(vethe31a522) entered disabled state [48140.953890] device vethe31a522 left promiscuous mode [48140.953910] docker0: port 2(vethe31a522) entered disabled state [48163.430815] docker0: port 2(veth0cd2f65) entered blocking state [48163.430817] docker0: port 2(veth0cd2f65) entered disabled state [48163.430836] device veth0cd2f65 entered promiscuous mode [48164.076307] docker0: port 2(veth0cd2f65) entered disabled state [48164.076630] device veth0cd2f65 left promiscuous mode [48164.076652] docker0: port 2(veth0cd2f65) entered disabled state [48359.265419] docker0: port 2(veth87ce69e) entered blocking state [48359.265420] docker0: port 2(veth87ce69e) entered disabled state [48359.265439] device veth87ce69e entered promiscuous mode [48359.265464] docker0: port 2(veth87ce69e) entered blocking state [48359.265465] docker0: port 2(veth87ce69e) entered forwarding state [48359.265939] docker0: port 2(veth87ce69e) entered disabled state [48359.975849] docker0: port 2(veth87ce69e) entered disabled state [48359.975930] device veth87ce69e left promiscuous mode [48359.975932] docker0: port 2(veth87ce69e) entered disabled state [63661.051609] docker0: port 2(veth2a489c7) entered blocking state [63661.051611] docker0: port 2(veth2a489c7) entered disabled state [63661.051692] device veth2a489c7 entered promiscuous mode [63661.051745] docker0: port 2(veth2a489c7) entered blocking state [63661.051747] docker0: 
port 2(veth2a489c7) entered forwarding state [63661.052438] docker0: port 2(veth2a489c7) entered disabled state [63661.065926] docker0: port 2(veth2a489c7) entered disabled state [63661.065991] device veth2a489c7 left promiscuous mode [63661.065992] docker0: port 2(veth2a489c7) entered disabled state [63687.006899] docker0: port 2(veth2cbdb00) entered blocking state [63687.006901] docker0: port 2(veth2cbdb00) entered disabled state [63687.006921] device veth2cbdb00 entered promiscuous mode [63687.533240] eth0: renamed from veth869fda5 [63687.613534] IPv6: ADDRCONF(NETDEV_CHANGE): veth2cbdb00: link becomes ready [63687.613555] docker0: port 2(veth2cbdb00) entered blocking state [63687.613556] docker0: port 2(veth2cbdb00) entered forwarding state [63741.561335] docker0: port 3(veth4e13cbd) entered blocking state [63741.561337] docker0: port 3(veth4e13cbd) entered disabled state [63741.561359] device veth4e13cbd entered promiscuous mode [63741.561385] docker0: port 3(veth4e13cbd) entered blocking state [63741.561386] docker0: port 3(veth4e13cbd) entered forwarding state [63741.561689] docker0: port 3(veth4e13cbd) entered disabled state [63742.201594] docker0: port 3(veth4e13cbd) entered disabled state [63742.201696] device veth4e13cbd left promiscuous mode [63742.201697] docker0: port 3(veth4e13cbd) entered disabled state [63945.395071] docker0: port 3(veth5172a71) entered blocking state [63945.395073] docker0: port 3(veth5172a71) entered disabled state [63945.395096] device veth5172a71 entered promiscuous mode [63945.395127] docker0: port 3(veth5172a71) entered blocking state [63945.395127] docker0: port 3(veth5172a71) entered forwarding state [63945.395248] docker0: port 3(veth5172a71) entered disabled state [63946.001462] docker0: port 3(veth5172a71) entered disabled state [63946.001557] device veth5172a71 left promiscuous mode [63946.001558] docker0: port 3(veth5172a71) entered disabled state [63986.856749] docker0: port 2(veth2cbdb00) entered disabled state [63986.856794] veth869fda5: renamed from eth0 [63986.998482] docker0: port 2(veth2cbdb00) entered disabled state [63986.999130] device veth2cbdb00 left promiscuous mode [63986.999133] docker0: port 2(veth2cbdb00) entered disabled state [63987.085545] vethe31675b: renamed from eth0 [63987.213378] docker0: port 1(veth579ec1c) entered disabled state [63987.218358] docker0: port 1(veth579ec1c) entered disabled state [63987.218861] device veth579ec1c left promiscuous mode [63987.218862] docker0: port 1(veth579ec1c) entered disabled state [64786.418297] docker0: port 1(vethead51d5) entered blocking state [64786.418299] docker0: port 1(vethead51d5) entered disabled state [64786.418318] device vethead51d5 entered promiscuous mode [64786.872856] eth0: renamed from veth99b057f [64786.932949] IPv6: ADDRCONF(NETDEV_CHANGE): vethead51d5: link becomes ready [64786.932966] docker0: port 1(vethead51d5) entered blocking state [64786.932967] docker0: port 1(vethead51d5) entered forwarding state [64787.786553] docker0: port 1(vethead51d5) entered disabled state [64787.786605] veth99b057f: renamed from eth0 [64787.948146] docker0: port 1(vethead51d5) entered disabled state [64787.948881] device vethead51d5 left promiscuous mode [64787.948915] docker0: port 1(vethead51d5) entered disabled state [64807.747511] docker0: port 1(vethb24ff9b) entered blocking state [64807.747512] docker0: port 1(vethb24ff9b) entered disabled state [64807.747531] device vethb24ff9b entered promiscuous mode [64807.747561] docker0: port 1(vethb24ff9b) entered blocking state 
[64807.747562] docker0: port 1(vethb24ff9b) entered forwarding state [64807.747703] docker0: port 1(vethb24ff9b) entered disabled state [64808.132878] eth0: renamed from vetha7cd44c [64808.193099] IPv6: ADDRCONF(NETDEV_CHANGE): vethb24ff9b: link becomes ready [64808.193123] docker0: port 1(vethb24ff9b) entered blocking state [64808.193124] docker0: port 1(vethb24ff9b) entered forwarding state [64809.023917] vetha7cd44c: renamed from eth0 [64809.183134] docker0: port 1(vethb24ff9b) entered disabled state [64809.188226] docker0: port 1(vethb24ff9b) entered disabled state [64809.188767] device vethb24ff9b left promiscuous mode [64809.188769] docker0: port 1(vethb24ff9b) entered disabled state [65194.352051] docker0: port 1(veth9b9311c) entered blocking state [65194.352053] docker0: port 1(veth9b9311c) entered disabled state [65194.352072] device veth9b9311c entered promiscuous mode [65194.352101] docker0: port 1(veth9b9311c) entered blocking state [65194.352102] docker0: port 1(veth9b9311c) entered forwarding state [65194.352424] docker0: port 1(veth9b9311c) entered disabled state [65194.792790] eth0: renamed from veth5d3475b [65194.832884] IPv6: ADDRCONF(NETDEV_CHANGE): veth9b9311c: link becomes ready [65194.832906] docker0: port 1(veth9b9311c) entered blocking state [65194.832907] docker0: port 1(veth9b9311c) entered forwarding state [65195.722684] veth5d3475b: renamed from eth0 [65195.792916] docker0: port 1(veth9b9311c) entered disabled state [65195.878049] docker0: port 1(veth9b9311c) entered disabled state [65195.878715] device veth9b9311c left promiscuous mode [65195.878732] docker0: port 1(veth9b9311c) entered disabled state [66182.663567] scsi 0:0:0:2: Direct-Access Msft Virtual Disk 1.0 PQ: 0 ANSI: 5 [66182.664300] sd 0:0:0:2: Attached scsi generic sg2 type 0 [66182.664893] sd 0:0:0:2: [sdc] 536870912 512-byte logical blocks: (275 GB/256 GiB) [66182.664894] sd 0:0:0:2: [sdc] 4096-byte physical blocks [66182.664973] sd 0:0:0:2: [sdc] Write Protect is off [66182.664975] sd 0:0:0:2: [sdc] Mode Sense: 0f 00 00 00 [66182.665154] sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [66182.666668] sd 0:0:0:2: [sdc] Attached SCSI disk [66182.683579] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered [66187.399545] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered [66187.410623] FS-Cache: Duplicate cookie detected [66187.410624] FS-Cache: O-cookie c=000000004e228525 [p=000000006f69fc41 fl=222 nc=0 na=1] [66187.410625] FS-Cache: O-cookie d=0000000077b88f2e n=0000000013e7c87d [66187.410625] FS-Cache: O-key=[10] '34333031353536303333' [66187.410628] FS-Cache: N-cookie c=000000005b00e07c [p=000000006f69fc41 fl=2 nc=0 na=1] [66187.410629] FS-Cache: N-cookie d=0000000077b88f2e n=00000000f3cfd1ce [66187.410629] FS-Cache: N-key=[10] '34333031353536303333' [66187.676655] hv_pci c689411f-e482-4a4b-b5ec-379303b0c4a9: PCI VMBus probing: Using version 0x10003 [66187.677922] 9pnet_virtio: no channels available for device drvfs [66187.677925] WARNING: mount: waiting for virtio device... 
[66187.718390] hv_pci c689411f-e482-4a4b-b5ec-379303b0c4a9: PCI host bridge to bus e482:00 [66187.718393] pci_bus e482:00: root bus resource [mem 0xbffe10000-0xbffe12fff window] [66187.719402] pci e482:00:00.0: [1af4:1049] type 00 class 0x010000 [66187.720467] pci e482:00:00.0: reg 0x10: [mem 0xbffe10000-0xbffe10fff 64bit] [66187.721103] pci e482:00:00.0: reg 0x18: [mem 0xbffe11000-0xbffe11fff 64bit] [66187.721739] pci e482:00:00.0: reg 0x20: [mem 0xbffe12000-0xbffe12fff 64bit] [66187.725644] pci e482:00:00.0: BAR 0: assigned [mem 0xbffe10000-0xbffe10fff 64bit] [66187.726097] pci e482:00:00.0: BAR 2: assigned [mem 0xbffe11000-0xbffe11fff 64bit] [66187.726550] pci e482:00:00.0: BAR 4: assigned [mem 0xbffe12000-0xbffe12fff 64bit] [66187.782347] hv_pci 5de0a50d-7985-4767-96bc-4a4a80b94674: PCI VMBus probing: Using version 0x10003 [66187.783556] 9pnet_virtio: no channels available for device drvfs [66187.783561] WARNING: mount: waiting for virtio device... [66187.823210] hv_pci 5de0a50d-7985-4767-96bc-4a4a80b94674: PCI host bridge to bus 7985:00 [66187.823212] pci_bus 7985:00: root bus resource [mem 0xbffe14000-0xbffe16fff window] [66187.824217] pci 7985:00:00.0: [1af4:1049] type 00 class 0x010000 [66187.825200] pci 7985:00:00.0: reg 0x10: [mem 0xbffe14000-0xbffe14fff 64bit] [66187.825847] pci 7985:00:00.0: reg 0x18: [mem 0xbffe15000-0xbffe15fff 64bit] [66187.826493] pci 7985:00:00.0: reg 0x20: [mem 0xbffe16000-0xbffe16fff 64bit] [66187.830371] pci 7985:00:00.0: BAR 0: assigned [mem 0xbffe14000-0xbffe14fff 64bit] [66187.830823] pci 7985:00:00.0: BAR 2: assigned [mem 0xbffe15000-0xbffe15fff 64bit] [66187.831276] pci 7985:00:00.0: BAR 4: assigned [mem 0xbffe16000-0xbffe16fff 64bit] [66187.887658] hv_pci 28ccd863-7f1b-48fb-a06c-14f1032961b1: PCI VMBus probing: Using version 0x10003 [66187.929043] hv_pci 28ccd863-7f1b-48fb-a06c-14f1032961b1: PCI host bridge to bus 7f1b:00 [66187.929046] pci_bus 7f1b:00: root bus resource [mem 0xbffe18000-0xbffe1afff window] [66187.930038] pci 7f1b:00:00.0: [1af4:1049] type 00 class 0x010000 [66187.930989] pci 7f1b:00:00.0: reg 0x10: [mem 0xbffe18000-0xbffe18fff 64bit] [66187.931624] pci 7f1b:00:00.0: reg 0x18: [mem 0xbffe19000-0xbffe19fff 64bit] [66187.932284] pci 7f1b:00:00.0: reg 0x20: [mem 0xbffe1a000-0xbffe1afff 64bit] [66187.936143] pci 7f1b:00:00.0: BAR 0: assigned [mem 0xbffe18000-0xbffe18fff 64bit] [66187.936627] pci 7f1b:00:00.0: BAR 2: assigned [mem 0xbffe19000-0xbffe19fff 64bit] [66187.937076] pci 7f1b:00:00.0: BAR 4: assigned [mem 0xbffe1a000-0xbffe1afff 64bit] [66977.281402] docker0: port 1(veth0d37bc8) entered blocking state [66977.281404] docker0: port 1(veth0d37bc8) entered disabled state [66977.281423] device veth0d37bc8 entered promiscuous mode [66977.281453] docker0: port 1(veth0d37bc8) entered blocking state [66977.281453] docker0: port 1(veth0d37bc8) entered forwarding state [66977.281748] docker0: port 1(veth0d37bc8) entered disabled state [66978.181803] docker0: port 1(veth0d37bc8) entered disabled state [66978.181906] device veth0d37bc8 left promiscuous mode [66978.181907] docker0: port 1(veth0d37bc8) entered disabled state [67557.114920] docker0: port 1(veth9c4371d) entered blocking state [67557.114921] docker0: port 1(veth9c4371d) entered disabled state [67557.114944] device veth9c4371d entered promiscuous mode [67557.652243] eth0: renamed from veth99c3a3f [67557.802389] IPv6: ADDRCONF(NETDEV_CHANGE): veth9c4371d: link becomes ready [67557.802412] docker0: port 1(veth9c4371d) entered blocking state [67557.802413] docker0: port 
1(veth9c4371d) entered forwarding state [67558.185775] veth99c3a3f: renamed from eth0 [67558.302350] docker0: port 1(veth9c4371d) entered disabled state [67558.307904] docker0: port 1(veth9c4371d) entered disabled state [67558.308442] device veth9c4371d left promiscuous mode [67558.308444] docker0: port 1(veth9c4371d) entered disabled state [67593.210939] docker0: port 1(veth228dbe2) entered blocking state [67593.210940] docker0: port 1(veth228dbe2) entered disabled state [67593.210960] device veth228dbe2 entered promiscuous mode [67593.210992] docker0: port 1(veth228dbe2) entered blocking state [67593.210993] docker0: port 1(veth228dbe2) entered forwarding state [67593.211282] docker0: port 1(veth228dbe2) entered disabled state [67593.722096] eth0: renamed from veth20ca901 [67593.782236] IPv6: ADDRCONF(NETDEV_CHANGE): veth228dbe2: link becomes ready [67593.782257] docker0: port 1(veth228dbe2) entered blocking state [67593.782258] docker0: port 1(veth228dbe2) entered forwarding state [68424.882867] init: (195) ERROR: operator():211: shutdown failed 107 [68424.884817] init: (195) ERROR: operator():211: shutdown failed 107 [68424.886584] init: (195) ERROR: operator():211: shutdown failed 107 [68424.888302] init: (195) ERROR: operator():211: shutdown failed 107 [68424.890043] init: (195) ERROR: operator():211: shutdown failed 107 [68424.891745] init: (195) ERROR: operator():211: shutdown failed 107 [68424.893473] init: (195) ERROR: operator():211: shutdown failed 107 [68424.896066] init: (195) ERROR: operator():211: shutdown failed 107 [68424.898353] init: (195) ERROR: operator():211: shutdown failed 107 [68424.900144] init: (195) ERROR: operator():211: shutdown failed 107 [68599.829580] init: (195) ERROR: operator():211: shutdown failed 107 [68599.832116] init: (195) ERROR: operator():211: shutdown failed 107 [68599.834452] init: (195) ERROR: operator():211: shutdown failed 107 [68599.836492] init: (195) ERROR: operator():211: shutdown failed 107 [68599.838390] init: (195) ERROR: operator():211: shutdown failed 107 [68599.840152] init: (195) ERROR: operator():211: shutdown failed 107 [68599.841806] init: (195) ERROR: operator():211: shutdown failed 107 [68599.843637] init: (195) ERROR: operator():211: shutdown failed 107 [68599.845480] init: (195) ERROR: operator():211: shutdown failed 107 [68599.847897] init: (195) ERROR: operator():211: shutdown failed 107

Driver information from nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Mon Oct 4 21:42:55 2021
Driver Version : 510.06
CUDA Version : 11.6

Attached GPUs : 1 GPU 00000000:2D:00.0 Product Name : NVIDIA GeForce RTX 2080 Ti Product Brand : GeForce Product Architecture : Turing Display Mode : Enabled Display Active : Enabled Persistence Mode : Enabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : WDDM Pending : WDDM Serial Number : N/A GPU UUID : GPU-4949b172-957c-5479-5dc3-12e0ea688389 Minor Number : N/A VBIOS Version : 90.02.30.00.b7 MultiGPU Board : No Board ID : 0x2d00 GPU Part Number : N/A Module ID : 0 Inforom Version Image Version : G001.0000.02.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x2D Device : 0x00 Domain : 0x0000 Device Id : 0x1E0410DE Bus Id : 00000000:2D:00.0 Sub System Id : 0x12AE10DE GPU Link Info PCIe Generation Max : 3 Current : 3 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 7000 KB/s Rx Throughput : 221000 KB/s Fan Speed : 0 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 11264 MiB Used : 2840 MiB Free : 8424 MiB BAR1 Memory Usage Total : 256 MiB Used : 2 MiB Free : 254 MiB Compute Mode : Default Utilization Gpu : N/A Memory : N/A Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile SRAM Correctable : N/A SRAM Uncorrectable : N/A DRAM Correctable : N/A DRAM Uncorrectable : N/A Aggregate SRAM Correctable : N/A SRAM Uncorrectable : N/A DRAM Correctable : N/A DRAM Uncorrectable : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 47 C GPU Shutdown Temp : 94 C GPU Slowdown Temp : 91 C GPU Max Operating Temp : 89 C GPU Target Temperature : 84 C Memory Current Temp : N/A Memory Max Operating Temp : N/A Power Readings Power Management : Supported Power Draw : 20.30 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 280.00 W Clocks Graphics : 387 MHz SM : 387 MHz Memory : 403 MHz Video : 539 MHz Applications Clocks Graphics : N/A Memory : N/A Default Applications Clocks Graphics : N/A Memory : N/A Max Clocks Graphics : 2100 MHz SM : 2100 MHz Memory : 7000 MHz Video : 1950 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : N/A Processes : None

Docker version from docker version

Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:33 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.10
  GitCommit:        8848fdb7c4ae3815afcc990a8a99d663dda1b590
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                               Version                    Architecture Description
+++-==================================-==========================-============-================================================>
un  libgldispatch0-nvidia                                                       (no description available)
un  libnvidia-compute                                                           (no description available)
ii  libnvidia-compute-460-server:amd64 460.91.03-0ubuntu0.20.04.1 amd64         NVIDIA libcompute package
ii  libnvidia-container-tools          1.5.1-1                    amd64         NVIDIA container runtime library (command-line t>
ii  libnvidia-container1:amd64         1.5.1-1                    amd64         NVIDIA container runtime library
ii  libnvidia-ml-dev                   10.1.243-3                 amd64         NVIDIA Management Library (NVML) development fil>
un  libnvidia-ml.so.1                                                           (no description available)
un  libnvidia-ml1                                                               (no description available)
un  libnvidia-tesla-418-ml1                                                     (no description available)
un  libnvidia-tesla-440-ml1                                                     (no description available)
un  libnvidia-tesla-cuda1                                                       (no description available)
ii  nvidia-container-runtime           3.5.0-1                    amd64         NVIDIA container runtime
un  nvidia-container-runtime-hook                                               (no description available)
ii  nvidia-container-toolkit           1.5.1-1                    amd64         NVIDIA container runtime hook
ii  nvidia-cuda-dev                    10.1.243-3                 amd64         NVIDIA CUDA development files
ii  nvidia-cuda-doc                    10.1.243-3                 all           NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                    10.1.243-3                 amd64         NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                10.1.243-3                 amd64         NVIDIA CUDA development toolkit
un  nvidia-docker                                                               (no description available)
ii  nvidia-docker2                     2.6.0-1                    all           nvidia-docker CLI wrapper
un  nvidia-driver                                                               (no description available)
un  nvidia-legacy-304xx-vdpau-driver                                            (no description available)
un  nvidia-legacy-340xx-vdpau-driver                                            (no description available)
un  nvidia-libopencl1                                                           (no description available)
un  nvidia-libopencl1-dev                                                       (no description available)
ii  nvidia-opencl-dev:amd64            10.1.243-3                 amd64         NVIDIA OpenCL development files
un  nvidia-opencl-icd                                                           (no description available)
ii  nvidia-profiler                    10.1.243-3                 amd64         NVIDIA Profiler for CUDA and OpenCL
un  nvidia-tesla-418-driver                                                     (no description available)
un  nvidia-tesla-440-driver                                                     (no description available)
un  nvidia-vdpau-driver                                                         (no description available)
ii  nvidia-visual-profiler             10.1.243-3                 amd64         NVIDIA Visual Profiler for CUDA and OpenCL

NVIDIA container library version from nvidia-container-cli -V

version: 1.5.1
build date: 2021-09-20T14:30+00:00
build revision: 4afad130c4c253abd3b2db563ffe9331594bda41
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

NVIDIA container library logs

2021/10/04 21:48:07 Using bundle directory: /var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/43c805d8ac1895dc62353aa47b2ac77b5a6eb2d7af3a1441658e55abc97fae27
2021/10/04 21:48:07 Using OCI specification file path: /var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/43c805d8ac1895dc62353aa47b2ac77b5a6eb2d7af3a1441658e55abc97fae27/config.json
2021/10/04 21:48:07 Looking for runtime binary 'docker-runc'
2021/10/04 21:48:07 Runtime binary 'docker-runc' not found: exec: "docker-runc": executable file not found in $PATH
2021/10/04 21:48:07 Looking for runtime binary 'runc'
2021/10/04 21:48:07 Found runtime binary '/bin/runc'
2021/10/04 21:48:07 Running nvidia-container-runtime

2021/10/04 21:48:07 'create' command detected; modification required
2021/10/04 21:48:07 prestart hook path: /bin/nvidia-container-runtime-hook

2021/10/04 21:48:07 existing nvidia prestart hook in OCI spec file
2021/10/04 21:48:07 Forwarding command to runtime

PQLLUX commented 3 years ago
  1. you don't need any NVIDIA drivers inside the WSL2 installation (docs)
  2. Docker Desktop isn't officially supported
  3. if you plan on using CUDA, install the cuda-toolkit package

As to why some images are working and some aren't, I don't know; I don't work for NVIDIA.
Mihawk2022 commented 3 years ago

Hi @PQLLUX, I reinstalled everything yesterday (following the guide), but still ran into the same problem. I've just updated the issue info; any help would be appreciated!

YuzhouPeng commented 3 years ago

Me too! I have the same problem. I'll show you some screenshots. System: Windows 11, WSL2 Ubuntu 18.04; Docker: Windows Docker Desktop (WSL2); NVIDIA driver: 510; CUDA: 11.6. Error info: (screenshot: 2021-10-05 142404)

Docker version: (screenshot: 2021-10-05 142443)

Can you give me some advice? Thank you

YuzhouPeng commented 3 years ago

And my nvcc -V CUDA version is 11. (screenshot: 2021-10-05 145808)

PQLLUX commented 3 years ago

@Mihawk2020 You still have unnecessary NVIDIA libraries (e.g. libnvidia-compute-460-server) which might cause the errors you're seeing. If I were you, I'd check /usr/lib/x86_64-linux-gnu/ for any nvidia .so files. Here's the list of nvidia libraries inside my /usr/lib/x86_64-linux-gnu/:

/usr/lib/x86_64-linux-gnu                                                                                      10:25:33
➜ ls -la | grep nvidia
lrwxrwxrwx  1 root root         28 Sep 20 15:24 libnvidia-container.so.1 -> libnvidia-container.so.1.5.1
-rwxr-xr-x  1 root root     179216 Sep 20 15:24 libnvidia-container.so.1.5.1

Moreover, you've probably installed nvidia-cuda-toolkit instead of cuda-toolkit; they're not the same. Additionally, you can try updating the WSL kernel, since you're not using the latest version. I'm also attaching my nvidia-docker related packages from my WSL2 distro:

➜ dpkg -l | grep nvidia
ii  libnvidia-container-tools       1.5.1-1                               amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64      1.5.1-1                               amd64        NVIDIA container runtime library
ii  nvidia-container-runtime        3.5.0-1                               amd64        NVIDIA container runtime
ii  nvidia-container-toolkit        1.5.1-1                               amd64        NVIDIA container runtime hook
ii  nvidia-docker2                  2.6.0-1                               all          nvidia-docker CLI wrapper

If you need logs to compare, here are mine: NVIDIA/nvidia-docker#1548. Finally, can you attach your Dockerfile? I can build your image locally and see if I can reproduce the problem.

Mihawk2022 commented 3 years ago

@PQLLUX Thx for your reply!

  1. Those nvidia libraries don't seem related to my problem:

    • I also prepared another Ubuntu 18.04 environment, in which I followed the guide but without installing the CUDA Toolkit (WSL-Ubuntu).
    • The /usr/lib/x86_64-linux-gnu/ and sudo apt list --installed information about nvidia is as follows: (screenshot)
    • However, when I run my image, I encounter the same libnvidia-ml.so.1 problem.
  2. I installed both cuda-toolkit (following the guide) and nvidia-cuda-toolkit (I wanted to see nvcc -V). libnvidia-compute-460-server was installed automatically with nvidia-cuda-toolkit.

  3. About updating the WSL kernel: do you mean downloading the newest wsl_update_x64.msi and reinstalling it? I just tried, and wsl cat /proc/version still shows Linux version 5.10.16.3-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220) #1 SMP Fri Apr 2 22:23:49 UTC 2021.

  4. BTW, do you know how to step through docker run? i.e. set up a debug environment to see at which step it goes wrong.

YuzhouPeng commented 3 years ago

@Mihawk2020 @PQLLUX I also have the same problem. Have you found any solution?

Mihawk2022 commented 3 years ago

@YuzhouPeng Not yet; I'll keep checking later today.

elezar commented 3 years ago

@Mihawk2020 does this happen for all images or a specific one? If you start a container from the image without GPU support, does the /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 file already exist?

PQLLUX commented 3 years ago

@Mihawk2020

  1. Fair enough, I wanted to rule out the possibility of interference from "non-WSL" libraries
  2. On a side note, to check the CUDA version you just have to add
    export PATH="/usr/local/cuda/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

    to .bashrc (or whatever your shell's rc file is) and source the file; then nvcc --version will work

  3. you can update the WSL kernel via Windows Update > Advanced options > Receive updates for other Microsoft products
  4. If it's your image, you've pulled it from a registry, or you have its Dockerfile, can you post it?
  5. To "debug" a container I'd use docker events&, run it in a new tab, and then use docker logs <hash>, where <hash> is the hex id from the docker events& output, such as
    2021-10-07T17:14:09.852224531+02:00 container create xxxxxxx1bf431b250d6

    I'm not sure it'll provide any meaningful information, but you can try; the full sequence is sketched right after this list
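Roughly, the full sequence looks like this (the container id is whatever the create event prints; <image> is a placeholder for your image name):

docker events &
sudo docker run --gpus all --runtime=nvidia <image>
# copy the container id from the "container create ..." event line, then:
docker logs <container-id>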

Mihawk2022 commented 3 years ago

@Mihawk2020 does this happen for all images or a specific one? If you start a container from the image without GPU support, does the /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 file already exist?

@elezar Hi, it seems to happen only for my own image. And if I remove --gpus all --runtime=nvidia, docker run doesn't give an error. PS: when I execute sudo docker run --gpus all --runtime=nvidia nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark, everything goes OK (screenshot)

elezar commented 3 years ago

This seems to indicate that the file /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 already exists in the image. You could confirm this by running:

sudo docker run --rm -ti <image> ls -al /usr/lib/x86_64-linux-gnu/libnv\*

How was this image generated? Are you able to "clean it up" and remove the symlinks such as the one above?

klueska commented 3 years ago

Having these "ghost" libraries in your image is most often the result of building the image with nvidia set as the default runtime in docker. Building MUST be done without nvidia set as the default runtime.

Mihawk2022 commented 3 years ago

@Mihawk2020 does this happen for all images or a specific one? If you start a container from the image without GPU support, does the /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 file already exist?

@elezar Thanks a lot, libnvidia-ml.so.1 does exist in the image. ls -al /usr/lib/x86_64-linux-gnu/libnv* shows: (screenshot)

I noticed that someone said there shouldn't be any NVIDIA drivers in the image, so I checked apt list --installed after entering the container without the --gpus all --runtime=nvidia options. It shows there isn't any nvidia or libnvidia package, only some CUDA-related packages (cuda-compat-10-2, cuda-cudart-10-2, cuda-license-10-2). At that time, I thought that meant no NVIDIA driver was installed in my image, and I had no idea about the symlink problem.

I'm not sure how this image was generated; I just pulled it down to use. Could you please tell me how to "clean it up" and remove those symlinks? I appreciate it.

PS: not sure why, but this is the sudo docker run --rm -it <image> ls -al /usr/lib/x86_64-linux-gnu/libnv* output: (screenshot)

klueska commented 3 years ago

You need to both delete them AND unmount them as a final step in the image build. Which means you will need to generate a new image from the one you download where you perform these steps. I forget which order the delete / unmount needs to happen in, but both operations must be performed.

Mihawk2022 commented 3 years ago

You need to both delete them AND unmount them as a final step in the image build. Which means you will need to generate a new image from the one you download where you perform these steps. I forget which order the delete / unmount needs to happen in, but both operations must be performed.

I'm afraid I cannot rebuild the image by myself; it depends on another image which I have no access to. Is it possible to unmount those nvidia-related files based only on the current image?

klueska commented 3 years ago

My point is that you can't just remove them from a running container. They need to be not present in the image before the nvidia runtime sees it and attempts to inject its own set of files in there at container startup.

You don't need to rebuild the image, just build a new image that is based on the image you want, but removes the unwanted ghost files. I.e. something like:

FROM <the image you care about>
RUN umount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 && \
    rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
...

and (of course) make sure you don't build this new image with nvidia set as the default runtime in your daemon.json file.
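For reference, an /etc/docker/daemon.json that keeps the nvidia runtime registered but does not make it the default looks roughly like this (a minimal sketch; any other settings in your file stay as they are):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

The point is simply that there is no "default-runtime": "nvidia" entry, so docker build uses runc and the driver files never end up baked into the image.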

Mihawk2022 commented 3 years ago

@klueska Really appreciated! I've now created a new Dockerfile and am trying to build it. (screenshot) Do you know how to get superuser privileges when building an image? I've only found advice on how to deal with a running image: using docker run --privileged.

Finally, I tried this:

  1. docker run --privileged the image
  2. then execute umount & rm to get rid of the libnvidia and libcuda files
  3. then docker commit to save a new image
  4. when I run this new image with the --gpus all --runtime=nvidia options, it no longer gives the error (rough commands sketched below)
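Roughly, a sketch of those steps (the container and image names are placeholders; adjust the file list to whatever the mount error reports):

sudo docker run -d --privileged --name ghost-cleanup <original-image> sleep 300
# drop the leftover driver stubs; umount only matters if they show up as mounts in your container
sudo docker exec ghost-cleanup sh -c 'umount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 2>/dev/null; rm -f /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1'
sudo docker commit ghost-cleanup <new-image>
sudo docker rm -f ghost-cleanup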
UsherWang commented 2 years ago

@Mihawk2020 Hi Mihawk, I was facing the same problem. By following your suggestion I was able to eliminate the error on my side too. However, I find that the code doesn't use the GPU at all after eliminating the error. Just wondering if you are in the same situation or not...

JustASquid commented 2 years ago

This is a problem for me as well, as I'm trying to run the image nvidia/vulkan:1.2.170-470 in WSL2. The problematic libraries are installed as part of the base image here too; if I try the approach suggested here of deleting the nvidia and cuda libraries from the container, it crashes on startup.

ywz978020607 commented 2 years ago

Me too! I have the same problem. I'll show you some screenshots. System: Windows 11, WSL2 Ubuntu 18.04; Docker: Windows Docker Desktop (WSL2); NVIDIA driver: 510; CUDA: 11.6. Error info: (screenshot: 2021-10-05 142404)

Docker version: (screenshot: 2021-10-05 142443)

Can you give me some advice? Thank you

Same here!

gngenius02 commented 2 years ago

My point is that you can't just remove them from a running container. They need to be not present in the image before the nvidia runtime sees it and attempts to inject its own set of files in there at container startup.

You don't need to rebuild the image, just build a new image that is based on the image you want, but removes the unwanted ghost files. I.e. something like:

FROM <the image you care about>
RUN umount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 && \
    rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
...

and (of course) make sure you don't build this new image with nvidia set as the default runtime in your daemon.json file.

Thanks for the help, I was able to do this on my image without the umount command. I also had 2 files throwing errors. So my final command looked something like this. Hope this helps someone else.

FROM <the image you care about>

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1
rainabba commented 2 years ago

@gngenius02 Permission denied when I try to rm the files. I'm working FROM xychelsea/ffmpeg-nvidia:v0.4.1. I used the following post-build steps to modify my image.

# start a short-lived container from the image (kept alive by sleep so we can exec into it)
CONTAINER_ID=$(docker run -d --name myimage --rm myimage sleep 60)
# remove the conflicting libraries as root (-u 0), since the image's default user lacks permission
docker exec -u 0 $CONTAINER_ID rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
docker exec -u 0 $CONTAINER_ID rm -rf /usr/lib/x86_64-linux-gnu/libcuda.so.1
# save the cleaned-up container back to the image tag
docker commit --message "remove nvidia drivers that will conflict in wsl2+ubuntu+docker+nvidia" $CONTAINER_ID myimage
msseibel commented 2 years ago

As a short warning: trying to build an image with privileged rights (as is necessary for the umount) will send you down a rabbit hole. Good luck: https://stackoverflow.com/questions/48098671/build-with-docker-and-privileged

I would be happy to see someone succeed using the umount command in the Dockerfile.

zhqi77 commented 2 years ago

My point is that you can't just remove them from a running container. They need to be not present in the image before the nvidia runtime sees it and attempts to inject its own set of files in there at container startup. You don't need to rebuild the image, just build a new image that is based on the image you want, but removes the unwanted ghost files. I.e. something like:

FROM <the image you care about>
RUN umount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 && \
    rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
...

and (of course) make sure you don't build this new image with nvidia set as the default runtime in your daemon.json file.

Thanks for the help, I was able to do this on my image without the umount command. I also had 2 files throwing errors. So my final command looked something like this. Hope this helps someone else.

FROM <the image you care about>

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1

Works perfectly for my case (Windows 11, WSL2). Thanks a lot!!!

zengzhengrong commented 2 years ago

https://github.com/k0sproject/k0s/issues/2150

tigert1998 commented 1 year ago

This is my Dockerfile:

FROM tigertang1128/ftenv:latest

RUN rm -rf /usr/lib/x86_64-linux-gnu/libcuda.so.1 \
    /usr/lib/x86_64-linux-gnu/libnvidia-*.so.1 \
    /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
rtarquini commented 1 year ago

Would it be possible to have the container detect that you are running in Docker Desktop and delete the libs in the entrypoint processing? That way you wouldn't have separate images for windows/linux. Or are the libraries already loaded by that point?

elezar commented 1 year ago

Would it be possible to have the container detect that you are running in Docker Desktop and delete the libs in the entrypoint processing? That way you wouldn't have separate images for windows/linux. Or are the libraries already loaded by that point?

The images should be portable, with the mounting of the User-mode Driver files taken care of by the NVIDIA Container Toolkit. The issue is that the image was built with the NVIDIA Container Runtime, which results in these files being present regardless of where it was built. It is my understanding that Docker Desktop on WSL2 sets the nvidia runtime as the default. I am unaware of a mechanism to disable this.

rtarquini commented 1 year ago

I wanted the same image to work on a Linux host and a Windows host with the GPU enabled, so I believe I need the NVIDIA runtime to be enabled in both cases. I wanted to remove the libs only when running under Windows. I will give it a try.

elezar commented 1 year ago

@rtarquini I would suggest building the image on the linux host and making sure that the nvidia runtime is not used in this case. (You can confirm that nvidia is not set as the default_runtime in /etc/docker/daemon.json). This image should then be usable on any platform with the NVIDIA Container Toolkit installed and the nvidia runtime configured.
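A quick way to check before building (assuming a standard Docker Engine install; run this on the build host):

docker info | grep -i 'default runtime'
# expected: "Default Runtime: runc"; if it reports nvidia, remove the
# "default-runtime": "nvidia" entry from /etc/docker/daemon.json and restart docker

This is the same check as inspecting daemon.json directly, just without opening the file.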

It should not even be required to have access to GPU hardware to build the image, so this can be done on any linux system. The CUDA base images include stubs for the driver libraries and these are used at build time, and the libraries injected by the NVIDIA Container Runtime are used at runtime.

There may be some applications that break this assumption by depending on driver functionality at build time. In most cases this should be considered a bug, and the direct dependency on the driver should be removed. One issue is that building against a specific driver version may break the portability of images.

rtarquini commented 1 year ago

The docker build process for my image requires the GPU (it builds CUDA libraries), so I believe I need the nvidia runtime.

elezar commented 1 year ago

The docker build process for my image requires the GPU (it builds CUDA libraries), so I believe I need the nvidia runtime.

As a matter of interest, why does it need the GPU? If you use the -devel- images, the CUDA toolkit is available at build time and the included stub libraries allow linking to complete successfully. The driver libraries are then injected by the NVIDIA Container Toolkit when the container is run.
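For example, a minimal sketch of that pattern (the image tag and main.cu are placeholders; the stubs directory is where the CUDA -devel- images ship the stub driver library):

FROM nvidia/cuda:11.4.3-devel-ubuntu20.04
COPY main.cu /src/main.cu
# link against the stub libcuda.so shipped with the devel image; no GPU is needed at build time
RUN nvcc /src/main.cu -o /src/app -L/usr/local/cuda/lib64/stubs -lcuda
# at runtime the real libcuda.so.1 is injected by the NVIDIA Container Toolkit
CMD ["/src/app"]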

tobinbc commented 1 year ago

My point is that you can't just remove them from a running container. They need to be not present in the image before the nvidia runtime sees it and attempts to inject its own set of files in there at container startup. You don't need to rebuild the image, just build a new image that is based on the image you want, but removes the unwanted ghost files. I.e. something like:

FROM <the image you care about>
RUN umount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 && \
    rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
...

and (of course) make sure you don't build this new image with nvidia set as the default runtime in your daemon.json file.

Thanks for the help, I was able to do this on my image without the umount command. I also had 2 files throwing errors. So my final command looked something like this. Hope this helps someone else.

FROM <the image you care about>

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1

I had to change it a bit:

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1

All good now!

huangpan2507 commented 10 months ago

sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl restart docker

https://github.com/NVIDIA/nvidia-container-toolkit/issues/154#issuecomment-1845996091
The comment in issue #154 linked above helped me; I didn't delete the nvidia runtime from daemon.json. But please note: the commands above will delete all the images you have downloaded!