1. Issue or feature description
Docker with GPU support needs nvidia-container-toolkit. When I tried to install it, the installation failed.
2. Steps to reproduce the issue
Running nvidia-smi in a GPU-enabled Docker container throws:
$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
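For anyone hitting the same error: as far as I understand it, Docker 19.03 only offers the "gpu" capability when it can find the nvidia-container-runtime-hook binary (shipped with nvidia-container-toolkit) on the daemon's PATH. The sketch below checks for that binary from the current shell; the function name check_gpu_hook is made up for illustration, and the daemon's PATH may differ from the shell's.

```shell
# Minimal sketch: report whether the hook binary is on PATH.
# check_gpu_hook is a hypothetical helper, not part of any NVIDIA tool.
check_gpu_hook() {
    # Default to the hook name Docker is assumed to look up.
    hook="${1:-nvidia-container-runtime-hook}"
    if command -v "$hook" >/dev/null 2>&1; then
        echo "found: $(command -v "$hook")"
    else
        echo "missing: $hook"
        return 1
    fi
}

check_gpu_hook || echo "install nvidia-container-toolkit, then restart docker"
```

If the hook is reported missing, the error above is expected until the package installs cleanly and the Docker daemon is restarted.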
So, to fix this, I tried to install nvidia-container-toolkit, which fails as follows:
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Repository base is listed more than once in the configuration
Repository updates is listed more than once in the configuration
Repository extras is listed more than once in the configuration
Repository libnvidia-container is listed more than once in the configuration
Repository libnvidia-container-experimental is listed more than once in the configuration
Loading mirror speeds from cached hostfile
* base: mirror.metrocast.net
* extras: mirror.es.its.nyu.edu
* updates: mirror.atlanticmetro.net
base | 3.6 kB 00:00:00
docker-ce-stable | 3.5 kB 00:00:00
extras | 2.9 kB 00:00:00
libnvidia-container/x86_64/signature | 488 B 00:00:00
libnvidia-container/x86_64/signature | 2.1 kB 00:00:00 !!!
nvidia-container-runtime/x86_64/signature | 488 B 00:00:00
Retrieving key from https://nvidia.github.io/nvidia-container-runtime/gpgkey
nvidia-container-runtime/x86_64/signature | 2.1 kB 00:00:00 !!!
https://nvidia.github.io/nvidia-container-runtime/stable/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-container-runtime
Trying other mirror.
One of the configured repositories failed (nvidia-container-runtime),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=nvidia-container-runtime ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable nvidia-container-runtime
or
subscription-manager repos --disable=nvidia-container-runtime
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=nvidia-container-runtime.skip_if_unavailable=true
failure: repodata/repomd.xml from nvidia-container-runtime: [Errno 256] No more mirrors to try.
https://nvidia.github.io/nvidia-container-runtime/stable/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-container-runtime
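Two things in the yum output above may be worth checking: several repository IDs are "listed more than once", and the failure is a repomd.xml signature check, which yum performs when repo_gpgcheck is enabled for a repository. The sketch below lists duplicated repo IDs and the repo_gpgcheck settings. It assumes the repo files live in /etc/yum.repos.d, and both function names are hypothetical. It only inspects the configuration and is not a confirmed fix.

```shell
# List repo IDs defined more than once across *.repo files in a directory.
# The directory argument is optional; /etc/yum.repos.d is assumed by default.
duplicate_repo_ids() {
    dir="${1:-/etc/yum.repos.d}"
    # Section headers look like "[repo-id]"; strip the brackets, keep repeats.
    grep -h '^\[' "$dir"/*.repo 2>/dev/null | tr -d '[]' | sort | uniq -d
}

# Show which files set repo_gpgcheck (repository metadata signature checking).
repo_gpgcheck_settings() {
    dir="${1:-/etc/yum.repos.d}"
    grep -H '^repo_gpgcheck' "$dir"/*.repo 2>/dev/null
}

duplicate_repo_ids
repo_gpgcheck_settings || true
```

If the same ID shows up in two files, removing the stale file and running yum clean expire-cache before retrying would be a reasonable next step.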
3. Information to attach (optional if deemed irrelevant)
[X] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info:
-- WARNING, the following logs are for debugging purposes only --
I1113 13:02:15.875066 7361 nvc.c:282] initializing library context (version=1.3.0, build=16315ebdf4b9728e899f615e208b50c41d7a5d15)
I1113 13:02:15.875243 7361 nvc.c:256] using root /
I1113 13:02:15.875249 7361 nvc.c:257] using ldcache /etc/ld.so.cache
I1113 13:02:15.875254 7361 nvc.c:258] using unprivileged user 1002:1002
I1113 13:02:15.875367 7361 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1113 13:02:15.875492 7361 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W1113 13:02:15.882503 7363 nvc.c:187] failed to set inheritable capabilities
W1113 13:02:15.882557 7363 nvc.c:188] skipping kernel modules load due to failure
I1113 13:02:15.883106 7364 driver.c:101] starting driver service
I1113 13:02:17.306044 7361 nvc_info.c:680] requesting driver information with ''
I1113 13:02:17.308247 7361 nvc_info.c:169] selecting /usr/lib64/vdpau/libvdpau_nvidia.so.440.33.01
I1113 13:02:17.311384 7361 nvc_info.c:169] selecting /usr/lib64/libnvoptix.so.440.33.01
I1113 13:02:17.311600 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-tls.so.440.33.01
I1113 13:02:17.312427 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-rtcore.so.440.33.01
I1113 13:02:17.315055 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.440.33.01
I1113 13:02:17.316776 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-opticalflow.so.440.33.01
I1113 13:02:17.317677 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-opencl.so.440.33.01
I1113 13:02:17.317735 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-ml.so.440.33.01
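A quick way to confirm that all driver libraries picked up in a log like the one above share one version is to pull the version suffix out of the "selecting" lines. This is a small sketch; driver_lib_versions is a hypothetical helper that reads the log on stdin, and the two sample lines are copied from the log above.

```shell
# Print the distinct version suffixes of the libraries on "selecting" lines.
driver_lib_versions() {
    # Keep only the text after ".so." at the end of each matching line.
    sed -n 's/.* selecting .*\.so\.\([0-9.]*\)$/\1/p' | sort -u
}

driver_lib_versions <<'EOF'
I1113 13:02:17.311384 7361 nvc_info.c:169] selecting /usr/lib64/libnvoptix.so.440.33.01
I1113 13:02:17.317735 7361 nvc_info.c:169] selecting /usr/lib64/libnvidia-ml.so.440.33.01
EOF
```

More than one line of output would point to mixed driver library versions on the host.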
[X] Kernel version from uname -a:
Linux AZICT00001 3.10.0-1160.2.2.el7.x86_64 #1 SMP Tue Oct 20 16:53:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[X] Any relevant kernel output lines from dmesg:
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-1160.2.2.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Oct 20 16:53:08 UTC 2020
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.2.2.el7.x86_64 root=UUID=9d77033b-4696-4873-b883-a3124b94db64 ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 net.ifnames=0 LANG=en_US.UTF-8
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003ffeffff] usable
[    0.000000] BIOS-e820: [mem 0x000000003fff0000-0x000000003fffefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000003ffff000-0x000000003fffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000fdfffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000004fe0000000-0x00000078bfffffff] usable
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.3 present.
[    0.000000] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[    0.000000] Hypervisor detected: Microsoft HyperV
[    0.000000] HyperV: features 0x2e7f, hints 0x60e24
[    0.000000] Hyper-V Host Build:18362-10.0-1-0.1370
[    0.000000] HyperV: LAPIC Timer Frequency: 0x30d40
[    0.000000] tsc: Marking TSC unstable due to running on Hyper-V
[    0.000000] Hyper-V: Using ext hypercall for remote TLB flush
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x78c0000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000] 00000-9FFFF write-back
[    0.000000] A0000-DFFFF uncachable
[    0.000000] E0000-FFFFF write-back
[    0.000000] MTRR variable ranges enabled:
[    0.000000] 0 base 000000000000 mask 3FFFC0000000 write-back
[    0.000000] 1 base 000100000000 mask 3FF000000000 write-back
[    0.000000] 2 base 004FE0000000 mask 380000000000 write-back
[    0.000000] 3 base 080000000000 mask 000000000000 write-back
[    0.000000] 4 disabled
[    0.000000] 5 disabled
[    0.000000] 6 disabled
[    0.000000] 7 disabled
[    0.000000] PAT configuration [0-7]: WB WC UC- UC WB WP UC- UC
[    0.000000] e820: update [mem 0x40000000-0xffffffff] usable ==> reserved
[    0.000000] e820: update [mem 0x1100000000-0x4fdfffffff] usable ==> reserved
[    0.000000] e820: last_pfn = 0x3fff0 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [ffffffffff200780]
[    0.000000] Base memory trampoline at [ffff97eec0099000] 99000 size 24576
[    0.000000] Using GB pages for direct mapping
[    0.000000] BRK [0x6598875000, 0x6598875fff] PGTABLE
[    0.000000] BRK [0x6598876000, 0x6598876fff] PGTABLE
[    0.000000] BRK [0x6598877000, 0x6598877fff] PGTABLE
[    0.000000] BRK [0x6598878000, 0x6598878fff] PGTABLE
[    0.000000] BRK [0x6598879000, 0x6598879fff] PGTABLE
[    0.000000] BRK [0x659887a000, 0x659887afff] PGTABLE
[    0.000000] BRK [0x659887b000, 0x659887bfff] PGTABLE
[    0.000000] RAMDISK: [mem 0x35aaa000-0x36d4cfff]
[    0.000000] Early table checksum verification disabled
[    0.000000] ACPI: RSDP 00000000000f5c00 00014 (v00 ACPIAM)
[    0.000000] ACPI: RSDT 000000003fff0000 00040 (v01 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: FACP 000000003fff0200 00081 (v02 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: DSDT 000000003fff1d24 03CD5 (v01 MSFTVM MSFTVM02 00000002 INTL 02002026)
[    0.000000] ACPI: FACS 000000003ffff000 00040
[    0.000000] ACPI: WAET 000000003fff1a80 00028 (v01 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: SLIC 000000003fff1ac0 00176 (v01 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: OEM0 000000003fff1cc0 00064 (v01 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: SRAT 000000003fff0800 001E0 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000000] ACPI: APIC 000000003fff0300 000B2 (v01 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: OEMB 000000003ffff040 00064 (v01 VRTUAL MICROSFT 12001807 MSFT 00000097)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x05 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x06 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x08 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x09 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0a -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x0b -> Node 0
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff] hotplug
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0xfdfffffff] hotplug
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x4fe0000000-0x78bfffffff] hotplug
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x78c0200000-0xffffffffff] hotplug
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x10000200000-0x1ffffffffff] hotplug
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x20000200000-0x3ffffffffff] hotplug
[    0.000000] NUMA: Node 0 [mem 0x00000000-0x3fffffff] + [mem 0x100000000-0xfdfffffff] -> [mem 0x00000000-0xfdfffffff]
[    0.000000] NUMA: Node 0 [mem 0x00000000-0xfdfffffff] + [mem 0x4fe0000000-0x78bfffffff] -> [mem 0x00000000-0x78bfffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x78bffd8000-0x78bfffefff]
[    0.000000] Zone ranges:
[    0.000000] DMA [mem 0x00001000-0x00ffffff]
[    0.000000] DMA32 [mem 0x01000000-0xffffffff]
[    0.000000] Normal [mem 0x100000000-0x78bfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000] node 0: [mem 0x00001000-0x0009efff]
[    0.000000] node 0: [mem 0x00100000-0x3ffeffff]
[    0.000000] node 0: [mem 0x100000000-0xfdfffffff]
[    0.000000] node 0: [mem 0x4fe0000000-0x78bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x00001000-0x78bfffffff]
[    0.000000] On node 0 totalpages: 58720142
[    0.000000] DMA zone: 64 pages used for memmap
[    0.000000] DMA zone: 21 pages reserved
[    0.000000] DMA zone: 3998 pages, LIFO batch:0
[    0.000000] DMA32 zone: 4032 pages used for memmap
[    0.000000] DMA32 zone: 258032 pages, LIFO batch:31
[    0.000000] Normal zone: 913408 pages used for memmap
[    0.000000] Normal zone: 58458112 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 12 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0x3fff0000-0x3fffefff]
[    0.000000] PM: Registered nosave memory: [mem 0x3ffff000-0x3fffffff]
[    0.000000] PM: Registered nosave memory: [mem 0x40000000-0xffffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfe0000000-0x4fdfffffff]
[    0.000000] e820: [mem 0x40000000-0xffffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:1
[    0.000000] percpu: Embedded 38 pages/cpu s118784 r8192 d28672 u262144
[    0.000000] pcpu-alloc: s118784 r8192 d28672 u262144 alloc=12097152
[    0.000000] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 -- -- -- --
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 57802617
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.2.2.el7.x86_64 root=UUID=9d77033b-4696-4873-b883-a3124b94db64 ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 net.ifnames=0 LANG=en_US.UTF-8
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
[    0.000000] Memory: 5026616k/506462208k available (7788k kernel code, 271581640k absent, 3776552k reserved, 5954k data, 1984k init)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=12, Nodes=1
[    0.000000] x86/pti: Unmapping kernel while in userspace
[    0.000000] Hierarchical RCU implementation.
[    0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=12.
[    0.000000] NR_IRQS:327936 nr_irqs:520 0
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty1] enabled
[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] allocated 6106906624 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] tsc: Detected 2593.992 MHz processor
[    0.000016] Calibrating delay loop (skipped), value calculated using timer frequency.. 5187.98 BogoMIPS (lpj=2593992)
[    0.001039] pid_max: default: 32768 minimum: 301
[    0.002000] Security Framework initialized
[    0.002000] SELinux: Initializing.
[    0.002039] SELinux: Starting in permissive mode
[    0.002041] Yama: becoming mindful.
[    0.003787] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
[    0.006000] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
[    0.008000] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.008000] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.009000] Initializing cgroup subsys memory
[    0.009000] Initializing cgroup subsys devices
[    0.009000] Initializing cgroup subsys freezer
[    0.009000] Initializing cgroup subsys net_cls
[    0.009025] Initializing cgroup subsys blkio
[    0.010000] Initializing cgroup subsys perf_event
[    0.010000] Initializing cgroup subsys hugetlb
[    0.010000] Initializing cgroup subsys pids
[    0.010000] Initializing cgroup subsys net_prio
[    0.010880] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.011000] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
[    0.011000] tlb_flushall_shift: 6
[    0.011000] FEATURE SPEC_CTRL Not Present
[    0.011024] FEATURE IBPB_SUPPORT Not Present
[    0.012000] Spectre V1 : Mitigation: Load fences, usercopy/swapgs barriers and __user pointer sanitization
[    0.012000] Spectre V2 : Vulnerable: Retpoline without IBPB
[    0.012000] Speculative Store Bypass: Vulnerable
[    0.012055] TAA: Mitigation: Clear CPU buffers
[    0.013000] MDS: Mitigation: Clear CPU buffers
[    0.013000] Freeing SMP alternatives: 28k freed
[    0.014000] ACPI: Core revision 20130517
[    0.014000] ACPI: All ACPI Tables successfully acquired
[    0.014000] ftrace: allocating 29651 entries in 116 pages
[    0.015107] Switched APIC routing to physical flat.
[    0.018838] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.019002] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (fam: 06, model: 4f, stepping: 01)
[    0.023199] Performance Events: unsupported p6 CPU model 79 no PMU driver, software events only.
[    0.027555] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.028002] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.029061] smpboot: Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 OK
[    0.054007] Brought up 12 CPUs
[    0.055002] smpboot: Max logical packages: 1
[    0.056003] smpboot: Total of 12 processors activated (62255.80 BogoMIPS)
[    0.885745] node 0 initialised, 55667382 pages in 826ms
[    0.889119] devtmpfs: initialized
[    0.890066] x86/mm: Memory block size: 1024MB
[    0.894808] EVM: security.selinux
[    0.895002] EVM: security.ima
[    0.896002] EVM: security.capability
[    0.898023] PM: Registering ACPI NVS region [mem 0x3ffff000-0x3fffffff] (4096 bytes)
[    0.900575] atomic64 test passed for x86-64 platform with CX8 and with SSE
[    0.901005] pinctrl core: initialized pinctrl subsystem
[    0.922519] RTC time: 12:20:29, date: 11/13/20
[    0.923078] NET: Registered protocol family 16
[    0.924179] ACPI: bus type PCI registered
[    0.925004] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.926357] PCI: Using configuration type 1 for base access
[    0.931107] ACPI: Added _OSI(Module Device)
[    0.932006] ACPI: Added _OSI(Processor Device)
[    0.933003] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.934002] ACPI: Added _OSI(Processor Aggregator Device)
[    0.935002] ACPI: Added _OSI(Linux-Dell-Video)
[    0.937096] ACPI: EC: Look up EC in DSDT
[    0.937515] ACPI: Executed 1 blocks of module-level executable AML code
[    0.941455] ACPI: Interpreter enabled
[    0.942009] ACPI: (supports S0 S5)
[    0.943002] ACPI: Using IOAPIC for interrupt routing
[    0.944022] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.945120] ACPI: Enabled 1 GPEs in block 00 to 0F
[    0.963288] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.964007] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.965008] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.966011] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.967164] PCI host bridge to bus 0000:00
[    0.968004] pci_bus 0000:00: root bus resource [mem 0xfe0000000-0x4fdfffffff window]
[    0.969003] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[    0.970005] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[    0.971003] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.972003] pci_bus 0000:00: root bus resource [mem 0x40000000-0xfffbffff window]
[    0.973004] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.974183] pci 0000:00:00.0: [8086:7192] type 00 class 0x060000
[    0.976559] pci 0000:00:07.0: [8086:7110] type 00 class 0x060100
[    0.979385] pci 0000:00:07.1: [8086:7111] type 00 class 0x010180
[    0.981321] pci 0000:00:07.1: reg 0x20: [io 0xffa0-0xffaf]
[    0.982204] pci 0000:00:07.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7]
[    0.983003] pci 0000:00:07.1: legacy IDE quirk: reg 0x14: [io 0x03f6]
[    0.984003] pci 0000:00:07.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177]
[    0.985004] pci 0000:00:07.1: legacy IDE quirk: reg 0x1c: [io 0x0376]
[    0.987253] pci 0000:00:07.3: [8086:7113] type 00 class 0x068000
[    0.987322] Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[X] Driver information from nvidia-smi -a:
==============NVSMI LOG==============
Timestamp : Fri Nov 13 13:12:33 2020
Driver Version : 440.33.01
CUDA Version : 10.2

Attached GPUs : 2
GPU 00000001:00:00.0
    Product Name : Tesla P100-PCIE-16GB
    Product Brand : Tesla
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0322717060782
    GPU UUID : GPU-da7c81e6-70c0-8287-9005-107c31b7be32
    Minor Number : 0
    VBIOS Version : 86.00.41.00.06
    MultiGPU Board : No
    Board ID : 0x10000
    GPU Part Number : 900-2H400-0000-000
    Inforom Version
        Image Version : H400.0201.00.08
        OEM Object : 1.1
        ECC Object : 4.1
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization Mode : Pass-Through
        Host VGPU Mode : N/A
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x00
        Device : 0x00
        Domain : 0x0001
        Device Id : 0x15F810DE
        Bus Id : 00000001:00:00.0
        Sub System Id : 0x118F10DE
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 3
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 0
        Replay Number Rollovers : 0
        Tx Throughput : 0 KB/s
        Rx Throughput : 0 KB/s
    Fan Speed : N/A
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 16280 MiB
        Used : 0 MiB
        Free : 16280 MiB
    BAR1 Memory Usage
        Total : 16384 MiB
        Used : 2 MiB
        Free : 16382 MiB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
    Retired Pages
        Single Bit ECC : 0
        Double Bit ECC : 0
        Pending Page Blacklist : No
    Temperature
        GPU Current Temp : 26 C
        GPU Shutdown Temp : 85 C
        GPU Slowdown Temp : 82 C
        GPU Max Operating Temp : N/A
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 32.56 W
        Power Limit : 250.00 W
        Default Power Limit : 250.00 W
        Enforced Power Limit : 250.00 W
        Min Power Limit : 125.00 W
        Max Power Limit : 250.00 W
    Clocks
        Graphics : 1189 MHz
        SM : 1189 MHz
        Memory : 715 MHz
        Video : 1063 MHz
    Applications Clocks
        Graphics : 1189 MHz
        Memory : 715 MHz
    Default Applications Clocks
        Graphics : 1189 MHz
        Memory : 715 MHz
    Max Clocks
        Graphics : 1328 MHz
        SM : 1328 MHz
        Memory : 715 MHz
        Video : 1328 MHz
    Max Customer Boost Clocks
        Graphics : 1328 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Processes : None

GPU 00000002:00:00.0
    Product Name : Tesla P100-PCIE-16GB
    Product Brand : Tesla
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0322617144069
    GPU UUID : GPU-585e60f4-6ab3-5cab-09fa-dca4608155a2
    Minor Number : 1
    VBIOS Version : 86.00.41.00.06
    MultiGPU Board : No
    Board ID : 0x20000
    GPU Part Number : 900-2H400-0000-000
    Inforom Version
        Image Version : H400.0201.00.08
        OEM Object : 1.1
        ECC Object : 4.1
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization Mode : Pass-Through
        Host VGPU Mode : N/A
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x00
        Device : 0x00
        Domain : 0x0002
        Device Id : 0x15F810DE
        Bus Id : 00000002:00:00.0
        Sub System Id : 0x118F10DE
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 3
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 0
        Replay Number Rollovers : 0
        Tx Throughput : 0 KB/s
        Rx Throughput : 0 KB/s
    Fan Speed : N/A
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 16280 MiB
        Used : 0 MiB
        Free : 16280 MiB
    BAR1 Memory Usage
        Total : 16384 MiB
        Used : 2 MiB
        Free : 16382 MiB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 5
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 5
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : N/A
                L2 Cache : 0
                Texture Memory : 0
                Texture Shared : 0
                CBU : N/A
                Total : 0
    Retired Pages
        Single Bit ECC : 0
        Double Bit ECC : 0
        Pending Page Blacklist : No
    Temperature
        GPU Current Temp : 28 C
        GPU Shutdown Temp : 85 C
        GPU Slowdown Temp : 82 C
        GPU Max Operating Temp : N/A
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 31.58 W
        Power Limit : 250.00 W
        Default Power Limit : 250.00 W
        Enforced Power Limit : 250.00 W
        Min Power Limit : 125.00 W
        Max Power Limit : 250.00 W
    Clocks
        Graphics : 1189 MHz
        SM : 1189 MHz
        Memory : 715 MHz
        Video : 1075 MHz
    Applications Clocks
        Graphics : 1189 MHz
        Memory : 715 MHz
    Default Applications Clocks
        Graphics : 1189 MHz
        Memory : 715 MHz
    Max Clocks
        Graphics : 1328 MHz
        SM : 1328 MHz
        Memory : 715 MHz
        Video : 1328 MHz
    Max Customer Boost Clocks
        Graphics : 1328 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Processes : None
[X] Docker version from docker version:
Client: Docker Engine - Community
 Version: 19.03.13
 API version: 1.40
 Go version: go1.13.15
 Git commit: 4484c46d9d
 Built: Wed Sep 16 17:03:45 2020
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 19.03.13
  API version: 1.40 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: 4484c46d9d
  Built: Wed Sep 16 17:02:21 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.3.7
  GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version: 1.0.0-rc10
  GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683
dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
[X] NVIDIA container library version from nvidia-container-cli -V:
version: 1.3.0
build date: 2020-09-16T12:35+0000
build revision: 16315ebdf4b9728e899f615e208b50c41d7a5d15
build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-39)
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections