Azure / batch-shipyard

Simplify HPC and Batch workloads on Azure
MIT License

All nodes in pool in state `starttaskfailed`: "Docker root dir $rootdir not within $USER_MOUNTPOINT" #291

elemakil closed this issue 5 years ago

elemakil commented 5 years ago

Problem Description

I have been using Azure Batch Shipyard with VMs of type STANDARD_NC6 successfully for a while. Usually, I create a pool, submit some jobs (with several tasks) and kill the pool again, all over the course of at most a couple of days.

As of today, when I create the pool and submit a job, all nodes enter the "starttaskfailed" state. I have deleted and recreated the pool and job several times. Using the Azure Batch Explorer, I checked the node startup logs and found the following text at the bottom of stdout.txt:

Client: Docker Engine - Community
 Version:           19.03.0
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.5
 Git commit:        aeac949
 Built:             Wed Jul 17 18:16:07 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
2019-07-23T11:11:50,787210380+00:00 - ERROR - Docker root dir Dir: not within /mnt

This seems to originate from shipyard_nodeprep.sh, lines 730-737:

    local rootdir
    rootdir=$(docker info | grep "Docker Root Dir" | cut -d' ' -f 4)
    if echo "$rootdir" | grep "$USER_MOUNTPOINT" > /dev/null; then
        log DEBUG "Docker root dir: $rootdir"
    else
        log ERROR "Docker root dir $rootdir not within $USER_MOUNTPOINT"
        exit 1
    fi

It looks like the cut command does not properly extract the "Docker Root Dir" value from the output of docker info (note that $rootdir is "Dir:" in the error message above!).
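The field shift can be reproduced without a Docker daemon. The sketch below runs the same `cut -d' ' -f 4` extraction on two sample lines, one flush-left and one indented by a single space (the indented form matches the `docker info` output quoted later in this thread). The variable names are illustrative only and are not taken from shipyard_nodeprep.sh.

```shell
# Illustrative sample lines (not produced by a live daemon).
flush_left='Docker Root Dir: /mnt/docker'
indented=' Docker Root Dir: /mnt/docker'

# Same extraction as the nodeprep snippet: 4th space-delimited field.
# A leading space makes field 1 empty, so every later field shifts by one.
field_flush=$(printf '%s\n' "$flush_left" | cut -d' ' -f 4)
field_indented=$(printf '%s\n' "$indented" | cut -d' ' -f 4)

echo "flush-left: $field_flush"      # prints "flush-left: /mnt/docker"
echo "indented:   $field_indented"   # prints "indented:   Dir:"
```

With the indented line, field 4 is `Dir:` rather than the path, which is consistent with the `Docker root dir Dir: not within /mnt` message in the log.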

Batch Shipyard Version

3.7.0

Steps to Reproduce

Create pool, then create job.

Expected Results

The pool gets created and the job + tasks start properly.

Actual Results

All nodes in the pool get stuck in "starttaskfailed"

Redacted Configuration

pool:

pool_specification:
  id: my-pool
  vm_configuration:
    platform_image:
      offer: UbuntuServer
      publisher: Canonical
      sku: 16.04-LTS
  vm_count:
    dedicated: 0
    low_priority: 10
  vm_size: STANDARD_NC6

Additional Logs

Header part from stdout.txt:

Configuration:
--------------
Custom image: 0
Native mode: 0
OS Distribution: ubuntu 16.04
Batch Shipyard version: 3.7.0
Blobxfer version: 1.7.0
Singularity version: 
User mountpoint: /mnt
Mount path: /mnt/batch/tasks/mounts
Batch Insights: 0
Prometheus: NE=, CA=,
Network optimization: 1
Encryption cert thumbprint: 
Install Kata Containers: 0
Default container runtime: runc
Install BeeGFS BeeOND: 0
Storage cluster mount: 
Custom mount: 
Install LIS: 
GPU: False:nvidia-driver_cc37.run
Azure Blob: 1
Azure File: 0
GlusterFS on compute: 0
HPN-SSH: 0
Enable Azure Batch group for Docker access: 
Fallback registry: 
Docker image preload delay: 0
Cascade via container: 1
P2P: 0
Block on images: REDACTED#

Additional Comments

elemakil commented 5 years ago

I have connected into one of the nodes using ssh. This allows me to manually execute docker info:

$ sudo docker info | grep "Docker Root Dir"
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
 Docker Root Dir: /mnt/docker

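I believe the leading space in front of "Docker Root Dir" is the source of the problem: it shifts every space-delimited field by one, so field 4 becomes "Dir:" instead of the path. However, I have no clue as to the origin of that space.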
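One possible way to make the check insensitive to indentation is sketched below. It is only an assumption about how the parsing could be hardened, not the fix adopted by the project: both function names are hypothetical, and the sample line is copied from the output above rather than read from a live daemon.

```shell
# Hypothetical helper: read `docker info` text on stdin and print the root
# dir, tolerating any amount of leading whitespace before the key.
extract_docker_root_dir() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Hypothetical helper: succeed only if path $1 is mountpoint $2 or below it.
# This is a prefix match, unlike a plain grep for the mountpoint string.
is_within_mountpoint() {
    case "$1" in
        "$2"|"$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

sample=' Docker Root Dir: /mnt/docker'
rootdir=$(printf '%s\n' "$sample" | extract_docker_root_dir)

if is_within_mountpoint "$rootdir" /mnt; then
    echo "Docker root dir: $rootdir"
else
    echo "Docker root dir $rootdir not within /mnt" >&2
fi
```

On a live node, `docker info --format '{{.DockerRootDir}}'` should avoid parsing the human-readable text altogether, although I have not verified it against this particular client/server version combination.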

elemakil commented 5 years ago

I have created a minimal example that reproduces the problem:

Configuration Files

pool.yaml

pool_specification:
  id: pool-gpu
  vm_configuration:
    platform_image:
      offer: UbuntuServer
      publisher: Canonical
      sku: 16.04-LTS
  vm_count:
    dedicated: 0
    low_priority: 1
  vm_size: STANDARD_NC6

jobs.yaml

job_specifications:
- id: myjob
  gpu: true
  tasks:
  - command: wc -l /etc/group
    docker_image: busybox

credentials.yaml

credentials:
  batch:
    account_key: REDACTED
    account_service_url: https://REDACTED.westeurope.batch.azure.com
  storage:
    my-storage:
      account: REDACTED
      account_key: REDACTED
      endpoint: core.windows.net

config.yaml

batch_shipyard:
  storage_account_settings: my-storage
global_resources:
  docker_images:
  - busybox

Stderr.txt

Synchronizing state of docker.service with SysV init with /lib/systemd/systemd-sysv-install...
Executing /lib/systemd/systemd-sysv-install disable docker
insserv: warning: current start runlevel(s) (empty) of script `docker' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `docker' overrides LSB defaults (0 1 6).
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
rmmod: ERROR: Module nouveau is not currently loaded

WARNING: nvidia-installer was forced to guess the X library path '/usr/lib'
         and X module path '/usr/lib/xorg/modules'; these paths were not
         queryable from the system.  If X fails to find the NVIDIA X driver
         module, please install the `pkg-config` utility and the X.Org
         SDK/development package for your distribution and reinstall the
         driver.

WARNING: Unable to find a suitable destination to install 32-bit
         compatibility libraries. Your system may not be set up for 32-bit
         compatibility. 32-bit compatibility files will not be installed;
         if you wish to install them, re-run the installation and set a
         valid directory with the --compat32-libdir option.

WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support

Stdout.txt

Linux 924efd148cb24c408e8cafe0f59fd6b2000000 4.15.0-1050-azure #55-Ubuntu SMP Sat Jun 29 00:27:54 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
2019-07-23T15:03:43,351787679+00:00 - INFO - Prep start
Configuration:
--------------
Custom image: 0
Native mode: 0
OS Distribution: ubuntu 16.04
Batch Shipyard version: 3.7.0
Blobxfer version: 1.7.0
Singularity version:
User mountpoint: /mnt
Mount path: /mnt/batch/tasks/mounts
Batch Insights: 0
Prometheus: NE=, CA=,
Network optimization: 1
Encryption cert thumbprint:
Install Kata Containers: 0
Default container runtime: runc
Install BeeGFS BeeOND: 0
Storage cluster mount:
Custom mount:
Install LIS:
GPU: False:nvidia-driver_cc37.run
Azure Blob: 0
Azure File: 0
GlusterFS on compute: 0
HPN-SSH: 0
Enable Azure Batch group for Docker access:
Fallback registry:
Docker image preload delay: 0
Cascade via container: 1
P2P: 0
Block on images: busybox#

2019-07-23T15:03:43,411091637+00:00 - INFO - LIS installation not required
 * Setting kernel variables ...
   ...done.
2019-07-23T15:03:43,434066357+00:00 - DEBUG - Installing Docker Host Engine
Hit:1 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:3 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:4 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:5 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [993 kB]
Get:6 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main Translation-en [392 kB]
Get:7 http://azure.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [756 kB]
Get:8 http://azure.archive.ubuntu.com/ubuntu xenial-updates/universe Translation-en [315 kB]
Get:9 http://azure.archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [16.7 kB]
Get:10 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [704 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/main Translation-en [280 kB]
Get:12 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [449 kB]
Get:13 http://security.ubuntu.com/ubuntu xenial-security/universe Translation-en [182 kB]
Get:14 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [5,600 B]
Fetched 4,419 kB in 1s (4,218 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
apt-transport-https is already the newest version (1.2.32).
ca-certificates is already the newest version (20170717~16.04.2).
curl is already the newest version (7.47.0-1ubuntu2.13).
software-properties-common is already the newest version (0.96.20.8).
The following additional packages will be installed:
  gnupg-agent libassuan0 libksba8 libnpth0 pinentry-curses
Suggested packages:
  gnupg-doc parcimonie xloadimage pinentry-doc
Recommended packages:
  dirmngr
The following NEW packages will be installed:
  gnupg-agent gnupg2 libassuan0 libksba8 libnpth0 pinentry-curses
0 upgraded, 6 newly installed, 0 to remove and 13 not upgraded.
Need to get 1,159 kB of archives.
After this operation, 3,467 kB of additional disk space will be used.
Get:1 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 libassuan0 amd64 2.4.2-2 [34.6 kB]
Get:2 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 pinentry-curses amd64 0.9.7-3 [31.2 kB]
Get:3 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 libnpth0 amd64 1.2-3 [7,998 B]
Get:4 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 gnupg-agent amd64 2.1.11-6ubuntu2.1 [240 kB]
Get:5 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libksba8 amd64 1.3.3-1ubuntu0.16.04.1 [90.2 kB]
Get:6 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 gnupg2 amd64 2.1.11-6ubuntu2.1 [755 kB]
Fetched 1,159 kB in 0s (26.4 MB/s)
Selecting previously unselected package libassuan0:amd64.
(Reading database ... 53454 files and directories currently installed.)
Preparing to unpack .../libassuan0_2.4.2-2_amd64.deb ...
Unpacking libassuan0:amd64 (2.4.2-2) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Setting up libassuan0:amd64 (2.4.2-2) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Selecting previously unselected package pinentry-curses.
(Reading database ... 53459 files and directories currently installed.)
Preparing to unpack .../pinentry-curses_0.9.7-3_amd64.deb ...
Unpacking pinentry-curses (0.9.7-3) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up pinentry-curses (0.9.7-3) ...
Selecting previously unselected package libnpth0:amd64.
(Reading database ... 53467 files and directories currently installed.)
Preparing to unpack .../libnpth0_1.2-3_amd64.deb ...
Unpacking libnpth0:amd64 (1.2-3) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Setting up libnpth0:amd64 (1.2-3) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Selecting previously unselected package gnupg-agent.
(Reading database ... 53472 files and directories currently installed.)
Preparing to unpack .../gnupg-agent_2.1.11-6ubuntu2.1_amd64.deb ...
Unpacking gnupg-agent (2.1.11-6ubuntu2.1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up gnupg-agent (2.1.11-6ubuntu2.1) ...
Selecting previously unselected package libksba8:amd64.
(Reading database ... 53493 files and directories currently installed.)
Preparing to unpack .../libksba8_1.3.3-1ubuntu0.16.04.1_amd64.deb ...
Unpacking libksba8:amd64 (1.3.3-1ubuntu0.16.04.1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Setting up libksba8:amd64 (1.3.3-1ubuntu0.16.04.1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Selecting previously unselected package gnupg2.
(Reading database ... 53501 files and directories currently installed.)
Preparing to unpack .../gnupg2_2.1.11-6ubuntu2.1_amd64.deb ...
Unpacking gnupg2 (2.1.11-6ubuntu2.1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for install-info (6.1.0.dfsg.1-5) ...
Setting up gnupg2 (2.1.11-6ubuntu2.1) ...
OK
Hit:1 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Hit:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease
Hit:3 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease
Hit:4 http://security.ubuntu.com/ubuntu xenial-security InRelease
Get:5 https://download.docker.com/linux/ubuntu xenial InRelease [66.2 kB]
Get:6 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages [9,730 B]
Fetched 76.0 kB in 0s (227 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  containerd.io docker-ce-cli
Recommended packages:
  aufs-tools cgroupfs-mount | cgroup-lite pigz libltdl7
The following NEW packages will be installed:
  containerd.io docker-ce docker-ce-cli
0 upgraded, 3 newly installed, 0 to remove and 13 not upgraded.
Need to get 82.3 MB of archives.
After this operation, 366 MB of additional disk space will be used.
Get:1 https://download.docker.com/linux/ubuntu xenial/stable amd64 containerd.io amd64 1.2.6-3 [22.6 MB]
Get:2 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce-cli amd64 5:19.03.0~3-0~ubuntu-xenial [42.3 MB]
Get:3 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce amd64 5:18.09.2~3-0~ubuntu-xenial [17.4 MB]
Fetched 82.3 MB in 1s (52.0 MB/s)
Selecting previously unselected package containerd.io.
(Reading database ... 53564 files and directories currently installed.)
Preparing to unpack .../containerd.io_1.2.6-3_amd64.deb ...
Unpacking containerd.io (1.2.6-3) ...
Selecting previously unselected package docker-ce-cli.
Preparing to unpack .../docker-ce-cli_5%3a19.03.0~3-0~ubuntu-xenial_amd64.deb ...
Unpacking docker-ce-cli (5:19.03.0~3-0~ubuntu-xenial) ...
Selecting previously unselected package docker-ce.
Preparing to unpack .../docker-ce_5%3a18.09.2~3-0~ubuntu-xenial_amd64.deb ...
Unpacking docker-ce (5:18.09.2~3-0~ubuntu-xenial) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for ureadahead (0.100.0-19.1) ...
Processing triggers for systemd (229-4ubuntu21.22) ...
Setting up containerd.io (1.2.6-3) ...
Setting up docker-ce-cli (5:19.03.0~3-0~ubuntu-xenial) ...
Setting up docker-ce (5:18.09.2~3-0~ubuntu-xenial) ...
sent invalidate(passwd) request, exiting
sent invalidate(group) request, exiting
sent invalidate(group) request, exiting
update-alternatives: using /usr/bin/dockerd-ce to provide /usr/bin/dockerd (dockerd) in auto mode
Processing triggers for ureadahead (0.100.0-19.1) ...
Processing triggers for systemd (229-4ubuntu21.22) ...
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-07-23 15:04:22 UTC; 3ms ago
     Docs: https://docs.docker.com
 Main PID: 7034 (dockerd)
    Tasks: 15
   Memory: 39.7M
      CPU: 154ms
   CGroup: /system.slice/docker.service
           └─7034 /usr/bin/dockerd

Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.483659695Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.484005195Z" level=info msg="Loading containers: start."
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.525707201Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.547790657Z" level=info msg="Loading containers: done."
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.590130464Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.590332464Z" level=info msg="Docker daemon" commit=6247962 graphdriver(s)=overlay2 version=18.09.2
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.590413664Z" level=info msg="Daemon has completed initialization"
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.600459290Z" level=info msg="API listen on 127.0.0.1:2375"
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 systemd[1]: Started Docker Application Container Engine.
Jul 23 15:04:22 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[7034]: time="2019-07-23T15:04:22.600487190Z" level=info msg="API listen on /var/run/docker.sock"
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 18.09.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-1050-azure
 Operating System: Ubuntu 16.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 55.02GiB
 Name: 924efd148cb24c408e8cafe0f59fd6b2000000
 ID: R5HU:57EJ:TT2N:XQIS:KS3U:DEII:WCFL:L2CG:OR33:75QM:WCSZ:HKXI
 Docker Root Dir: /mnt/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

2019-07-23T15:04:22,682171896+00:00 - INFO - Docker Host Engine installed
2019-07-23T15:04:22+00:00 - WARNING - No Docker registry servers found.
2019-07-23T15:04:22+00:00 - WARNING - No Singularity registry servers found.
2019-07-23T15:04:22,686669408+00:00 - INFO - Installing Nvidia Software
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0472:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Reading package lists...
Building dependency tree...
Reading state information...
Package 'xserver-xorg-video-nouveau' is not installed, so not removed
Package 'xserver-xorg-video-nouveau-hwe-16.04' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  binutils cpp cpp-5 dpkg-dev g++ g++-5 gcc gcc-5 libasan2 libatomic1
  libc-dev-bin libc6-dev libcc1-0 libcilkrts5 libdpkg-perl libgcc-5-dev
  libgomp1 libisl15 libitm1 liblsan0 libmpc3 libmpx0 libquadmath0
  libstdc++-5-dev libtsan0 libubsan0 linux-libc-dev make
Suggested packages:
  binutils-doc cpp-doc gcc-5-locales debian-keyring g++-multilib
  g++-5-multilib gcc-5-doc libstdc++6-5-dbg gcc-multilib manpages-dev autoconf
  automake libtool flex bison gdb gcc-doc gcc-5-multilib libgcc1-dbg
  libgomp1-dbg libitm1-dbg libatomic1-dbg libasan2-dbg liblsan0-dbg
  libtsan0-dbg libubsan0-dbg libcilkrts5-dbg libmpx0-dbg libquadmath0-dbg
  glibc-doc libstdc++-5-doc make-doc
Recommended packages:
  fakeroot libalgorithm-merge-perl manpages-dev libfile-fcntllock-perl
The following NEW packages will be installed:
  binutils build-essential cpp cpp-5 dpkg-dev g++ g++-5 gcc gcc-5 libasan2
  libatomic1 libc-dev-bin libc6-dev libcc1-0 libcilkrts5 libdpkg-perl
  libgcc-5-dev libgomp1 libisl15 libitm1 liblsan0 libmpc3 libmpx0 libquadmath0
  libstdc++-5-dev libtsan0 libubsan0 linux-libc-dev make
0 upgraded, 29 newly installed, 0 to remove and 14 not upgraded.
Need to get 35.9 MB of archives.
After this operation, 139 MB of additional disk space will be used.
Get:1 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 libmpc3 amd64 1.0.3-1 [39.7 kB]
Get:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 binutils amd64 2.26.1-1ubuntu1~16.04.8 [2,312 kB]
Get:3 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libc-dev-bin amd64 2.23-0ubuntu11 [68.5 kB]
Get:4 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-libc-dev amd64 4.4.0-154.181 [852 kB]
Get:5 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libc6-dev amd64 2.23-0ubuntu11 [2,086 kB]
Get:6 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 libisl15 amd64 0.16.1-1 [524 kB]
Get:7 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 cpp-5 amd64 5.4.0-6ubuntu1~16.04.11 [7,660 kB]
Get:8 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 cpp amd64 4:5.3.1-1ubuntu1 [27.7 kB]
Get:9 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libcc1-0 amd64 5.4.0-6ubuntu1~16.04.11 [38.8 kB]
Get:10 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libgomp1 amd64 5.4.0-6ubuntu1~16.04.11 [55.0 kB]
Get:11 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libitm1 amd64 5.4.0-6ubuntu1~16.04.11 [27.4 kB]
Get:12 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libatomic1 amd64 5.4.0-6ubuntu1~16.04.11 [8,896 B]
Get:13 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libasan2 amd64 5.4.0-6ubuntu1~16.04.11 [264 kB]
Get:14 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 liblsan0 amd64 5.4.0-6ubuntu1~16.04.11 [105 kB]
Get:15 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libtsan0 amd64 5.4.0-6ubuntu1~16.04.11 [244 kB]
Get:16 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libubsan0 amd64 5.4.0-6ubuntu1~16.04.11 [95.4 kB]
Get:17 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libcilkrts5 amd64 5.4.0-6ubuntu1~16.04.11 [40.1 kB]
Get:18 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libmpx0 amd64 5.4.0-6ubuntu1~16.04.11 [9,748 B]
Get:19 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libquadmath0 amd64 5.4.0-6ubuntu1~16.04.11 [131 kB]
Get:20 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libgcc-5-dev amd64 5.4.0-6ubuntu1~16.04.11 [2,229 kB]
Get:21 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 gcc-5 amd64 5.4.0-6ubuntu1~16.04.11 [8,417 kB]
Get:22 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 gcc amd64 4:5.3.1-1ubuntu1 [5,244 B]
Get:23 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libstdc++-5-dev amd64 5.4.0-6ubuntu1~16.04.11 [1,426 kB]
Get:24 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 g++-5 amd64 5.4.0-6ubuntu1~16.04.11 [8,310 kB]
Get:25 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 g++ amd64 4:5.3.1-1ubuntu1 [1,504 B]
Get:26 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 make amd64 4.1-6 [151 kB]
Get:27 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libdpkg-perl all 1.18.4ubuntu1.5 [195 kB]
Get:28 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 dpkg-dev all 1.18.4ubuntu1.5 [584 kB]
Get:29 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 build-essential amd64 12.1ubuntu2 [4,758 B]
Fetched 35.9 MB in 0s (70.8 MB/s)
Selecting previously unselected package libmpc3:amd64.
(Reading database ... 53792 files and directories currently installed.)
Preparing to unpack .../libmpc3_1.0.3-1_amd64.deb ...
Unpacking libmpc3:amd64 (1.0.3-1) ...
Selecting previously unselected package binutils.
Preparing to unpack .../binutils_2.26.1-1ubuntu1~16.04.8_amd64.deb ...
Unpacking binutils (2.26.1-1ubuntu1~16.04.8) ...
Selecting previously unselected package libc-dev-bin.
Preparing to unpack .../libc-dev-bin_2.23-0ubuntu11_amd64.deb ...
Unpacking libc-dev-bin (2.23-0ubuntu11) ...
Selecting previously unselected package linux-libc-dev:amd64.
Preparing to unpack .../linux-libc-dev_4.4.0-154.181_amd64.deb ...
Unpacking linux-libc-dev:amd64 (4.4.0-154.181) ...
Selecting previously unselected package libc6-dev:amd64.
Preparing to unpack .../libc6-dev_2.23-0ubuntu11_amd64.deb ...
Unpacking libc6-dev:amd64 (2.23-0ubuntu11) ...
Selecting previously unselected package libisl15:amd64.
Preparing to unpack .../libisl15_0.16.1-1_amd64.deb ...
Unpacking libisl15:amd64 (0.16.1-1) ...
Selecting previously unselected package cpp-5.
Preparing to unpack .../cpp-5_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking cpp-5 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package cpp.
Preparing to unpack .../cpp_4%3a5.3.1-1ubuntu1_amd64.deb ...
Unpacking cpp (4:5.3.1-1ubuntu1) ...
Selecting previously unselected package libcc1-0:amd64.
Preparing to unpack .../libcc1-0_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libcc1-0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libgomp1:amd64.
Preparing to unpack .../libgomp1_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libgomp1:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libitm1:amd64.
Preparing to unpack .../libitm1_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libitm1:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libatomic1:amd64.
Preparing to unpack .../libatomic1_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libatomic1:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libasan2:amd64.
Preparing to unpack .../libasan2_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libasan2:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package liblsan0:amd64.
Preparing to unpack .../liblsan0_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking liblsan0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libtsan0:amd64.
Preparing to unpack .../libtsan0_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libtsan0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libubsan0:amd64.
Preparing to unpack .../libubsan0_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libubsan0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libcilkrts5:amd64.
Preparing to unpack .../libcilkrts5_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libcilkrts5:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libmpx0:amd64.
Preparing to unpack .../libmpx0_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libmpx0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libquadmath0:amd64.
Preparing to unpack .../libquadmath0_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libquadmath0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package libgcc-5-dev:amd64.
Preparing to unpack .../libgcc-5-dev_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libgcc-5-dev:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package gcc-5.
Preparing to unpack .../gcc-5_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking gcc-5 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package gcc.
Preparing to unpack .../gcc_4%3a5.3.1-1ubuntu1_amd64.deb ...
Unpacking gcc (4:5.3.1-1ubuntu1) ...
Selecting previously unselected package libstdc++-5-dev:amd64.
Preparing to unpack .../libstdc++-5-dev_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking libstdc++-5-dev:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package g++-5.
Preparing to unpack .../g++-5_5.4.0-6ubuntu1~16.04.11_amd64.deb ...
Unpacking g++-5 (5.4.0-6ubuntu1~16.04.11) ...
Selecting previously unselected package g++.
Preparing to unpack .../g++_4%3a5.3.1-1ubuntu1_amd64.deb ...
Unpacking g++ (4:5.3.1-1ubuntu1) ...
Selecting previously unselected package make.
Preparing to unpack .../archives/make_4.1-6_amd64.deb ...
Unpacking make (4.1-6) ...
Selecting previously unselected package libdpkg-perl.
Preparing to unpack .../libdpkg-perl_1.18.4ubuntu1.5_all.deb ...
Unpacking libdpkg-perl (1.18.4ubuntu1.5) ...
Selecting previously unselected package dpkg-dev.
Preparing to unpack .../dpkg-dev_1.18.4ubuntu1.5_all.deb ...
Unpacking dpkg-dev (1.18.4ubuntu1.5) ...
Selecting previously unselected package build-essential.
Preparing to unpack .../build-essential_12.1ubuntu2_amd64.deb ...
Unpacking build-essential (12.1ubuntu2) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up libmpc3:amd64 (1.0.3-1) ...
Setting up binutils (2.26.1-1ubuntu1~16.04.8) ...
Setting up libc-dev-bin (2.23-0ubuntu11) ...
Setting up linux-libc-dev:amd64 (4.4.0-154.181) ...
Setting up libc6-dev:amd64 (2.23-0ubuntu11) ...
Setting up libisl15:amd64 (0.16.1-1) ...
Setting up cpp-5 (5.4.0-6ubuntu1~16.04.11) ...
Setting up cpp (4:5.3.1-1ubuntu1) ...
Setting up libcc1-0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libgomp1:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libitm1:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libatomic1:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libasan2:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up liblsan0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libtsan0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libubsan0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libcilkrts5:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libmpx0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libquadmath0:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up libgcc-5-dev:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up gcc-5 (5.4.0-6ubuntu1~16.04.11) ...
Setting up gcc (4:5.3.1-1ubuntu1) ...
Setting up libstdc++-5-dev:amd64 (5.4.0-6ubuntu1~16.04.11) ...
Setting up g++-5 (5.4.0-6ubuntu1~16.04.11) ...
Setting up g++ (4:5.3.1-1ubuntu1) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up make (4.1-6) ...
Setting up libdpkg-perl (1.18.4ubuntu1.5) ...
Setting up dpkg-dev (1.18.4ubuntu1.5) ...
Setting up build-essential (12.1ubuntu2) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.104......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Persistence mode is already Enabled for GPU 00000472:00:00.0.
All done.
OK
deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/$(ARCH) /
Hit:1 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Hit:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease
Hit:3 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease
Get:4 https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  InRelease [1,139 B]
Get:5 https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  InRelease [1,136 B]
Get:6 https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  InRelease [1,129 B]
Hit:7 https://download.docker.com/linux/ubuntu xenial InRelease
Hit:8 http://security.ubuntu.com/ubuntu xenial-security InRelease
Get:9 https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  Packages [6,712 B]
Get:10 https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  Packages [8,428 B]
Get:11 https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages [8,172 B]
Fetched 26.7 kB in 0s (57.7 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-runtime-hook
The following NEW packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-runtime
  nvidia-container-runtime-hook nvidia-docker2
0 upgraded, 5 newly installed, 0 to remove and 14 not upgraded.
Need to get 2,407 kB of archives.
After this operation, 9,771 kB of additional disk space will be used.
Get:1 https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  libnvidia-container1 1.0.2-1 [57.6 kB]
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  libnvidia-container-tools 1.0.2-1 [15.3 kB]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  nvidia-container-runtime-hook 1.4.0-1 [575 kB]
Get:4 https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  nvidia-container-runtime 2.0.0+docker18.09.2-1 [1,756 kB]
Get:5 https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  nvidia-docker2 2.0.3+docker18.09.2-1 [2,884 B]
Fetched 2,407 kB in 0s (4,161 kB/s)
Selecting previously unselected package libnvidia-container1:amd64.
(Reading database ... 56761 files and directories currently installed.)
Preparing to unpack .../libnvidia-container1_1.0.2-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.0.2-1) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../libnvidia-container-tools_1.0.2-1_amd64.deb ...
Unpacking libnvidia-container-tools (1.0.2-1) ...
Selecting previously unselected package nvidia-container-runtime-hook.
Preparing to unpack .../nvidia-container-runtime-hook_1.4.0-1_amd64.deb ...
Unpacking nvidia-container-runtime-hook (1.4.0-1) ...
Selecting previously unselected package nvidia-container-runtime.
Preparing to unpack .../nvidia-container-runtime_2.0.0+docker18.09.2-1_amd64.deb ...
Unpacking nvidia-container-runtime (2.0.0+docker18.09.2-1) ...
Selecting previously unselected package nvidia-docker2.
Preparing to unpack .../nvidia-docker2_2.0.3+docker18.09.2-1_all.deb ...
Unpacking nvidia-docker2 (2.0.3+docker18.09.2-1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Setting up libnvidia-container1:amd64 (1.0.2-1) ...
Setting up libnvidia-container-tools (1.0.2-1) ...
Setting up nvidia-container-runtime-hook (1.4.0-1) ...
Setting up nvidia-container-runtime (2.0.0+docker18.09.2-1) ...
Setting up nvidia-docker2 (2.0.3+docker18.09.2-1) ...

Configuration file '/etc/docker/daemon.json'
 ==> File on system created by you or by a script.
 ==> File also in package provided by package maintainer.
 ==> Using new file as you requested.
Installing new version of config file /etc/docker/daemon.json ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
2019-07-23T15:05:23,761873791+00:00 - DEBUG - data-root not detected in Docker daemon.json
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-07-23 15:05:24 UTC; 3ms ago
     Docs: https://docs.docker.com
 Main PID: 17513 (dockerd)
    Tasks: 15
   Memory: 34.1M
      CPU: 182ms
   CGroup: /system.slice/docker.service
           └─17513 /usr/bin/dockerd

Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.392641863Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.393019964Z" level=info msg="Loading containers: start."
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.455988209Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.480274966Z" level=info msg="Loading containers: done."
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.530066681Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.530270281Z" level=info msg="Docker daemon" commit=6247962 graphdriver(s)=overlay2 version=18.09.2
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.530324681Z" level=info msg="Daemon has completed initialization"
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.535510193Z" level=info msg="API listen on /var/run/docker.sock"
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 dockerd[17513]: time="2019-07-23T15:05:24.535532193Z" level=info msg="API listen on 127.0.0.1:2375"
Jul 23 15:05:24 924efd148cb24c408e8cafe0f59fd6b2000000 systemd[1]: Started Docker Application Container Engine.
NVIDIA Docker: 2.0.3
Client: Docker Engine - Community
 Version:           19.03.0
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.5
 Git commit:        aeac949
 Built:             Wed Jul 17 18:16:07 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
2019-07-23T15:05:24,660372782+00:00 - ERROR - Docker root dir Dir: not within /mnt
alfpark commented 5 years ago

Excellent report, thanks for the thoroughness.

The cause may be the docker-ce-cli package, which unfortunately is not pinned.

I will try to repro and provide some temporary mitigation workarounds.

JadenLy commented 5 years ago

Same issue here. Yesterday morning it worked as usual, but an hour or two later all start tasks failed. Please fix this as soon as possible. Thanks!

alfpark commented 5 years ago

I have confirmed that this affects GPU pools (only).

There are two ways to mitigate this before a release is made to fix this issue (which may take some time):

  1. Use a native container pool. Please ensure that your workload is compatible with native mode. Your pool.yaml file should have the following vm_configuration section:
  vm_configuration:
    platform_image:
      publisher: Canonical
      offer: UbuntuServer
      sku: 16.04-LTS
      native: true
  2. If you installed via the git clone method and have the sources:

Replace:

https://github.com/Azure/batch-shipyard/blob/2301e20cfca33a3e6546285245d8acb94b8ce77c/scripts/shipyard_nodeprep.sh#L731

with:

rootdir=$(awk -F' ' '{print $NF}' <<< $(docker info | grep "Docker Root Dir"))
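
For anyone wondering why the replacement helps, here is a minimal sketch comparing the two extraction methods on sample lines rather than on live `docker info` output. It assumes the newer client prints the line with one leading space, which is consistent with `$rootdir` ending up as `Dir:` in the report above. The path `/mnt/docker` and the function names `parse_with_cut` and `parse_with_awk` are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sample "Docker Root Dir" lines (hypothetical values, not captured from a node).
# Assumption: the older client prints the line flush left, while the newer
# client indents it by one space.
old_format='Docker Root Dir: /mnt/docker'
new_format=' Docker Root Dir: /mnt/docker'

# Original approach: take the 4th space-delimited field.
# A leading space creates an empty first field, shifting everything by one.
parse_with_cut() {
    printf '%s\n' "$1" | grep "Docker Root Dir" | cut -d' ' -f 4
}

# Workaround approach: take the last whitespace-separated field.
# awk ignores leading whitespace, so indentation does not matter.
parse_with_awk() {
    printf '%s\n' "$1" | grep "Docker Root Dir" | awk '{print $NF}'
}

echo "cut, old format: $(parse_with_cut "$old_format")"
echo "cut, new format: $(parse_with_cut "$new_format")"
echo "awk, old format: $(parse_with_awk "$old_format")"
echo "awk, new format: $(parse_with_awk "$new_format")"
```

With the indented line, `cut` returns `Dir:` while `awk` still returns the path. One caveat of taking the last field is that it would truncate a root dir path containing spaces. Asking Docker for the value directly with `docker info --format '{{.DockerRootDir}}'` may avoid text parsing altogether, though that alternative is not what the workaround above uses.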

canoas commented 5 years ago

Thank you @alfpark! The workaround is working:

rootdir=$(awk -F' ' '{print $NF}' <<< $(docker info | grep "Docker Root Dir"))

but we had to recreate our pool from a clean state using a recompiled version of shipyard

alfpark commented 5 years ago

I've prepared a fix for Ubuntu-based OSes, but there are numerous issues with CentOS-based systems: recent updates to the nvidia-docker2 dependency packages are breaking older pinned installs. It looks like NVIDIA is looking into these issues, but for those on CentOS, it's recommended to switch to Ubuntu for the time being to unblock.

Docker will be updated to the latest CE release in the next non-hotfix Batch Shipyard release and will use "native" GPU support in Docker.

elemakil commented 5 years ago

@alfpark Thanks a ton for providing such a quick workaround! Using native: true has indeed resolved this issue for me.