Closed: ryanfb closed this issue 2 years ago.
I am also having this same issue. Diag 11EED9F3-181B-41BC-A99D-BEF7DDC1580E
Same issue. I have 4 CPUs/6GB RAM allocated.
Still seeing this non-stop. It's sufficient to run `docker-compose up indexer` after cloning/updating https://github.com/dcthree/dclp-docker to make the crash happen (the other services rely on some unversioned config files).
Sadly, I can confirm that for a few weeks now Docker for Mac has been far less stable than it used to be.
It crashes randomly, and only a Mac reboot brings the service back. Manually restarting the service doesn't work.
Also having this issue. I've noticed that starting a new Diagnose run tends to unblock things, though I'm using a custom container that I'm building. Restarting the app does help, but you have to kill both the hyperkit process and the qcow-tool process. A53027AC-01D5-42AC-BBEB-2B7C58218846
Installed Docker on a new MBP, and after the Mac woke up from sleep I noticed this issue. I had not noticed this on my previous MBP.
Docker for Mac: version: 17.09.0-ce-mac35 (69202b202f497d4b6e627c3370781b9e4b51ec78)
macOS: version 10.12.6 (build: 16G1036)
logs: /tmp/D06DD1FC-D953-476E-9381-80A47EB055D7/20171207-115540.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK] db.git
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] db
[OK] env
[OK] virtualization kern.hv_support
[OK] slirp
[OK] osxfs
[OK] moby-console
[OK] logs
[ERROR] docker-cli
docker ps failed
[OK] menubar
[OK] disk
Diagnostic ID: D06DD1FC-D953-476E-9381-80A47EB055D7
Please try a more recent version of Docker for Mac.
@akimd I just tried again with the latest edge, 18.01.0-ce-mac48 (220004). The exact same thing still happens.
Docker for Mac: version: 18.01.0-ce-mac48 (d1778b704353fa5b79142a2055a2c11c8b48a653)
macOS: version 10.12.6 (build: 16G1114)
logs: /tmp/230E6503-3092-4DE1-BC76-47C03F92A4D5/20180119-121833.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK] db.git
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] kubernetes
[OK] env
[OK] virtualization kern.hv_support
[OK] slirp
[OK] osxfs
[OK] moby-console
[OK] logs
[ERROR] docker-cli
docker ps failed
[OK] menubar
[OK] disk
Diagnostic ID: 230E6503-3092-4DE1-BC76-47C03F92A4D5
I had the same issue with Docker 18.02-ce. After running for a while, some subcommands such as `docker rmi` hung forever.
Having this same issue. Do we need to open another issue?
Hi guys,
No, there's no need for another issue, thanks! We need to understand what is going on here and fix it. Currently we're busy preparing the next releases, we will be back to this issue as soon as possible.
Thanks for your help!
One question though: are you running qcow or raw images? What does
$ ls -l ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.*
give?
Thanks for reopening this issue!
For me, that command shows a `Docker.qcow2` file. Should I try switching to raw images, and would that happen automatically if (and only if) I upgrade to High Sierra + APFS for my ~/Library volume?
You don't need to switch to raw. As a matter of fact, I was asking because we have found raw disks to be less reliable so far, and I was wondering if that could be related to your problems.
No, we don't migrate from qcow2 to raw; raw might only be chosen when starting anew.
+1 ... I'm having the same issue. Docker for Mac is a better experience than VirtualBox was, what with skipping the intermediate VM and all, but this makes it torture. `docker rmi` has about a 50% chance of hanging, so when I run out of space it's an endless cycle of `rmi`, HANG, stop Docker, start Docker, `rmi`, `rmi`, HANG...
Of course, when it's hanging, diagnostics don't work. But this is what it says when everything's working:
Docker for Mac: version: 17.12.0-ce-mac55 (18467c0ae7afb7a736e304f991ccc1a61d67a4ab)
macOS: version 10.13.3 (build: 17D102)
logs: /tmp/A25F9E34-E0DC-43CA-A70F-CFD8467AF87C/20180228-081116.tar.gz
[OK] vpnkit
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] app
[OK] virtualization VT-X
[OK] moby
[OK] system
[OK] moby-syslog
[OK] kubernetes
[OK] env
[OK] virtualization kern.hv_support
[OK] moby-console
[OK] osxfs
[OK] logs
[OK] docker-cli
[OK] disk
Diagnostic ID: A25F9E34-E0DC-43CA-A70F-CFD8467AF87C
(Incidentally, `rmi` and `rm` do seem to be the main triggers; otherwise it's pretty solid.)
The `ls` command above gives:
iainbryson@Iains-MacBook-Pro (devel) $ ls -l ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.*
-rw-r--r--@ 1 iainbryson staff 47927853056 Feb 28 08:15 /Users/iainbryson/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
And this is an APFS volume, if that matters.
Still seeing this in 18.03.0-ce-rc1-mac54 (23022).
Usually when this happens, trying to restart or quit-then-start Docker via the Docker for Mac GUI will hang in the "Docker is starting" state, and I have to force quit the `com.docker.hyperkit` process, then open Docker for Mac to get Docker into a usable state again.
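The force-quit workaround described above (and the earlier note about also killing qcow-tool) can be sketched as a couple of shell commands. This is only an assumption based on the process names mentioned in the comments, not an official recovery procedure:

```shell
# Kill any lingering Docker for Mac VM processes so the app can start cleanly.
# Process names come from the comments above; harmless if none are running.
pkill -f com.docker.hyperkit || true
pkill -f qcow-tool || true
# Then relaunch the app, e.g.:  open -a Docker
```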
Looking around for other solutions, I came across docker/compose#3633, which points (at the end) to moby/moby#35933. This may not be the same issue as what I'm experiencing since people there report that rolling back to 17.09 fixes the issue for them, while I was already experiencing this problem in 17.06. I don't believe that I'm seeing this due to tty, resource, or network issues either.
@ryanfb Thanks for the pointers, these issues are interesting.
I can't reproduce the behaviour. Could you submit a Diagnostic using the latest Edge (mac54 is good)?
Diagnostic ID: 230E6503-3092-4DE1-BC76-47C03F92A4D5
I've been trying to dig deeper into this over the last couple of days. One thing I've been checking is what's going on inside Docker, by using `screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty` and watching the kernel log messages when this behavior occurs. The first time I did this I was seeing `NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [containerd:906]`. Googling took me to #1950, where I tried disabling trim as suggested, but I still kept getting hangs, of the form:
[ 1660.441275] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1660.441744] 2-...: (1 GPs behind) idle=467/140000000000000/0 softirq=35675/35675 fqs=12882
[ 1660.442492] (detected by 0, t=60142 jiffies, g=24859, c=24858, q=266)
[ 1660.443055] Task dump for CPU 2:
[ 1660.443329] scsi_eh_0 R running task 0 284 2 0x00000008
[ 1660.443976] 0000000000000000 ffffffff955754cd 0000000000000000 ffff9b719da95000
[ 1660.444684] ffffaf78c091be00 ffff9b719db68000 ffffaf78c091be78 ffff9b719dda44c0
[ 1660.445518] 0000000000000246 ffffffff9557588c ffff9b7193df1218 ffff9b719d02a658
[ 1660.446342] Call Trace:
[ 1660.446548] [<ffffffff955754cd>] ? ata_scsi_port_error_handler+0x228/0x544
[ 1660.447211] [<ffffffff9557588c>] ? ata_scsi_error+0xa3/0xdb
[ 1660.447678] [<ffffffff9554526d>] ? scsi_error_handler+0xaf/0x472
[ 1660.448198] [<ffffffff950fe5b3>] ? finish_task_switch+0x115/0x18b
[ 1660.448813] [<ffffffff957f53e3>] ? __schedule+0x36c/0x465
[ 1660.449383] [<ffffffff955451be>] ? scsi_eh_get_sense+0xdd/0xdd
[ 1660.449974] [<ffffffff950f7b56>] ? kthread+0xb4/0xbc
[ 1660.450446] [<ffffffff950f7aa2>] ? init_completion+0x1d/0x1d
[ 1660.450937] [<ffffffff957f8261>] ? ret_from_fork+0x41/0x50
Since then I've done a few factory resets. After the first of these, I could no longer disable trim per the instructions in this comment, since `~/Library/Containers/com.docker.docker/Data/database/` no longer existed. When Docker hangs I'm still seeing `INFO: rcu_sched detected stalls on CPUs/tasks` with a similar backtrace, and/or e.g. `NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [containerd:906]`, as in this screenshot:
On my machine, `docker --version` reliably takes about 30s.
$ time docker --version
Docker version 17.12.0-ce, build c97c6d6
docker --version 0.01s user 0.01s system 0% cpu 30.035 total
Hello,
I'm also having many issues with the latest Docker version. It takes about 2 minutes to start, although only one small dnsmasq service is configured to start along with Docker.
When the Mac wakes up, docker-compose times out when stopping or restarting a stack. The only "fix" is to restart Docker, which is another 3-minute wait...
Almost by accident, I think I've discovered a workaround for my particular case. I decided to try running the same `docker-compose` setup on my (relatively under-resourced) MacBook Pro instead of my iMac, to see what happened with the latest Docker edge (previously I had encountered the same behavior on both machines). Since the internal (SSD) drive on the MBP was running out of space, I decided to try moving the Docker disk image location (the `Docker.qcow2` file) to an external USB drive, using the Docker for Mac UI. Miraculously, I was able to do everything I needed to do without Docker crashing or becoming completely unresponsive.
This gave me the idea to try the same thing on my iMac today - and after moving the Docker disk image off the internal 3TB Fusion Drive to an external USB drive, I seem to be able to do everything I need to do without having Docker crash or become completely unresponsive.
All drives (internal and external) are formatted Mac OS Extended Journaled (case-insensitive), and the internal drives report a S.M.A.R.T. status of "Verified" with no other programs appearing to have issues using them.
Perhaps this is consistent with the ATA/SCSI errors causing a CPU/task stall in my logs above, though I'm not sure what the root cause or error is. The thing consistent across both machines is that the problematic drive for Docker is an internal SSD or Fusion Drive.
@ryanfb thanks for the logs above. I extracted the logs from the diagnostics and am adding them here for completeness, as there actually were some relevant error messages before the hung-task messages:
[ 446.797918] br-69957a06e2c6: port 4(veth8a39e2d) entered blocking state
[ 446.798583] br-69957a06e2c6: port 4(veth8a39e2d) entered forwarding state
[ 1051.734506] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[ 1051.735271] ata1.00: cmd 61/00:00:00:04:51/01:00:01:00:00/40 tag 0 ncq dma 131072 out
[ 1051.735271] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.736914] ata1.00: cmd 61/00:08:00:05:51/01:00:01:00:00/40 tag 1 ncq dma 131072 out
[ 1051.736914] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.738570] ata1.00: cmd 61/00:10:00:06:51/01:00:01:00:00/40 tag 2 ncq dma 131072 out
[ 1051.738570] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.740175] ata1.00: cmd 61/00:18:00:07:51/01:00:01:00:00/40 tag 3 ncq dma 131072 out
[ 1051.740175] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.741829] ata1.00: cmd 61/00:20:00:08:51/01:00:01:00:00/40 tag 4 ncq dma 131072 out
[ 1051.741829] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.743430] ata1.00: cmd 61/00:28:00:09:51/01:00:01:00:00/40 tag 5 ncq dma 131072 out
[ 1051.743430] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.745077] ata1.00: cmd 61/00:30:00:0a:51/01:00:01:00:00/40 tag 6 ncq dma 131072 out
[ 1051.745077] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.746778] ata1.00: cmd 61/00:38:00:0b:51/01:00:01:00:00/40 tag 7 ncq dma 131072 out
[ 1051.746778] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.748412] ata1.00: cmd 61/00:40:00:0c:51/01:00:01:00:00/40 tag 8 ncq dma 131072 out
[ 1051.748412] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.750128] ata1.00: cmd 61/00:48:00:0d:51/01:00:01:00:00/40 tag 9 ncq dma 131072 out
[ 1051.750128] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.751764] ata1.00: cmd 61/00:50:00:0e:51/01:00:01:00:00/40 tag 10 ncq dma 131072 out
[ 1051.751764] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.753277] ata1.00: cmd 61/00:58:00:0f:51/01:00:01:00:00/40 tag 11 ncq dma 131072 out
[ 1051.753277] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.755023] ata1.00: cmd 61/00:60:00:10:51/01:00:01:00:00/40 tag 12 ncq dma 131072 out
[ 1051.755023] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.756799] ata1.00: cmd 61/00:68:00:11:51/01:00:01:00:00/40 tag 13 ncq dma 131072 out
[ 1051.756799] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.758374] ata1.00: cmd 61/00:70:00:f3:50/01:00:01:00:00/40 tag 14 ncq dma 131072 out
[ 1051.758374] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.760058] ata1.00: cmd 61/00:78:00:f4:50/01:00:01:00:00/40 tag 15 ncq dma 131072 out
[ 1051.760058] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.761975] ata1.00: cmd 61/00:80:00:f5:50/01:00:01:00:00/40 tag 16 ncq dma 131072 out
[ 1051.761975] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.763722] ata1.00: cmd 61/00:88:00:f6:50/01:00:01:00:00/40 tag 17 ncq dma 131072 out
[ 1051.763722] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.765457] ata1.00: cmd 61/00:90:00:f7:50/01:00:01:00:00/40 tag 18 ncq dma 131072 out
[ 1051.765457] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.767301] ata1.00: cmd 61/00:98:00:f8:50/01:00:01:00:00/40 tag 19 ncq dma 131072 out
[ 1051.767301] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.769089] ata1.00: cmd 61/00:a0:00:f9:50/01:00:01:00:00/40 tag 20 ncq dma 131072 out
[ 1051.769089] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.770706] ata1.00: cmd 61/00:a8:00:fa:50/01:00:01:00:00/40 tag 21 ncq dma 131072 out
[ 1051.770706] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.772230] ata1.00: cmd 61/00:b0:00:fb:50/01:00:01:00:00/40 tag 22 ncq dma 131072 out
[ 1051.772230] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.773659] ata1.00: cmd 61/00:b8:00:fc:50/01:00:01:00:00/40 tag 23 ncq dma 131072 out
[ 1051.773659] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.775117] ata1.00: cmd 61/00:c0:00:fd:50/01:00:01:00:00/40 tag 24 ncq dma 131072 out
[ 1051.775117] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.776791] ata1.00: cmd 61/00:c8:00:fe:50/01:00:01:00:00/40 tag 25 ncq dma 131072 out
[ 1051.776791] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.778550] ata1.00: cmd 61/00:d0:00:ff:50/01:00:01:00:00/40 tag 26 ncq dma 131072 out
[ 1051.778550] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.780251] ata1.00: cmd 61/00:d8:00:00:51/01:00:01:00:00/40 tag 27 ncq dma 131072 out
[ 1051.780251] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.781892] ata1.00: cmd 61/00:e0:00:01:51/01:00:01:00:00/40 tag 28 ncq dma 131072 out
[ 1051.781892] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.783683] ata1.00: cmd 61/00:e8:00:02:51/01:00:01:00:00/40 tag 29 ncq dma 131072 out
[ 1051.783683] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.785207] ata1.00: cmd 61/00:f0:00:03:51/01:00:01:00:00/40 tag 30 ncq dma 131072 out
[ 1051.785207] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.786715] ata1: hard resetting link
[ 1119.329182] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1119.329707] 2-...: (1 GPs behind) idle=467/140000000000000/0 softirq=35675/35675 fqs=1572
[ 1119.330499] (detected by 1, t=6003 jiffies, g=24859, c=24858, q=45)
[ 1119.331166] Task dump for CPU 2:
[ 1119.331517] scsi_eh_0 R running task 0 284 2 0x00000008
[ 1119.332222] 0000000000000000 ffffffff955754cd 0000000000000000 ffff9b719da95000
[ 1119.333034] ffffaf78c091be00 ffff9b719db68000 ffffaf78c091be78 ffff9b719dda44c0
[ 1119.333808] 0000000000000246 ffffffff9557588c ffff9b7193df1218 ffff9b719d02a658
[ 1119.334685] Call Trace:
[ 1119.334915] [<ffffffff955754cd>] ? ata_scsi_port_error_handler+0x228/0x544
[ 1119.335638] [<ffffffff9557588c>] ? ata_scsi_error+0xa3/0xdb
[ 1119.336229] [<ffffffff9554526d>] ? scsi_error_handler+0xaf/0x472
[ 1119.336844] [<ffffffff950fe5b3>] ? finish_task_switch+0x115/0x18b
[ 1119.337453] [<ffffffff957f53e3>] ? __schedule+0x36c/0x465
[ 1119.338027] [<ffffffff955451be>] ? scsi_eh_get_sense+0xdd/0xdd
[ 1119.338591] [<ffffffff950f7b56>] ? kthread+0xb4/0xbc
[ 1119.339057] [<ffffffff950f7aa2>] ? init_completion+0x1d/0x1d
[ 1119.339734] [<ffffffff957f8261>] ? ret_from_fork+0x41/0x50
[ 1300.536129] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1300.536665] 2-...: (1 GPs behind) idle=467/140000000000000/0 softirq=35675/35675 fqs=8009
[ 1300.537446] (detected by 1, t=24133 jiffies, g=24859, c=24858, q=122)
[ 1300.538090] Task dump for CPU 2:
[ 1300.538404] scsi_eh_0 R running task 0 284 2 0x00000008
[ 1300.539118] 0000000000000000 ffffffff955754cd 0000000000000000 ffff9b719da95000
[ 1300.540048] ffffaf78c091be00 ffff9b719db68000 ffffaf78c091be78 ffff9b719dda44c0
[ 1300.540880] 0000000000000246 ffffffff9557588c ffff9b7193df1218 ffff9b719d02a658
[ 1300.541617] Call Trace:
[ 1300.541863] [<ffffffff955754cd>] ? ata_scsi_port_error_handler+0x228/0x544
[ 1300.542524] [<ffffffff9557588c>] ? ata_scsi_error+0xa3/0xdb
[ 1300.543058] [<ffffffff9554526d>] ? scsi_error_handler+0xaf/0x472
[ 1300.543643] [<ffffffff950fe5b3>] ? finish_task_switch+0x115/0x18b
[ 1300.544317] [<ffffffff957f53e3>] ? __schedule+0x36c/0x465
[ 1300.544813] [<ffffffff955451be>] ? scsi_eh_get_sense+0xdd/0xdd
[ 1300.545376] [<ffffffff950f7b56>] ? kthread+0xb4/0xbc
[ 1300.545888] [<ffffffff950f7aa2>] ? init_completion+0x1d/0x1d
[ 1300.546431] [<ffffffff957f8261>] ? ret_from_fork+0x41/0x50
So this indicates some hang/timeout in the hyperkit blockdevice layer (or further down in the qcow2 code). We had related issues in the past (see https://github.com/moby/hyperkit/issues/94).
@ryanfb you mentioned that your MBP is relatively under-resourced and may have been close to running out of disk space, but you also said your iMac has a 3TB drive and you had the same issue there. Is the drive in the iMac also close to full?
I've not been able to repro this locally with the two cases you mentioned above.
The 3TB drive in the iMac has/had about 400GB+ free when I was encountering this problem. In the Docker for Mac UI the disk image was sized to ~200GB with ~20GB used on disk. The MBP is under-resourced in terms of memory/CPU - only 8GB (vs. 32GB in the iMac) and a slower/older CPU.
+1 Couldn't even get a diagnostic.
Can someone reproduce it with 18.03 (stable or edge) and post diagnostics please?
@akimd Diagnostic ID: 3AE6953B-DEE7-4441-A4C9-11E944ECB248
Docker for Mac: version: 18.03.0-ce-mac59 (dd2831d4b7421cf559a0881cc7a5fdebeb8c2b98)
macOS: version 10.12.6 (build: 16G1212)
logs: /tmp/3AE6953B-DEE7-4441-A4C9-11E944ECB248/20180328-122732.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK] vpnkit
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] kubernetes
[OK] files
[OK] env
[OK] virtualization kern.hv_support
[OK] osxfs
[OK] moby-console
[OK] logs
[ERROR] docker-cli
docker ps failed
[OK] disk
FWIW, here's another one:
Docker for Mac: version: 18.03.0-ce-mac59 (dd2831d4b7421cf559a0881cc7a5fdebeb8c2b98)
macOS: version 10.13.3 (build: 17D102)
logs: /tmp/CF6CE828-01B4-4174-ADBE-A88C9730F835/20180328-193859.tar.gz
[OK] vpnkit
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] kubernetes
[OK] files
[OK] env
[OK] virtualization kern.hv_support
[OK] osxfs
[OK] moby-console
[OK] logs
[OK] docker-cli
[OK] disk
Diagnostic ID: CF6CE828-01B4-4174-ADBE-A88C9730F835
One thing I noticed is that the delay is always just a few milliseconds over 30s. My hunch is that there's some 30s timeout which blocks execution before the actual command is performed.
@ryanfb Curious to know if your disk is encrypted.
I also tried switching to an external (Mac OS Extended (Journaled)) volume on a USB drive and although it is much slower I'm no longer having the same issue as before. The main volume on my Macbook Pro is Mac OS Extended (Journaled, Encrypted). Perhaps using an encrypted volume has something to do with this?
My disk is encrypted as well... Putting the storage onto another drive is definitely not an option for me or my company, because we'd lose the point of having our HDs encrypted 😞
I'm going to create another partition (without encryption) on my internal SSD and see if that works as well.
@eexit Do you really need encryption? I'm sure your company will be just fine. 😆 If it is related to encryption hopefully identifying this as a common factor will expedite a resolution.
@rclarkburns Unfortunately, I don't have to choose... it's a company-wide policy =/
Sorry @eexit if my comment came across as serious. It was just a silly attempt at a joke. Encryption is important!
@rclarkburns Great question! It turns out, in both cases the external drive wasn't encrypted, but the internal drive was. For some reason I had originally thought the external drive on the iMac was encrypted, so I hadn't thought about that as a potential common factor.
I have since set up another partition on my internal drive that is not encrypted, but experienced the same issue. At this point I'm not convinced it's related to encryption. I guess to fully rule it out, it would be interesting to test an external drive that is encrypted. For now I'm using docker-machine with VirtualBox to move forward on the project I'm currently working on.
I'm going to try testing with an encrypted external drive on one or both machines - I'll update later with the results.
Update: I'm still able to run my problematic `docker-compose` setup from scratch (i.e. all volumes/containers removed before trying) on both machines after moving the `Docker.qcow2` file to an encrypted external drive.
Docker for Mac: version: 18.03.0-ce-mac59 (dd2831d4b7421cf559a0881cc7a5fdebeb8c2b98)
macOS: version 10.12.6 (build: 16G1212)
logs: /tmp/5204E20A-6678-43C5-871E-F4D5CA77CF8C/20180402-151111.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK] vpnkit
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] kubernetes
[OK] files
[OK] env
[OK] virtualization kern.hv_support
[OK] osxfs
[OK] moby-console
[OK] logs
[ERROR] docker-cli
docker ps failed
[OK] disk
One more DIAGNOSTIC ID: 5204E20A-6678-43C5-871E-F4D5CA77CF8C
Thanks for the update @ryanfb. Sounds like encryption is not the issue, but using external volumes consistently resolves it, encrypted or not. I should also note that when I tested with a separate internal partition that wasn't encrypted, I used several different formats (Mac OS Extended (Journaled), Mac OS Extended (Case-sensitive, Journaled), and MS-DOS (FAT)), but it didn't resolve the issue.
This seems to happen to me after deleting old images; everything just stops working. Even after a reboot, Docker for Mac fails to start up, and then on quitting, the `com.docker.hyperkit` process hangs around consuming CPU. [Edit: after a few attempts at rebooting and stopping all Docker processes in Activity Monitor, I managed to replace the app with the latest from the Docker shop]
Just experienced what @stringfellow mentions after `docker system prune` (Total reclaimed space: 22.64GB, unsure of the number of images...) with Experimental features on (not sure if it matters). Docker seems to be rebuilding or checking something? It eventually gets past whatever it is doing and starts normally.
I'm also having hanging issues since updating to 18.03.0-ce (from 17.something). It mainly seems to happen at the end of longer builds, though sometimes just when coming back to Docker after a while. Quitting/restarting Docker seems to be the only solution, though it quickly happens again. I've run a system prune which cleared up 30GB or so, but this keeps recurring.
I don't have an encrypted drive. MacBookPro14,2 (i7 Macbook Pro 2017, 13 inch).
Diagnostics aren't very useful; "docker ps failed" just tells me what I already know:
Docker for Mac: version: 18.03.0-ce-mac60 (dd2831d4b7421cf559a0881cc7a5fdebeb8c2b98)
macOS: version 10.13.1 (build: 17B1003)
logs: /tmp/8A3BB2B3-3CCA-403F-86B5-EEB38CCD73DE/20180414-145206.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK] vpnkit
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] kubernetes
[OK] files
[OK] env
[OK] virtualization kern.hv_support
[OK] osxfs
[OK] moby-console
[OK] logs
[ERROR] docker-cli
docker ps failed
[OK] disk
@ryanfb and possibly also @ndevenish: I see disk-related kernel crashes/timeouts in your diagnostics. If you are on macOS 10.13.4, could you try switching to raw disks? You can do that by editing:
~/Library/Group\ Containers/group.com.docker/settings.json
If you change the setting for `diskPath` from `Docker.qcow2` to `Docker.raw` and then restart, you should be using a different disk backend. The `Docker.raw` disk must be on an APFS-backed volume, as it uses sparse files, and you must be on macOS 10.13.4, as 10.13.3 seems to have an APFS bug with sparse files resulting in occasional disk corruption.
Note: if you change disks you will "lose" all the containers in your current Docker instance, so you'll have to download them again.
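As a rough sketch, the edit above could be scripted like this. Note the assumptions: it blindly substitutes the filename string rather than parsing the JSON, it keeps a `.bak` copy, and Docker for Mac should be quit before running it:

```shell
# Rewrite diskPath from Docker.qcow2 to Docker.raw in a settings file,
# keeping a .bak copy of the original. Sketch only; quit Docker for Mac first.
switch_disk_to_raw() {
  settings="$1"
  cp "$settings" "$settings.bak"
  sed 's/Docker\.qcow2/Docker.raw/' "$settings.bak" > "$settings"
}

# Usage (path from the comment above):
# switch_disk_to_raw "$HOME/Library/Group Containers/group.com.docker/settings.json"
```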
@rn I've actually just noticed that I'm behind on OS version, on 10.13.1. I'll update and see if I can reproduce, then see if stepping to raw helps. Presumably I also need to back up my volumes for this...
I don't think the OS version matters for the qcow2 backend. The reason to test the `raw` backend on APFS is that, at least from @ryanfb's kernel logs, the kernel reports some ATA timeouts. These could happen when we trim (i.e. try to reclaim space), and the qcow2 code for that is obviously radically different from the `raw` backend on APFS with sparse files.
So trying a different backend may help us isolate whether the issue is in HyperKit, the VM, or the qcow2 storage backend.
I'm also experiencing this. My Docker version is 18.03.0-ce, macOS version 10.13.4. I have tried both `qcow2` and `raw` with the same result. I have also tried resetting to factory defaults. Here is the Diagnostic ID while using the `raw` backend: 61FF7D88-DFA5-46D4-9F1A-200951534E77
Everything was working fine until today.
Same issue last 3 days. Version 18.03.1-ce-mac64 (24245), Mac OS 10.12.6.
@ryanfb since you saw the SATA issues I pasted above, could you try a new build of `hyperkit` from https://565-55985023-gh.circle-artifacts.com/0/hyperkit? You need to make it executable and copy it to `/Applications/Docker.app/Contents/Resources/bin/com.docker.hyperkit` (I would save the original first, though) and then restart.
This version of HyperKit has a patch to the AHCI controller backported from upstream bhyve, so it may be related.
The encrypted drive thing may also be a clue, as it might cause things to take longer than the ATA driver in Linux is willing to tolerate (not an ATA expert...).
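The swap described above could be sketched as a small shell function. The bundled binary path and the "save the original first" step come from the comment; the function name and the download location are assumptions:

```shell
# Back up the bundled hyperkit binary, install the patched build, and make it
# executable. Sketch only: quit Docker for Mac first, and keep the .orig copy.
install_patched_hyperkit() {
  bundled="$1"   # e.g. /Applications/Docker.app/Contents/Resources/bin/com.docker.hyperkit
  patched="$2"   # e.g. the build downloaded from the CircleCI URL above
  cp "$bundled" "$bundled.orig"
  cp "$patched" "$bundled"
  chmod +x "$bundled"
}
```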
@rn Thanks, I will give the new `hyperkit` build a try and report back. Trying the `raw` backend isn't as easy for me, as I haven't upgraded any of my machines to High Sierra yet and have some qualms about migrating to APFS.
@ryanfb thanks, and understood re upgrade. We certainly had some filesystem corruption issues with sparse files on APFS prior to 10.13.4. Apple seem to have fixed all the known issues with that though (of course without any mention in the changelog).
In any case, `qcow2` would also be interesting, though it has more moving parts and is hence harder to diagnose.
So, not sure if my problem is anyone else's, but the Docker daemon appears to hang if there are recursive (?) folder symlinks in the build context (in my case, from bootstrapped Lerna packages).
Making sure to exclude those paths in my `.dockerignore` file helped.
Docker version 18.03.1-ce, build 9ee9f40 / OSX 10.12.6
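For anyone hitting the symlink case, a `.dockerignore` along these lines is the kind of exclusion that helped; the exact paths are hypothetical and depend on your package layout (Lerna typically symlinks packages under `node_modules`):

```
# .dockerignore — keep recursive package symlinks out of the build context
node_modules
packages/*/node_modules
```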
Expected behavior
Docker for Mac doesn't hang.
Actual behavior
Docker for Mac hangs.
Information
Diagnostic ID: BED37CCE-B2F9-41B9-B9E6-72EFEBE30091
Steps to reproduce the behavior
1. `docker-compose up --force-recreate`
2. `docker ps` or any other command that interacts with Docker for Mac.

I've tried using `docker system prune`, using "Reset" in the Docker for Mac GUI, increasing RAM/CPU allocations (currently 16GB/6 CPU), and upgrading Docker for Mac to edge. I still encounter this problem multiple times per day, and now regularly/reliably when trying to start this Compose file.