Parallels / docker-machine-parallels

Parallels driver for Docker Machine https://github.com/docker/machine
MIT License

Docker Machine is not running properly after update to macOS Catalina #83

Closed mediaessenz closed 4 years ago

mediaessenz commented 5 years ago

After a new machine is created for the first time (in my case by "dinghy", a reverse proxy solution), everything seems to work normally. But after stopping the machine and trying to start it again, it no longer works. The startup ends with this error message:

Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded
Traceback (most recent call last):
    9: from /usr/local/bin/_dinghy_command:12:in `<main>'
    8: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/base.rb:440:in `start'
    7: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor.rb:359:in `dispatch'
    6: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/invocation.rb:126:in `invoke_command'
    5: from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/command.rb:27:in `run'
    4: from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:93:in `up'
    3: from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:271:in `start_services'
    2: from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:25:in `up'
    1: from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:126:in `system'
/usr/local/Cellar/dinghy/4.6.5/cli/dinghy/system.rb:18:in `system': Failure calling `docker-machine start dinghy` (System::Failure)

When I start the machine with the debug option, I get the following output ten times before the error above appears again:

(dinghy) Calling .GetSSHHostname
(dinghy) DBG | executing: /usr/local/bin/prlctl list dinghy --output status --no-header
(dinghy) DBG | executing: /usr/local/bin/prlctl list -i dinghy
(dinghy) DBG | Found lease: 10.211.55.32 for MAC: 001C4208D8F8, expiring at 1571651690, leased for 1800 s.
(dinghy) DBG |
(dinghy) DBG | Found IP lease: 10.211.55.32 for MAC address 001C4208D8F8
(dinghy) DBG |
(dinghy) Calling .GetSSHPort
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHKeyPath
(dinghy) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.32 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/local/bin/ssh <nil>}
About to run SSH command:
if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
SSH cmd err, output: <nil>: Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 :::22                   :::*                    LISTEN
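
The listing above shows only port 22 in LISTEN state. In the working first-start output later in this thread, port 2376 is listening as well, so treating 2376 as the Docker daemon port is an assumption drawn from that output. A minimal sketch of doing the same check by hand; `listens_on` is a hypothetical helper (not part of docker-machine) that only parses `netstat -tln` style text from stdin:

```shell
# listens_on PORT: succeed if stdin (netstat -tln or ss -tln output)
# contains a LISTEN line with an address field ending in ":PORT".
listens_on() {
  local port="$1"
  awk -v port="$port" '
    /LISTEN/ {
      for (i = 1; i <= NF; i++) {
        # Match the port only at the very end of the field, after a colon.
        if ($i ~ (":" port "$")) { found = 1; exit }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

It could be fed from the running VM, for example with `docker-machine ssh dinghy 'netstat -tln' | listens_on 2376 && echo "port 2376 is listening"`.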

Watching the boot sequence in the miniature window of the Parallels control panel, I see several errors and warnings during the first boot:

...
unable to write 'random state'
...
unable to write 'random state'
...
Device "eth1" does not exists.
...
unable to write 'random state'
...
unable to write 'random state'

Regardless of these messages, the machine works as expected until I stop and start it again. When I do that, I see only one warning during boot in the Parallels window:

warning: unable to find partition with the swap label (boot2dockerswap) or TYPE=swap (so Docker will likely complain about swap)
- this could also mean TCL already mounted it! (see 'free' or '/proc/swaps')

I have two Macs (both already updated to macOS Catalina) with exactly the same problem (after updating the system).

There is also an issue I posted at the dinghy repo, but the author thinks the problem lies in docker-machine or in this Parallels driver: https://github.com/codekitchen/dinghy/issues/290

KatSick commented 5 years ago

Any updates on this? I have a similar problem. Before the latest Catalina updates everything worked, but now on some commands like docker-compose up I see: [1] 26388 abort docker-compose up and that's all.

mediaessenz commented 5 years ago

@KatSick Maybe this helps: https://github.com/Homebrew/homebrew-core/issues/45687#issuecomment-547102000

romankulikov commented 5 years ago

@KatSick Maybe this helps: Homebrew/homebrew-core#45687 (comment)

@mediaessenz , does this workaround help for you?

mediaessenz commented 5 years ago

@KatSick Maybe this helps: Homebrew/homebrew-core#45687 (comment)

@mediaessenz , does this workaround help for you?

Yes

mediaessenz commented 5 years ago

An update: I got a brand new iMac yesterday and started to set up the system to my needs. Because of the problems with restarting a docker machine that I described here, I decided not to use a Time Machine backup of my old Mac. I installed only some basic stuff (browser, iTerm, Parallels Desktop 15 and brew) before I used brew to install docker, docker-compose, docker-machine and docker-machine-parallels. After this I created a new docker-machine and still have the same problem as before. The machine comes up the first time without problems and ends in the same error described above after stopping it and trying to start it again. The messages shown in the Parallels window that I described above are also the same.

mediaessenz commented 5 years ago

Am I really the only one on this planet who has problems using Docker Machine together with Parallels Desktop on macOS Catalina?

mediaessenz commented 5 years ago

After comparing the debug output of the first (working) start (create command) and the second (failing) start (start command), I found this difference, which may help to identify the problem:

  1. Start:
    
    ...
    Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
    &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.8 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
    About to run SSH command:
    sudo /usr/bin/sethostname dinghy && echo "dinghy" | sudo tee /var/lib/boot2docker/etc/hostname
    SSH cmd err, output: <nil>: Setting hostname to dinghy Done.
    dinghy

    (dinghy) Calling .GetSSHHostname
    (dinghy) DBG | executing: /usr/local/bin/prlctl list dinghy --output status --no-header
    (dinghy) DBG | executing: /usr/local/bin/prlctl list -i dinghy
    (dinghy) DBG | Found lease: 10.211.55.8 for MAC: 001C42CE0C5E, expiring at 1574071941, leased for 1800 s.
    (dinghy) DBG |
    (dinghy) DBG | Found IP lease: 10.211.55.8 for MAC address 001C42CE0C5E
    (dinghy) DBG |
    (dinghy) Calling .GetSSHPort
    (dinghy) Calling .GetSSHKeyPath
    (dinghy) Calling .GetSSHKeyPath
    (dinghy) Calling .GetSSHUsername
    Using SSH client type: external
    Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
    &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.8 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
    About to run SSH command:
    if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
    SSH cmd err, output: <nil>: Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
    tcp        0      0 :::2376                 :::*                    LISTEN
    tcp        0      0 :::22                   :::*                    LISTEN
    ...

  2. Start:

    ...
    Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
    &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.6 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
    About to run SSH command:
    if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
    SSH cmd err, output: <nil>: Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
    tcp        0      0 :::22                   :::*                    LISTEN

    (dinghy) Calling .GetSSHHostname
    (dinghy) DBG | executing: /usr/local/bin/prlctl list dinghy --output status --no-header
    (dinghy) DBG | executing: /usr/local/bin/prlctl list -i dinghy
    (dinghy) DBG | Found lease: 10.211.55.6 for MAC: 001C4295E032, expiring at 1574070104, leased for 1800 s.
    (dinghy) DBG |
    (dinghy) DBG | Found IP lease: 10.211.55.6 for MAC address 001C4295E032
    (dinghy) DBG |
    (dinghy) Calling .GetSSHPort
    (dinghy) Calling .GetSSHKeyPath
    (dinghy) Calling .GetSSHKeyPath
    (dinghy) Calling .GetSSHUsername
    Using SSH client type: external
    Using SSH private key: /Users/alex/.docker/machine/machines/dinghy/id_rsa (-rw-------)
    &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null docker@10.211.55.6 -o IdentitiesOnly=yes -i /Users/alex/.docker/machine/machines/dinghy/id_rsa -p 22] /usr/bin/ssh <nil>}
    About to run SSH command:
    if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
    SSH cmd err, output: <nil>: Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
    tcp        0      0 :::22                   :::*                    LISTEN
    ...
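
The difference between the two starts is the set of listening ports: 2376 shows up in the first start and is missing in the second. A minimal sketch for spotting this in longer logs, assuming both debug outputs were saved to files; `missing_ports` and the file names in the usage note are hypothetical, and the helper only looks at netstat-style LISTEN lines:

```shell
# missing_ports FIRST SECOND: print ports that appear on LISTEN lines in
# FIRST but not in SECOND, one per line, sorted numerically.
missing_ports() {
  awk '
    /LISTEN/ {
      # Field 4 is the local address; the port is the part after the last colon.
      n = split($4, parts, ":")
      port = parts[n]
      if (FILENAME == ARGV[2]) second[port] = 1
      else first[port] = 1
    }
    END {
      for (p in first) if (!(p in second)) print p
    }
  ' "$1" "$2" | sort -n
}
```

With logs captured by something like `docker-machine --debug start dinghy > second-start.log 2>&1`, calling `missing_ports first-start.log second-start.log` on the output quoted above would be expected to print 2376.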

romankulikov commented 5 years ago

Well, it looks like the issue is in broken IP address reporting from virtual machine. I'm investigating it.

mediaessenz commented 5 years ago

Any news about this?

romankulikov commented 4 years ago

It looks like this is a duplicate of https://github.com/docker/machine/issues/3595. For example, switching to an older version of boot2docker works for me:

$ dinghy create --provider parallels --boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v18.06.1-ce/boot2docker.iso

romankulikov commented 4 years ago

It looks like this is a duplicate of docker/machine#3595. For example, switching to an older version of boot2docker works for me:

$ dinghy create --provider parallels --boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v18.06.1-ce/boot2docker.iso
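
To try other releases without retyping the link, the URL can be built from the release tag. A minimal sketch, assuming every release follows the same download URL pattern as the command above; `boot2docker_iso_url` is a hypothetical helper and does not check that the tag actually exists:

```shell
# boot2docker_iso_url TAG: print the GitHub release download URL of the
# boot2docker ISO for TAG (for example v18.06.1-ce).
boot2docker_iso_url() {
  if [ -z "${1:-}" ]; then
    echo "usage: boot2docker_iso_url TAG" >&2
    return 1
  fi
  printf 'https://github.com/boot2docker/boot2docker/releases/download/%s/boot2docker.iso\n' "$1"
}
```

Usage would then look like `dinghy create --provider parallels --boot2docker-url="$(boot2docker_iso_url v18.06.1-ce)"`.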

Well, after more investigation I now tend to trace this issue to the lack of entropy at virtual machine startup. It is described in https://github.com/boot2docker/boot2docker/pull/1322#issuecomment-396372707. Looking at /var/lib/boot2docker/log/docker.log inside the virtual machine, this message is printed when dockerd starts:

crypto/rand: blocked for 60 seconds waiting to read random data from the kernel

dockerd hangs at system startup, which causes the docker-machine create command to fail (with consequent certificate issues).

The issue reproduces on recent boot2docker versions, such as the current 19.03.5, which is based on Linux kernel 4.14. Version 18.06.1 works because it is based on Linux kernel 4.9, where the entropy pool state during VM boot is better.
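
To check whether a given VM is hitting this, the dockerd log can be searched for the message above. A minimal sketch; `entropy_stalls` is a hypothetical helper that only counts matching lines in log text read from stdin:

```shell
# entropy_stalls: print how many lines of a dockerd log (stdin) report
# crypto/rand blocking while waiting for random data from the kernel.
entropy_stalls() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep status 0.
  grep -c 'crypto/rand: blocked for [0-9][0-9]* seconds waiting to read random data from the kernel' || true
}
```

For example: `docker-machine ssh dinghy 'cat /var/lib/boot2docker/log/docker.log' | entropy_stalls`. The kernel's current entropy estimate can be read the same way with `docker-machine ssh dinghy 'cat /proc/sys/kernel/random/entropy_avail'`.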

mediaessenz commented 4 years ago

@romankulikov You are my HERO!!! After switching to the older image my problems are gone! Thank You very much for the energy you put into this issue!

iby commented 4 years ago

@romankulikov Sorry in advance if the question is dumb. I understand this is an issue with the boot2docker image, but isn't this a showstopper? Aren't other drivers affected by this, and if not, is there a way the Parallels driver can adapt? If I understand correctly, the referenced PR had something to do with fixing this, but the latest version (on Mojave) still shows the same behaviour (can create, cannot restart).

romankulikov commented 4 years ago

I've tried Parallels Desktop 15.1.2 and VirtualBox 6.1.4 with Boot2Docker v19.03.5; both work OK for me at the moment, at least in this case of creating and starting the "dinghy" machine. @ianbytchek, can you please share your setup so I can reproduce the problem?

As for the lack of entropy when starting the guest OS: on the one hand, it does look like a showstopper. On the other hand, it doesn't look like an easy thing to fix. On the third hand, the issue is currently addressed in modern Linux kernels: https://lwn.net/Articles/808575/ https://git.kernel.org/linus/50ee7529ec4500c88f8664560770a7a1b65db72b

Not sure how to move forward.

romankulikov commented 4 years ago

Well, it looks like boot2docker starting from 19.03.5 has a backported patch from Linux kernel 5.4 with entropy fixes. So my picture of the problem is broken, and I need to know whether anyone can reproduce the issue with a recent boot2docker image.

mediaessenz commented 4 years ago

Unfortunately, at least for me, the problem still exists with the latest boot2docker image (19.03.5).

josefglatz commented 4 years ago

@romankulikov I tried it with boot2docker ISO image version 19.03.5 and still get the same error that @mediaessenz already mentioned, after a dinghy stop && dinghy up:

Starting the dinghy VM...

Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded
/usr/local/Cellar/dinghy/4.6.5/cli/dinghy/system.rb:18:in `system': Failure calling `docker-machine start dinghy` (System::Failure)
    from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:126:in `system'
    from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:25:in `up'
    from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:271:in `start_services'
    from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:93:in `up'
    from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/command.rb:27:in `run'
    from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/invocation.rb:126:in `invoke_command'
    from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor.rb:359:in `dispatch'
    from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/base.rb:440:in `start'
    from /usr/local/bin/_dinghy_command:12:in `<main>'

legal90 commented 4 years ago

Well, it looks like boot2docker starting from 19.03.5 has a backported patch from Linux kernel 5.4 with entropy fixes.

boot2docker 19.03.5 was released from the earlier state, before that fix. That's why the issue still persists.

And, unfortunately, it seems there will be no releases anymore 😭 : https://github.com/boot2docker/boot2docker/pull/1408 Together with https://github.com/docker/machine/issues/4537, it looks like the sunset of the entire Docker Machine project.

romankulikov commented 4 years ago

boot2docker 19.03.5 was released from the earlier state, before that fix. That's why the issue still persists.

Yeah :-(

@legal90, how should we proceed with this issue? From my point of view, it should be fixed only on the guest OS (i.e. boot2docker) side. Is forking boot2docker an option?

legal90 commented 4 years ago

@romankulikov Building and releasing a custom boot2docker.iso might be an option, but in that case all users will have to specify the custom URL to it using the --parallels-boot2docker-url flag. Let's see whether the community continues any fork.

I asked here if there is any chance for the patch to be released: https://github.com/boot2docker/boot2docker/pull/1403#issuecomment-648843520

legal90 commented 4 years ago

v19.03.12, the final release of boot2docker, was published today: https://github.com/boot2docker/boot2docker/releases/tag/v19.03.12

It includes the fix boot2docker/boot2docker#1403, and this issue should be solved there. I checked it on a test VM by running docker-machine restart several times, and it works as expected: no more "Maximum number of retries (10) exceeded" errors. @mediaessenz, please verify it in your setup.
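
The repeated-restart check can be scripted. A minimal sketch; `count_failures` is a hypothetical helper that runs any command a given number of times and reports how many runs exited with a non-zero status:

```shell
# count_failures N CMD [ARGS...]: run CMD N times, discard its output,
# and print the number of runs that failed.
count_failures() {
  local n="$1" failures=0 i=0
  shift
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, `count_failures 5 docker-machine restart dinghy` should print 0 if every restart succeeds.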

mediaessenz commented 4 years ago

YES, IT WORKS !

Thanks a lot to everyone involved!