docker / machine

Machine management for a container-centric world
https://docs.docker.com/machine/
Apache License 2.0

VMware Fusion can't find IP address on Big Sur #4846

Open lloeki opened 3 years ago

lloeki commented 3 years ago

I'm not sure why, but instead of using vmrun, the vmwarefusion driver attempts to find the guest IP through roundabout ways by looking into DHCP lease files.

With the advent of Big Sur, this doesn't work anymore as vmnet1/vmnet8 have disappeared in favour of a native bridge100 interface.

Comparatively, vmrun getGuestIPAddress just works:

~> vmrun list
Total running VMs: 1
/Users/lloeki/.docker/machine/machines/fusion/fusion.vmx
~> vmrun checkToolsState /Users/lloeki/.docker/machine/machines/fusion/fusion.vmx
running
~> vmrun getGuestIPAddress /Users/lloeki/.docker/machine/machines/fusion/fusion.vmx
172.16.159.2

I suppose it might just be a case of writing a getIPfromVmrun() or something.
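As a rough illustration of what such a helper could look like outside the driver, here is a minimal standalone sketch. It assumes `vmrun getGuestIPAddress` prints a bare address on success (as in the transcript above) and validates the output with `net.ParseIP`. The names `parseGuestIP` and `guestIPFromVmrun` and the `vmrunPath` parameter are hypothetical and not part of docker-machine.

```go
package main

import (
	"fmt"
	"net"
	"os/exec"
	"strings"
)

// parseGuestIP validates the text printed by `vmrun getGuestIPAddress`.
// It returns an error when the output is not a plain IP address, which is
// assumed to be the case while VMware Tools is not yet running in the guest.
func parseGuestIP(output string) (string, error) {
	candidate := strings.TrimSpace(output)
	if net.ParseIP(candidate) == nil {
		return "", fmt.Errorf("vmrun did not report an IP address: %q", candidate)
	}
	return candidate, nil
}

// guestIPFromVmrun runs vmrun against a .vmx file and returns the guest IP.
// vmrunPath is a hypothetical parameter; the real driver locates vmrun itself.
func guestIPFromVmrun(vmrunPath, vmxPath string) (string, error) {
	out, err := exec.Command(vmrunPath, "getGuestIPAddress", vmxPath).Output()
	if err != nil {
		return "", fmt.Errorf("vmrun getGuestIPAddress failed: %w", err)
	}
	return parseGuestIP(string(out))
}

func main() {
	ip, err := guestIPFromVmrun(
		"/Applications/VMware Fusion.app/Contents/Public/vmrun",
		"/Users/lloeki/.docker/machine/machines/fusion/fusion.vmx",
	)
	fmt.Println(ip, err)
}
```

Checking both the command's exit status and the shape of its output means a failed call is reported as an error rather than silently producing an empty address.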

lloeki commented 3 years ago

Well, this did the trick:

diff --git a/drivers/vmwarefusion/fusion_darwin.go b/drivers/vmwarefusion/fusion_darwin.go
index fb22c96a..b1dcff5e 100644
--- a/drivers/vmwarefusion/fusion_darwin.go
+++ b/drivers/vmwarefusion/fusion_darwin.go
@@ -194,6 +194,11 @@ func (d *Driver) GetIP() (string, error) {
                return "", err
        }

+       // attempt to find the address from vmrun
+       if ip, err := d.getIPfromVmrun(); err == nil {
+               return ip, err
+       }
+
        // attempt to find the address in the vmnet configuration
        if ip, err := d.getIPfromVmnetConfiguration(macaddr); err == nil {
                return ip, err
@@ -535,6 +540,18 @@ func (d *Driver) getIPfromVmnetConfiguration(macaddr string) (string, error) {
        return "", fmt.Errorf("IP not found for MAC %s in vmnet configuration files", macaddr)
 }

+func (d *Driver) getIPfromVmrun() (string, error) {
+       vmx := d.vmxPath()
+
+       ip := regexp.MustCompile(`(\d+\.\d+\.\d+\.\d+)`)
+       stdout, _, _ := vmrun("getGuestIPAddress", vmx)
+       if matches := ip.FindStringSubmatch(stdout); matches != nil {
+               return matches[1], nil
+       }
+
+       return "", fmt.Errorf("could not get IP from vmrun")
+}
+
 func (d *Driver) getIPfromVmnetConfigurationFile(conffile, macaddr string) (string, error) {
        var conffh *os.File
        var confcontent []byte

jgangemi commented 3 years ago

is there a workaround for this right now? do you have a compiled binary you could share?

lloeki commented 3 years ago

Compiled with go 1.13 from my branch at #4847.

docker-machine.tar.gz

jgangemi commented 3 years ago

awesome, very much appreciated!!!

jgangemi commented 3 years ago

your binary doesn't work for me, it just sits looping over the legacy files.

jgangemi commented 3 years ago

i found another way around this in my setup.

/var/db/vmware/vmnet-dhcpd-vmnet8.leases contains the lease/ip information being looked for, so i just updated that with the ip and mac address for the vm obtained with a call to arp -a.
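The `arp -a` half of that workaround can be sketched as below. This is only an illustration: it assumes BSD-style `arp -a` lines of the form `? (IP) at MAC on IFACE ...` and that the MAC may be printed without leading zeros in each octet. The sample lines in `main` are made up for the example, and `ipForMAC` and `normalizeMAC` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// arpLine captures the IP and MAC from an entry such as
// "? (172.16.159.3) at 0:c:29:8c:d5:20 on bridge100 ...".
// The line layout is an assumption about BSD-style `arp -a` output.
var arpLine = regexp.MustCompile(`\(([0-9.]+)\) at ([0-9A-Fa-f:]+)`)

// normalizeMAC lower-cases a MAC address and pads single-digit octets,
// so "0:c:29:8c:d5:20" and "00:0C:29:8C:D5:20" compare equal.
func normalizeMAC(mac string) string {
	parts := strings.Split(strings.ToLower(mac), ":")
	for i, p := range parts {
		if len(p) == 1 {
			parts[i] = "0" + p
		}
	}
	return strings.Join(parts, ":")
}

// ipForMAC scans arp output and returns the first IP whose MAC matches.
func ipForMAC(arpOutput, mac string) (string, error) {
	want := normalizeMAC(mac)
	for _, line := range strings.Split(arpOutput, "\n") {
		m := arpLine.FindStringSubmatch(line)
		if m != nil && normalizeMAC(m[2]) == want {
			return m[1], nil
		}
	}
	return "", fmt.Errorf("no ARP entry for MAC %s", mac)
}

func main() {
	// Illustrative sample only, not captured from a real machine.
	sample := "? (172.16.159.1) at 2:50:56:c0:0:8 on bridge100 ifscope [ethernet]\n" +
		"? (172.16.159.3) at 0:c:29:8c:d5:20 on bridge100 ifscope [ethernet]\n"
	ip, err := ipForMAC(sample, "00:0c:29:8c:d5:20")
	fmt.Println(ip, err)
}
```

The MAC to look for is the one recorded in the machine's .vmx file; the resulting IP and MAC pair is what would then be written into the lease file by hand.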

lloeki commented 3 years ago

I just created a new one with docker-machine create -d vmwarefusion foo.

It didn't work right away: it blocked while waiting for the VM to come up and ultimately bailed out after 120s without getting an IP. docker-machine ls also failed to show an IP, although the VM was shown as running. The cause was that VMware Tools had not started (there was an error on the VM console). This kind of thing happened even before Big Sur and required a stop/start/regenerate-certs cycle.

But a stop+start made it show the proper IP:

$ docker-machine ls
NAME     ACTIVE   DRIVER         STATE     URL                       SWARM   DOCKER      ERRORS
foo      -        vmwarefusion   Running   tcp://172.16.159.3:2376           Unknown     Unable to query docker version: Get https://172.16.159.3:2376/v1.15/version: x509: certificate signed by unknown authority
fusion   *        vmwarefusion   Running   tcp://172.16.159.2:2376           v19.03.12   

It still failed though:

$ docker-machine regenerate-certs foo
Regenerate TLS machine certs?  Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded

jgangemi commented 3 years ago

it's interesting you can get a machine to create using the default driver, i have to use docker-machine-driver-vmware to get mine to create.

in any case, if you update the lease file by hand, doing a rm and then a create should work w/o issue b/c the vm should be re-created using the same ip address (assuming the same name)

lloeki commented 3 years ago

Just created a machine successfully from A to Z:

Open a first terminal window:

$ bin/docker-machine -D create -d vmwarefusion foo
# wait for ip find loop, do not interrupt it

Then, open a second terminal window:

$ bin/docker-machine ls
NAME     ACTIVE   DRIVER         STATE     URL   SWARM   DOCKER    ERRORS
foo      -        vmwarefusion   Running                 Unknown   IP not found for MAC 00:0c:29:8c:d5:20 in DHCP leases

It looks like VMware Tools don't come up properly at this stage, but it does run vmrun getGuestIPAddress each time. In that case, stop and start the machine:

$ bin/docker-machine stop foo
Stopping "foo"...
Machine "foo" was stopped.
$ bin/docker-machine start foo
Starting "foo"...
Machine "foo" was started.
Waiting for SSH to be available...
Detecting the provisioner...
Started machines may have new IP addresses. You may need to re-run the `docker-machine env` command.
$ docker-machine ls        
NAME     ACTIVE   DRIVER         STATE     URL                       SWARM   DOCKER    ERRORS
foo      -        vmwarefusion   Running   tcp://172.16.159.3:2376           Unknown   Unable to query docker version: Get https://172.16.159.3:2376/v1.15/version: x509: certificate signed by unknown authority

Notice how the IP is correctly detected: rebooting the VM allowed the tools to start correctly this time, but there is a problem with the certificate.

Back in the first terminal, notice that the first command proceeded once it found the IP, only to fail later on, apparently because it could not enableSharedFolders?:

...
(foo) DBG | MAC address in VMX: 00:0c:29:8c:d5:20
(foo) DBG | Trying to find IP address via VMware tools: /Users/lloeki/.docker/machine/machines/foo/foo.vmx
(foo) DBG | executing: /Applications/VMware Fusion.app/Contents/Public/vmrun getGuestIPAddress /Users/lloeki/.docker/machine/machines/foo/foo.vmx
(foo) DBG | Found IP address via VMware tools: 172.16.159.3
(foo) DBG | Got an ip: 172.16.159.3
(foo) DBG | Creating Tar key bundle...
(foo) DBG | executing: /Applications/VMware Fusion.app/Contents/Public/vmrun -gu docker -gp tcuser directoryExistsInGuest /Users/lloeki/.docker/machine/machines/foo/foo.vmx /var/lib/boot2docker
(foo) DBG | executing: /Applications/VMware Fusion.app/Contents/Public/vmrun -gu docker -gp tcuser CopyFileFromHostToGuest /Users/lloeki/.docker/machine/machines/foo/foo.vmx /Users/lloeki/.docker/machine/machines/foo/userdata.tar /home/docker/userdata.tar
(foo) DBG | executing: /Applications/VMware Fusion.app/Contents/Public/vmrun -gu docker -gp tcuser runScriptInGuest /Users/lloeki/.docker/machine/machines/foo/foo.vmx /bin/sh sudo sh -c "tar xvf /home/docker/userdata.tar -C /home/docker > /var/log/userdata.log 2>&1 && chown -R docker:staff /home/docker"
(foo) DBG | executing: /Applications/VMware Fusion.app/Contents/Public/vmrun -gu docker -gp tcuser runScriptInGuest /Users/lloeki/.docker/machine/machines/foo/foo.vmx /bin/sh sudo /bin/mv /home/docker/userdata.tar /var/lib/boot2docker/userdata.tar
(foo) DBG | executing: /Applications/VMware Fusion.app/Contents/Public/vmrun -gu docker -gp tcuser enableSharedFolders /Users/lloeki/.docker/machine/machines/foo/foo.vmx
Error creating machine: Error in driver during machine creation: exit status 255
notifying bugsnag: [Error creating machine: Error in driver during machine creation: exit status 255]

But now that we have an IP address we can proceed from there again and complete the crucial missing step:

$ bin/docker-machine -D regenerate-certs foo      
Regenerate TLS machine certs?  Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
$ bin/docker-machine ls
NAME     ACTIVE   DRIVER         STATE     URL                       SWARM   DOCKER      ERRORS
foo      -        vmwarefusion   Running   tcp://172.16.159.3:2376           v19.03.12   
$ bin/docker-machine env foo                      
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://172.16.159.3:2376"
export DOCKER_CERT_PATH="/Users/lloeki/.docker/machine/machines/foo"
export DOCKER_MACHINE_NAME="foo"
# Run this command to configure your shell: 
# eval $(bin/docker-machine env foo)

For docker -v volumes to work, though, /Users needs to be added as a shared folder, which raises an error:

$ vmrun addSharedFolder /Users/lloeki/.docker/machine/machines/foo/foo.vmx /Users /Users
Error: There was an error mounting the Shared Folders file system inside the guest operating system

And while I do get errors such as the above and the ones below, it does appear to work:

$ docker-machine ssh foo
chmod: /mnt/hgfs/Users: Operation not permitted
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
chmod: /mnt/hgfs/Users: Operation not permitted
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
docker@foo:~$ ls /Users/
Shared  lloeki

Note that, apart from the IP issue, which is specific to Big Sur, I've had all of those errors on Catalina as well.

jgangemi commented 3 years ago

interesting. i can't get a vm to come up using your steps and the fusion driver. ssh just continues to loop w/ an exit code of 255 and i am unable to regenerate the certs. i also noticed that if i move the old dhcp files out of /var/db/vmware, they don't seem to be recreated.

also just as an unrelated fyi, vpn routing is broken in big sur/fusion 12 as well, just in case you use it and haven't encountered it yet. i can provide a link to a workaround if you need it.

hopefully some fix will arise. i am not a fan of docker for mac b/c it doesn't behave the same as docker on a linux box (kafka is a good example of this) and so docker-machine is a great way to spin things up.