canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.85k stars 650 forks source link

Instances fail to start after making network changes on macOS host #2222

Open TwoLeggedCat opened 3 years ago

TwoLeggedCat commented 3 years ago

My internet connection currently comes from an iPhone hotspot over a lightning cable. After removing all instances, I can create a new one with

~ % m launch focal Launched: pet-gadwall

I then stopped the instance with sudo shutdown now, unplugged the iPhone, and tried to launch it again with m shell pet-gadwall:

start failed: The following errors occurred:
Instance stopped while starting

The last few lines of /Library/Logs/Multipass/multipassd.log:

[2021-09-03T14:24:43.655] [info] [pet-gadwall] process state changed to NotRunning
[2021-09-03T14:24:43.655] [info] [pet-gadwall] process finished with exit code 0
[2021-09-03T14:25:17.545] [debug] [pet-gadwall] Waiting for SSH to be up
[2021-09-03T14:25:17.545] [debug] [pet-gadwall] process working dir ''
[2021-09-03T14:25:17.545] [info] [pet-gadwall] process program '/Library/Application Support/com.canonical.multipass/bin/hyperkit'
[2021-09-03T14:25:17.545] [info] [pet-gadwall] process arguments '-c, 1, -m, 1024M, -u, -A, -H, -U, a7a10293-0dcb-3001-a84d-a041f306795a, -s, 0:0,hostbridge, -s, 2:0,virtio-net, -s, 5,virtio-rnd, -s, 31,lpc, -l, com1,autopty=/var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/pty,log=/Library/Logs/Multipass/pet-gadwall-hyperkit.log, -s, 1:0,virtio-blk,file:///var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1,format=qcow,qcow-config=discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false, -s, 1:1,ahci-cd,/var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/cloud-init-config.iso, -f, kexec,/var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/ubuntu-20.04-server-cloudimg-amd64-vmlinuz-generic,/var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/ubuntu-20.04-server-cloudimg-amd64-initrd-generic,earlyprintk=serial console=ttyS0 root=/dev/vda1 rw panic=1 no_timer_check'
[2021-09-03T14:25:17.545] [info] [pet-gadwall] process state changed to Starting
[2021-09-03T14:25:17.550] [info] [pet-gadwall] process state changed to Running
[2021-09-03T14:25:17.559] [error] [pet-gadwall] Using fd 5 for I/O notifications
[2021-09-03T14:25:17.561] [info] [pet-gadwall] mirage_block_open: block_config = file:///var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1 and qcow_config = discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false and stats_config = None

[2021-09-03T14:25:17.590] [info] [pet-gadwall] hyperkit: [INFO] Resized file to 23577 clusters (3017856 sectors)
[2021-09-03T14:25:17.590] [info] [pet-gadwall] hyperkit: [INFO] block recycler starting with keep_erased = 2048
[2021-09-03T14:25:17.590] [info] [pet-gadwall] hyperkit: [INFO] image has 0 free sectors and 23573 used sectors
[2021-09-03T14:25:17.590] [info] [pet-gadwall] mirage_block_open: block_config = file:///var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1 and qcow_config = discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false and stats_config = None returning 0
mirage_block_stat

[2021-09-03T14:25:17.591] [info] [pet-gadwall] /var/root/Library/Application Support/multipassd/vault/instances/pet-gadwall/cloud-init-config.iso: fcntl(F_PUNCHHOLE) Operation not permitted: block device will not support TRIM/DISCARD
[2021-09-03T14:25:17.716] [info] [pet-gadwall] virtio_net: Could not create vmnet interface, permission denied or no entitlement?

[2021-09-03T14:25:17.719] [info] [pet-gadwall] process state changed to NotRunning
[2021-09-03T14:25:17.719] [info] [pet-gadwall] process finished with exit code 1

Stopping the daemon with sudo kill pid and attempting to launch the instance the same way succeeds as normal.

The daemon is running as root.

macOS Catalina version 10.15.7 (19H1323)

~ % multipass --version multipass 1.7.0+mac multipassd 1.7.0+mac

I installed Multipass using the pkg file from GitHub releases, not Homebrew (nor do I have Homebrew on this machine).

Also worth noting is that, when I use m list after receiving the startup error, the instance's state is shown as "Starting." Stopping the instance appears to run as normal but has no effect; I can still not start it again after doing so and the state is still shown as "Starting." I can also start a new instance after making the network change, I just cannot start one that was running with internet after disconnecting from the internet (until I restart the daemon).

Saviq commented 3 years ago

Hi @TwoLeggedCat, make sure you have Internet Sharing disabled in Settings/Sharing.

TwoLeggedCat commented 3 years ago

Hi @TwoLeggedCat, make sure you have Internet Sharing disabled in Settings/Sharing.

Thanks for the hasty reply; I do.

Saviq commented 3 years ago

@TwoLeggedCat you mean you do have it disabled? This error pops up when macOS prevents the VM from creating its network. You may try enabling Internet Sharing and disabling it again, that could clear some internal macOS state. Rebooting could also help. Let us know please!

TwoLeggedCat commented 3 years ago

@Saviq yes, I have Internet Sharing disabled, and haven’t had it enabled in about a year to my memory. I’ll try rebooting and then disconnecting again when I get back to my computer.

TwoLeggedCat commented 3 years ago

@Saviq I went through the following steps:

  1. Reboot computer with iPhone plugged in
  2. multipass launch focal (name solid-roadrunner)
  3. multipass shell solid-roadrunner and sudo shutdown now
  4. Unplug iPhone and multipass shell solid-roadrunner:
    start failed: The following errors occurred:                                    
    Instance stopped while starting

At this point, multipass list still shows that solid-roadrunner is "Starting."

multipassd.log ``` [2021-09-06T10:17:34.207] [debug] [solid-roadrunner] Waiting for SSH to be up [2021-09-06T10:17:34.207] [debug] [solid-roadrunner] process working dir '' [2021-09-06T10:17:34.207] [info] [solid-roadrunner] process program '/Library/Application Support/com.canonical.multipass/bin/hyperkit' [2021-09-06T10:17:34.207] [info] [solid-roadrunner] process arguments '-c, 1, -m, 1024M, -u, -A, -H, -U, b9c446c0-5add-37fd-a358-0c7cdb66dbe2, -s, 0:0,hostbridge, -s, 2:0,virtio-net, -s, 5,virtio-rnd, -s, 31,lpc, -l, com1,autopty=/var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/pty,log=/Library/Logs/Multipass/solid-roadrunner-hyperkit.log, -s, 1:0,virtio-blk,file:///var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1,format=qcow,qcow-config=discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false, -s, 1:1,ahci-cd,/var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/cloud-init-config.iso, -f, kexec,/var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/ubuntu-20.04-server-cloudimg-amd64-vmlinuz-generic,/var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/ubuntu-20.04-server-cloudimg-amd64-initrd-generic,earlyprintk=serial console=ttyS0 root=/dev/vda1 rw panic=1 no_timer_check' [2021-09-06T10:17:34.208] [info] [solid-roadrunner] process state changed to Starting [2021-09-06T10:17:34.210] [info] [solid-roadrunner] process state changed to Running [2021-09-06T10:17:34.238] [error] [solid-roadrunner] Using fd 5 for I/O notifications [2021-09-06T10:17:34.240] [info] [solid-roadrunner] mirage_block_open: block_config = file:///var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1 and qcow_config = discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false and stats_config = None [2021-09-06T10:17:34.274] [info] [solid-roadrunner] hyperkit: [INFO] Resized file to 23536 clusters (3012608 sectors) [2021-09-06T10:17:34.274] [info] [solid-roadrunner] hyperkit: [INFO] block recycler starting with keep_erased = 2048 [2021-09-06T10:17:34.275] [info] [solid-roadrunner] hyperkit: [INFO] image has 0 free sectors and 23532 used sectors [2021-09-06T10:17:34.275] [info] [solid-roadrunner] mirage_block_open: block_config = file:///var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1 and qcow_config = discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false and stats_config = None returning 0 mirage_block_stat [2021-09-06T10:17:34.275] [info] [solid-roadrunner] /var/root/Library/Application Support/multipassd/vault/instances/solid-roadrunner/cloud-init-config.iso: fcntl(F_PUNCHHOLE) Operation not permitted: block device will not support TRIM/DISCARD [2021-09-06T10:17:34.502] [info] [solid-roadrunner] virtio_net: Could not create vmnet interface, permission denied or no entitlement? [2021-09-06T10:17:34.504] [info] [solid-roadrunner] process state changed to NotRunning [2021-09-06T10:17:34.504] [info] [solid-roadrunner] process finished with exit code 1 ```
Internet Sharing preferences image

Edit: woops, accidentally closed this.

Saviq commented 3 years ago

Thanks @TwoLeggedCat, so it's this that makes it go bad:

[2021-09-06T10:17:34.502] [info] [solid-roadrunner] virtio_net: Could not create vmnet interface, permission denied or no entitlement?

Hyperkit tries to set up networking, but macOS is in a state preventing that. If you then reboot without the iPhone plugged in, does it work ok? What about if you enable and disable Internet Sharing?

TwoLeggedCat commented 3 years ago

@Saviq enabling and disabling Internet Sharing changed nothing; rebooting with the iPhone unplugged allowed me to launch it. After plugging the iPhone back in, I could still launch the instance. It seems it only starts to break when I unplug the iPhone, not when I plug it in.

Saviq commented 3 years ago

Thanks @TwoLeggedCat, unfortunately it's an unsolved issue in xhyve/hyperkit, there's a number of unanswered issues out there (machine-drivers/docker-machine-driver-xhyve#85, machine-drivers/docker-machine-driver-xhyve#107).

That said, we're moving to QEMU as our preferred VM backend on macOS, it would be interesting if the issue is seen there as well. If you're willing, please try this installer:

https://multipass-ci.s3.amazonaws.com/pr403/multipass-1.8.0-dev.404.pr403%2Bg4f730915.mac%2Bx86_64-Darwin.pkg

And use sudo multipass set local.driver=qemu to switch to that backend. Note you will lose access (but not data) to existing VMs. To switch back, sudo multipass set local.driver=hyperkit.

TwoLeggedCat commented 3 years ago

@Saviq, with local.driver set to qemu, I get the following (after creating the tireless-rodent on iPhone USB since I needed to download the image and then disconnecting the phone):

~ % m shell tireless-rodent
start failed: The following errors occurred:                                    
Failed to start interface: generic failure
Hint: vmnet requires running with root access
qemu-system-x86_64: -netdev vmnet-macos,mode=shared,id=hostnet0: Device 'vmnet-macos' could not be initialized
tireless-rodent: shutdown called while starting

Using that build with qemu did seem to fix all the problems with multipass list but I'm assuming that's unrelated.

Also, to confirm:

~ % multipass --version
multipass   1.8.0-dev.404.pr403+g4f730915.mac
multipassd  1.8.0-dev.404.pr403+g4f730915.mac
Saviq commented 3 years ago

Yeah so that's the same problem - macOS doesn't allow to use the virtual network subsystem for some reason. Thanks for diving in, we'll need to look for potential solutions to this.

rcousens commented 2 years ago

Just a note, I managed to recreate this problem with multipass 1.8.0 by ctrl-c'ing (aggressively) during a launch when I didn't want to continue launching that machine (wrong default image/alias). This was on an M1 Macbook Pro running Monterey 12.3.1. Thought this might give you a starting point for a reproducible case for diagnosing and debugging the issue!

ubuntu@sanctified-cardinal:~$ exit
logout
 ~  multipass delete sanctified-cardinal                                                                                                                                                                                                 
 ~  multipass launch                                                                                                                                                                                                                              
Starting inspiring-pratincole |^C⏎
 ~  multipass list                                                                                                                                                                                                                            
Name                    State             IPv4             Image
primary                 Deleted           --               Not Available
inspiring-pratincole    Unknown           --               Ubuntu 20.04 LTS
sanctified-cardinal     Deleted           --               Not Available
vocal-bellbird          Deleted           --               Not Available
 ~  multipass delete inspiring-pratincole                                                                                                                                                                                                         

 ~                                                                                                                                                                                                                                                
 ~  multipass list                                                                                                                                                                                                                          
Name                    State             IPv4             Image
primary                 Deleted           --               Not Available
inspiring-pratincole    Deleted           --               Not Available
sanctified-cardinal     Deleted           --               Not Available
vocal-bellbird          Deleted           --               Not Available
 ~  multipass purge
 ~  multipass launch                                                                                                                                                                                                                              
launch failed: The following errors occurred:
Failed to start interface: generic failure
Hint: vmnet requires running with root access
qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=52:54:00:79:8d:24: Device 'vmnet-macos' could not be initialized