canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.79k stars 643 forks source link

multipass times out to start a VM #3003

Open gmarik opened 1 year ago

gmarik commented 1 year ago

Describe the bug

$ multipass start -vvv dev
[2023-03-30T21:14:56.327] [debug] [dev] process working dir ''                  
[2023-03-30T21:14:56.327] [info] [dev] process program 'qemu-system-aarch64'    
[2023-03-30T21:14:56.327] [info] [dev] process arguments '-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, cortex-a72, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso, -virtfs, local,security_model=passthrough,uid_map=501:1000,gid_map=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59'
[2023-03-30T21:14:56.336] [debug] [qemu-system-aarch64] [2791] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.MQYyKQ
[2023-03-30T21:14:56.396] [info] [dev] process state changed to Starting        
[2023-03-30T21:14:56.398] [info] [dev] process state changed to Running         
[2023-03-30T21:14:56.399] [debug] [qemu-system-aarch64] [2792] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso -virtfs local,security_model=passthrough,uid_map=501:1000,gid_map=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59
[2023-03-30T21:14:56.399] [info] [dev] process started                          
start failed: The following errors occurred:                                    
dev: timed out waiting for response

To Reproduce See the log above

Expected behavior VM starts

Logs

``` [2023-03-30T21:12:55.106] [debug] [update] Latest Multipass release available is version 1.11.1 [2023-03-30T21:12:55.138] [info] [VMImageHost] Did not find any supported products in "appliance" [2023-03-30T21:12:55.143] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket [2023-03-30T21:12:55.145] [debug] [qemu-img] [2755] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/ml/ubuntu-22.0 4-server-cloudimg-arm64.img [2023-03-30T21:12:55.151] [debug] [qemu-img] [2756] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/test/ubuntu-22 .04-server-cloudimg-arm64.img [2023-03-30T21:12:55.157] [debug] [qemu-img] [2757] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22. 04-server-cloudimg-arm64.img [2023-03-30T21:12:55.163] [debug] [qemu-img] [2758] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/bitto/ubuntu-2 2.04-server-cloudimg-arm64.img [2023-03-30T21:12:55.167] [info] [daemon] Starting Multipass 1.11.1+mac [2023-03-30T21:12:55.167] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug [2023-03-30T21:14:56.326] [debug] [dev] process working dir '' [2023-03-30T21:14:56.327] [info] [dev] process program 'qemu-system-aarch64' [2023-03-30T21:14:56.327] [info] [dev] process arguments '-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/b in/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, cortex-a72, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format =qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cd rom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso, -virtfs, local,security_model=passthrough,uid_map=501:1000,gid_m ap=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59' [2023-03-30T21:14:56.336] [debug] [qemu-system-aarch64] [2791] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz /zyxvpxvq6csfxvn_n0000000000000/T/multipassd.MQYyKQ [2023-03-30T21:14:56.396] [info] [dev] process state changed to Starting [2023-03-30T21:14:56.398] [info] [dev] process state changed to Running [2023-03-30T21:14:56.398] [debug] [qemu-system-aarch64] [2792] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso -virtfs local,security_model=passthrough,uid_map=501:1000,gid_map=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59 [2023-03-30T21:14:56.399] [info] [dev] process started [2023-03-30T21:14:56.399] [debug] [dev] Waiting for SSH to be up [2023-03-30T21:14:56.739] [debug] [dev] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 7}, "package": ""}, "capabilities": ["oob"]}} [2023-03-30T21:14:56.774] [debug] [dev] QMP: {"return": {}} ```

Additional info

Additional context

holta commented 1 year ago

Similar problematic behavior is reasonably common, even with the most mainline starting of Ubuntu 22.04 instances (VM's) on an Ubuntu 22.04 host PC.

Repeatedly rebooting the host PC is the only solution I've found so far.

Painful.

Context: this again occurred with the Edge channel version of Multipass, in the last week.

holta commented 1 year ago

Similar problematic behavior is reasonably common, even with the most mainline starting of Ubuntu 22.04 instances (VM's) on an Ubuntu 22.04 host PC.

CLARIF:

The guest VM (Ubuntu 22.04 instance) was severely stuck in my case, having partially but not fully booted.

multipass info showed the insurance's internal IP address but no others. It was impossible to shell into the instance using multipass shell <instance name>

(In the end it took 2 full reboots of the host PC to finally get past the problem...)

EinBexiii commented 1 year ago

Have the same issue some solutions?

holta commented 1 year ago

If you're using LXD with Multipass, the problems would appear to remain serious:

fansch commented 1 year ago

Hi, I think I have the same problem. Firewall is not active.

Is there one more idea?

Thanks :-).

$ sw_vers
ProductName:        macOS
ProductVersion:     13.5.1
BuildVersion:       22G90
$ multipass version
multipass   1.12.2+mac
multipassd  1.12.2+mac
$ multipass list
Name                    State             IPv4             Image
primary                 Stopped           --               Ubuntu 22.04 LTS
$ multipass info --all
Name:           primary
State:          Stopped
IPv4:           --
Release:        --
Image hash:     9c59c6097711 (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --
$ sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/Ubuntu-22.04-server-cloudimg-arm64.img
No errors were found on the image.
24470/81920 = 29.87% allocated, 84.56% fragmented, 82.13% compressed clusters
Image end offset: 732495872
$ multipass start -vvvv
[2023-08-29T22:38:13.598] [debug] [primary] process working dir ''
[2023-08-29T22:38:13.598] [info] [primary] process program 'qemu-system-aarch64'
[2023-08-29T22:38:13.598] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:d3:87:76, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-08-29T22:38:13.604] [debug] [qemu-system-aarch64] [5889] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.DrjiOg
[2023-08-29T22:38:13.665] [info] [primary] process state changed to Starting
[2023-08-29T22:38:13.667] [info] [primary] process state changed to Running
[2023-08-29T22:38:13.667] [debug] [qemu-system-aarch64] [5890] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:d3:87:76 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-08-29T22:38:13.667] [info] [primary] process started
start failed: The following errors occurred:
primary: timed out waiting for response
ricab commented 1 year ago

Hi @fansch, it looks like the VM is not getting an IP. Have you seen our troubleshooting docs? There might be something helpful for you there.

Otherwise, one thing that I have found sometimes helps in such cases is to reset DHCP leases:

  1. If there is nothing you don't mind loosing in the VM, just delete it with multipass delete -p primary
  2. Otherwise, we need to at least stop it. You can try multipass stop.
    1. That sometimes hangs when the VM is in unknown state. If that happens and the VM doesn't stop, you need to kill the QEMU process (pid 5890 in your case): kill -SIGKILL 5890. Unfortunately that may cause corruption to your VM, so deleting right away is preferable if you can.
  3. Follow this. If you have any leases that you don't want to loose, you can instead edit /var/db/dhcpd_leases and remove any entries pertaining to primary.
  4. restart the multipass daemon, for good measure:
    1. sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
    2. sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist
  5. Try to launch/start the VM again (depending on whether you deleted or stopped it)

Let us know how it goes for you!

ricab commented 1 year ago

We did some changes to improve this part of the starting sequence, in PR #3114, but they were not yet released.

fansch commented 1 year ago

Thanks for your message. I took a look ont the troubleshooting docs: firewall is disabled, there is no VPN, Little Snitch is not installed, Internet Sharing is disabled. About the bootpd DHCP server:

$ sudo lsof -iUDP:67 -n -P
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
launchd   1 root   20u  IPv4 0xff412dbbfaed0889      0t0  UDP *:67
launchd   1 root   33u  IPv4 0xff412dbbfaed0889      0t0  UDP *:67

Otherwise, one thing that I have found sometimes helps in such cases is to reset DHCP leases:

  1. If there is nothing you don't mind loosing in the VM, just delete it with multipass delete -p primary

I have deleted all my VM several times, even reinstalled multipass… It is installed via brew, but I have tested via the downloaded installer, and it is the same.

Otherwise, we need to at least stop it. You can try multipass stop.

  1. That sometimes hangs when the VM is in unknown state. If that happens and the VM doesn't stop, you need to kill the QEMU process (pid 5890 in your case): kill -SIGKILL 5890. Unfortunately that may cause corruption to your VM, so deleting right away is preferable if you can.

Effectively it helps stopping the VM… VM deleted after doing it.

  1. Follow this. If you have any leases that you don't want to loose, you can instead edit /var/db/dhcpd_leases and remove any entries pertaining to primary.

File emptied

  1. restart the multipass daemon, for good measure:

    1. sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
    2. sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist

Done

  1. Try to launch/start the VM again (depending on whether you deleted or stopped it) Let us know how it goes for you!
$ multipass list
No instances found.
$ multipass start -vvvvvv
[2023-09-04T00:29:20.998] [trace] [url downloader] Found https://codeload.github.com/canonical/multipass-blueprints/zip/refs/heads/main in cache: true
[2023-09-04T00:29:21.003] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-09-04T00:29:21.006] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-09-04T00:29:21.007] [debug] [blueprint provider] Loading "docker" v1
[2023-09-04T00:29:21.008] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "minikube" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-09-04T00:29:21.014] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-09-04T00:29:21.144] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/index.json in cache: false
[2023-09-04T00:29:21.187] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:download.json in cache: true
[2023-09-04T00:29:21.329] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/index.json in cache: false
[2023-09-04T00:29:21.331] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/com.ubuntu.cloud:daily:download.json in cache: true
[2023-09-04T00:29:21.333] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/index.json in cache: true
[2023-09-04T00:29:21.334] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/com.ubuntu.appliances:released:download.json in cache: true
[2023-09-04T00:29:21.334] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-09-04T00:29:21.337] [debug] [qemu-system-aarch64] [10680] started: qemu-system-aarch64 --version
[2023-09-04T00:29:21.420] [debug] [qemu-img] [10681] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.434] [debug] [qemu-img] [10682] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-04T00:29:21.442] [debug] [qemu-img] [10683] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.446] [debug] [primary] process working dir ''
[2023-09-04T00:29:21.446] [info] [primary] process program 'qemu-system-aarch64'
[2023-09-04T00:29:21.446] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-09-04T00:29:21.449] [debug] [qemu-system-aarch64] [10684] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.jnBXjP
[2023-09-04T00:29:21.486] [info] [primary] process state changed to Starting
[2023-09-04T00:29:21.488] [info] [primary] process state changed to Running
[2023-09-04T00:29:21.489] [debug] [qemu-system-aarch64] [10685] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-04T00:29:21.489] [info] [primary] process started
Starting primary |

And stops with the message: primary: timed out waiting for response

The log:

[2023-09-04T00:29:21.003] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-09-04T00:29:21.005] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-09-04T00:29:21.007] [debug] [blueprint provider] Loading "docker" v1
[2023-09-04T00:29:21.007] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "minikube" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-09-04T00:29:21.014] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-09-04T00:29:21.334] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-09-04T00:29:21.337] [debug] [qemu-system-aarch64] [10680] started: qemu-system-aarch64 --version
[2023-09-04T00:29:21.420] [debug] [qemu-img] [10681] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.434] [debug] [qemu-img] [10682] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-04T00:29:21.442] [debug] [qemu-img] [10683] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.446] [debug] [primary] process working dir ''
[2023-09-04T00:29:21.446] [info] [primary] process program 'qemu-system-aarch64'
[2023-09-04T00:29:21.446] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-09-04T00:29:21.449] [debug] [qemu-system-aarch64] [10684] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.jnBXjP
[2023-09-04T00:29:21.486] [info] [primary] process state changed to Starting
[2023-09-04T00:29:21.488] [info] [primary] process state changed to Running
[2023-09-04T00:29:21.488] [debug] [qemu-system-aarch64] [10685] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-04T00:29:21.489] [info] [primary] process started
[2023-09-04T00:29:21.489] [debug] [primary] Waiting for SSH to be up
[2023-09-04T00:29:21.636] [debug] [primary] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2023-09-04T00:29:21.741] [debug] [primary] QMP: {"return": {}}

[2023-09-04T00:29:37.367] [debug] [primary] QMP: {"timestamp": {"seconds": 1693780177, "microseconds": 367878}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}

[2023-09-04T00:30:21.840] [debug] [primary] QMP: {"timestamp": {"seconds": 1693780221, "microseconds": 840743}, "event": "RTC_CHANGE", "data": {"offset": 1, "qom-path": "/machine/unattached/device[4]"}}
ricab commented 1 year ago

Hmm, someone reports the same issue in here. I wonder if there is something to that security update after all.

@fansch would you be able to try with this package and paste the output of multipass launch -vvvv?

fansch commented 1 year ago

Sorry, but the link opens a 404.

ricab commented 1 year ago

Sorry, my bad. I have updated the link.

fansch commented 1 year ago

Hi, Here is the result. If you need more information, tell me.

$ multipass version
multipass   1.13.0-dev.370.ci5835+ge20ef2f1.mac
multipassd  1.13.0-dev.370.ci5835+ge20ef2f1.mac
$ multipass list
No instances found.
$ multipass start -vvvvvv
[2023-09-05T08:09:31.024] [debug] [qemu-system-aarch64] [61764] started: qemu-system-aarch64 --version
[2023-09-05T08:09:31.046] [debug] [qemu-img] [61765] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T08:09:31.061] [debug] [qemu-img] [61766] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-05T08:09:31.071] [debug] [qemu-img] [61767] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T08:09:31.076] [debug] [primary] process working dir ''
[2023-09-05T08:09:31.076] [info] [primary] process program 'qemu-system-aarch64'
[2023-09-05T08:09:31.076] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:0f:5d:ec, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-09-05T08:09:31.079] [debug] [qemu-system-aarch64] [61768] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.abMOMe
[2023-09-05T08:09:31.117] [info] [primary] process state changed to Starting
[2023-09-05T08:09:31.119] [info] [primary] process state changed to Running
[2023-09-05T08:09:31.119] [debug] [qemu-system-aarch64] [61769] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:0f:5d:ec -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-05T08:09:31.119] [info] [primary] process started
[2023-09-05T08:09:31.120] [debug] [primary] Waiting for SSH to be up
launch failed: The following errors occurred:
primary: timed out waiting for response
sudo tail -f multipassd.log
[2023-09-05T08:09:31.119] [info] [primary] process state changed to Running
[2023-09-05T08:09:31.119] [debug] [qemu-system-aarch64] [61769] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:0f:5d:ec -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-05T08:09:31.119] [info] [primary] process started
[2023-09-05T08:09:31.120] [debug] [primary] Waiting for SSH to be up
[2023-09-05T08:09:31.158] [debug] [primary] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2023-09-05T08:09:31.175] [debug] [primary] QMP: {"return": {}}

[2023-09-05T08:09:44.141] [debug] [primary] QMP: {"timestamp": {"seconds": 1693894184, "microseconds": 141376}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}
ricab commented 1 year ago

Interesting, no trace after that "waiting for SSH" message. That excludes some possible scenarios :thinking: I guess we still need to add more trace logging around and within this portion of code to try to figure this out once and for all.

In the meantime, @fansch, do you see an entry for primary in /var/db/dhcpd_leases after that timeout? Are you able to launch another instance, with a random name?

fansch commented 1 year ago
$ multipass launch -vvvv
[2023-09-05T13:51:05.286] [trace] [url downloader] Found https://codeload.github.com/canonical/multipass-blueprints/zip/refs/heads/main in cache: true
[2023-09-05T13:51:05.295] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-09-05T13:51:05.296] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-09-05T13:51:05.297] [debug] [blueprint provider] Loading "docker" v1
[2023-09-05T13:51:05.297] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-09-05T13:51:05.298] [debug] [blueprint provider] Loading "minikube" v1
[2023-09-05T13:51:05.299] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-09-05T13:51:05.299] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-09-05T13:51:05.546] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/index.json in cache: false
[2023-09-05T13:51:05.576] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:download.json in cache: true
[2023-09-05T13:51:05.754] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/index.json in cache: false
[2023-09-05T13:51:05.910] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/com.ubuntu.cloud:daily:download.json in cache: false
[2023-09-05T13:51:05.924] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/index.json in cache: true
[2023-09-05T13:51:05.925] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/com.ubuntu.appliances:released:download.json in cache: true
[2023-09-05T13:51:05.926] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-09-05T13:51:05.934] [debug] [qemu-system-aarch64] [63541] started: qemu-system-aarch64 --version
[2023-09-05T13:51:06.114] [debug] [qemu-img] [63542] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T13:51:06.136] [debug] [qemu-img] [63543] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-05T13:51:06.155] [debug] [qemu-img] [63544] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T13:51:06.161] [debug] [dazzled-cub] process working dir ''
[2023-09-05T13:51:06.161] [info] [dazzled-cub] process program 'qemu-system-aarch64'
[2023-09-05T13:51:06.161] [info] [dazzled-cub] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:bf:db:60, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/cloud-init-config.iso'
[2023-09-05T13:51:06.164] [debug] [qemu-system-aarch64] [63545] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.dSqTxg
[2023-09-05T13:51:06.201] [info] [dazzled-cub] process state changed to Starting
[2023-09-05T13:51:06.204] [info] [dazzled-cub] process state changed to Running
[2023-09-05T13:51:06.204] [debug] [qemu-system-aarch64] [63546] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:bf:db:60 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/cloud-init-config.iso
[2023-09-05T13:51:06.204] [info] [dazzled-cub] process started
launch failed: The following errors occurred:
dazzled-cub: timed out waiting for response

About the file /var/db/dhcpd_leases it does not exist.

So I went back to the troubleshoot page. And here are the DHCP things:

$ sudo lsof -iUDP:67 -n -P
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
launchd   1 root   20u  IPv4 0xff412dbbfaed7089      0t0  UDP *:67
launchd   1 root   33u  IPv4 0xff412dbbfaed7089      0t0  UDP *:67
$ sudo launchctl load -w /System/Library/LaunchDaemons/bootps.plist
/System/Library/LaunchDaemons/bootps.plist: service already loaded
Load failed: 17: File exists
$ sudo launchctl unload /System/Library/LaunchDaemons/bootps.plist
$ sudo launchctl load -w /System/Library/LaunchDaemons/bootps.plist
$ sudo ls -ald /var/db/dhcp*
drwx------  6 root  wheel  192 Sep  4 11:36 dhcpclient

And then again multipass launch -vvvv but no dhcpd_leases file created.

ricab commented 1 year ago

I don't know, for some reason it doesn't want to hand out IPs. Since you don't have the firewall active, I don't know what could be messing it up. Have you tried just rebooting? How about toggling internet sharing? No VPN either? No Little Snitch?

fansch commented 1 year ago

Many many thanks for all the time taken. In between, I have a good news: it works, event with firewall activated. I cannot tell you what exactly helped, but here are some infos. Simple rebooting did not help. Not really surprized, as I am used to shut down the laptop while going myself to sleep :-). I have then made a full cleanup with the free software OnyX: it cleans all the system caches, cleans the ARP caches, and so on. A well known tool. And since everything is OK. I have much more lines with this:

$ sudo lsof -iUDP:67 -n -P
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
launchd    1 root   48u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
launchd    1 root   51u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
launchd    1 root   52u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
launchd    1 root   53u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
bootpd  6341 root    0u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
bootpd  6341 root    1u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
bootpd  6341 root    2u  IPv4 0x85ae9608287143c1      0t0  UDP *:67

If you need still more info in the before/after. Again many thanks.

ricab commented 1 year ago

Hey, great to know you got it working @fansch. Thanks for the tip regarding Onyx.

percygrunwald commented 1 year ago

Wow, surprisingly OnyX worked for me... I was having the same problem with instances hanging on starting in the CLI and showing Unknown.

OnyX > Maintenance, with the following options (the defaults for me when I opened the OnyX appfor the first time).

Verifying
Structure of the file system [x]
Delete APFS snapshots [x]

Rebuilding
Launch Services database [x]

Cleaning
System [x]
Applications [x]
Internet [x]
Other [x]

OnyX automatically reboots the system. After rebooting the system following the OnyX run, I was able to launch instances successfully (random name bionic, random name focal, static name bionic with --cpus and --mount).

OnyX: 4.4.4 for macOS Ventura 13
Multipass(d): 1.12.2+mac (installed from downloaded .dmg)
Mac:
  Model Name:   MacBook Pro
  Model Identifier: Mac14,10
  Model Number: MNW83LL/A
  Chip: Apple M2 Pro
  System Version:   macOS 13.6 (22G120)
  Kernel Version:   Darwin 22.6.0
  Boot Volume:  Macintosh HD

Not exactly sure what OnyX changed/removed that got this working. Hope this helps users struggling with this issue or the devs to find what exactly is blocking the start.

nnarhinen commented 1 year ago

Not sure what happeed afer upgrading but now I'm experiencing this on all my old instances..

current version:

multipass   1.12.2+mac
multipassd  1.12.2+mac
% multipass -vvvv start codesense
[2023-10-05T14:40:43.565] [debug] [codesense] process working dir ''
[2023-10-05T14:40:43.566] [info] [codesense] process program 'qemu-system-aarch64'
[2023-10-05T14:40:43.566] [info] [codesense] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:cc:4e:25, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 8192M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/cloud-init-config.iso'
[2023-10-05T14:40:43.594] [debug] [qemu-system-aarch64] [13670] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.XUiGUT
[2023-10-05T14:40:44.273] [info] [codesense] process state changed to Starting
[2023-10-05T14:40:44.276] [info] [codesense] process state changed to Running
[2023-10-05T14:40:44.276] [debug] [qemu-system-aarch64] [13671] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:cc:4e:25 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/cloud-init-config.iso
[2023-10-05T14:40:44.276] [info] [codesense] process started
start failed: The following errors occurred:
codesense: timed out waiting for response
upngo commented 1 year ago

Am getting this issue on sonoma (and had on ventura as well) on multipass 1.13.0-dev.298 (has the increased ssh time outs) and on 1.12.2

I periodically get this issue and go through countless variations of deleting vms, reloading daemons, reinstalling multipass, deleting dhcpd_leases and restarting my computer before it decides it's happy again...

Even with firewall disabled it's always the same issue where it's waiting for ssh to be up and the last line has NIC_RX_FILTER_CHANGED then eventually times out..

Not properly closing your VM before turning your computer off (or running out of power) is a good way to break all your VMs (and can't create new ones either).

Here's one of my current outputs at trying to launch a new box

[2023-10-16T14:44:21.028] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-10-16T14:44:21.029] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-10-16T14:44:21.030] [debug] [blueprint provider] Loading "docker" v1
[2023-10-16T14:44:21.031] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-10-16T14:44:21.031] [debug] [blueprint provider] Loading "minikube" v1
[2023-10-16T14:44:21.032] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-10-16T14:44:21.032] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-10-16T14:44:23.728] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-10-16T14:44:23.734] [debug] [qemu-system-aarch64] [1570] started: qemu-system-aarch64 --version
[2023-10-16T14:44:23.762] [debug] [qemu-img] [1571] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20231010/ubuntu-22.04-server-cloudimg-arm64.img
[2023-10-16T14:44:23.777] [debug] [qemu-img] [1572] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-10-16T14:44:23.786] [debug] [qemu-img] [1573] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img
[2023-10-16T14:44:23.791] [debug] [promoting-bedbug] process working dir ''
[2023-10-16T14:44:23.791] [info] [promoting-bedbug] process program 'qemu-system-aarch64'
[2023-10-16T14:44:23.791] [info] [promoting-bedbug] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:96:b6:b3, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/cloud-init-config.iso'
[2023-10-16T14:44:23.792] [debug] [qemu-system-aarch64] [1574] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.IDUJOh
[2023-10-16T14:44:23.823] [info] [promoting-bedbug] process state changed to Starting
[2023-10-16T14:44:23.824] [info] [promoting-bedbug] process state changed to Running
[2023-10-16T14:44:23.824] [debug] [qemu-system-aarch64] [1575] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:96:b6:b3 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/cloud-init-config.iso
[2023-10-16T14:44:23.824] [info] [promoting-bedbug] process started
[2023-10-16T14:44:23.825] [debug] [promoting-bedbug] Waiting for SSH to be up
[2023-10-16T14:44:23.854] [debug] [promoting-bedbug] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2023-10-16T14:44:23.870] [debug] [promoting-bedbug] QMP: {"return": {}}

[2023-10-16T14:44:37.065] [debug] [promoting-bedbug] QMP: {"timestamp": {"seconds": 1697420677, "microseconds": 65787}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}
upngo commented 1 year ago

After a number of reinstalls and restarts I got a new setup working.

percygrunwald commented 11 months ago

My VM got into this same stuck state again and this process worked for me this time (might have been luck too):

  1. Stop and delete/purge all Multipass VMs
  2. rm /var/db/dhcpd_leases
  3. Reboot

Following the reboot the VM came up fine.

bschick commented 9 months ago

Think I'm seeing a similar issue. I have a number of VMs that were created > 2 months ago and none will fully start. Symtoms are similar to previous comments...

% multipass start -vvvvvv angular [2024-01-10T20:53:29.681] [debug] [angular] process working dir '' [2024-01-10T20:53:29.682] [info] [angular] process program 'qemu-system-aarch64' [2024-01-10T20:53:29.682] [info] [angular] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 2, -m, 2048M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso' [2024-01-10T20:53:29.693] [debug] [qemu-system-aarch64] [8377] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.wzWGig [2024-01-10T20:53:29.752] [info] [angular] process state changed to Starting [2024-01-10T20:53:29.754] [info] [angular] process state changed to Running [2024-01-10T20:53:29.755] [debug] [qemu-system-aarch64] [8378] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 2048M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso [2024-01-10T20:53:29.755] [info] [angular] process started Starting angular start failed: The following errors occurred: angular: timed out waiting for response

The qemu process starts, but an IP address is never assigned and the command times out waiting for ssh. Multipass log ends with:

[2024-01-10T21:00:35.152] [debug] [fervent-euglena] Waiting for SSH to be up [2024-01-10T21:00:35.199] [debug] [fervent-euglena] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}^M

[2024-01-10T21:00:35.220] [debug] [fervent-euglena] QMP: {"return": {}}^M

I noticed that machine_type for all was virt-7.1 until I tried to start running them today. Now they have virt-8.0.

I've tried to uninstall multipass and reinstall from the package and and then from homebrew. Tried Onyx maintenance and deleting /var/db/dhcpd_leases and restarting (no IP address shows up in the lease file on a failed start btw).

Also I am able to launch new VM just fine. Its just starting VMs that have existed for > 2 months that doesn't start.

% multipass version multipass 1.13.0+mac multipassd 1.13.0+mac

Apple M1 Pro macOS 14.2.1 (23C71

ricab commented 9 months ago

Hi @bschick, do you see an even with NIC_RX_FILTER_CHANGED in the logs? I don't see it in the lines you pasted, so your VMs may not even be getting to the point of requesting an IP.

Were they in a suspended state before you tried to start them? If so, does it help if you run sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img snapshot -d suspend "/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img" and try to start again?

bschick commented 9 months ago

No NIC_RX_FILTER_CHANGED present in the logs and /var/db/dhcpd_leases is never created.

Important detail I just noticed (a bit late). I had 3 VMs defined on my system. Two of them were left running from > 2 months back. From then until now I've applied OS patches, suspended and restarted my laptop etc. When I looked at their state this week the 'angular' VM state was "stopped" and the other one was "suspended". Neither of them will start. The 3rd VM, which was gracefully stopped >2 months ago now starts just fine. So the problem could be caused by some non-graceful termination or OS patch with running VMs

I tried the "-d suspend" command you provide on the VM that was previously listed as suspended and the output was "qemu-img: Could not delete snapshot 'suspend': snapshot not found". Likely since I stopped the VM and killed the qemu processes already.

ricab commented 9 months ago

Hi @bschick, OK, so you got rid of the suspension state but the VM didn't start. Could you try to run the following command after making sure the corresponding VM is stopped in Multipass?

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application\ Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 2048M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -cdrom /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso

Do you see a window popping up? Does that window eventually get to a login prompt?

bschick commented 9 months ago

Thanks for continuing to followup. Verified VM was in stopped state. Your command launched qemu-system-aarch64 with a terminal windows that contains one line that says "parallel0 console" and a flashing cursor. Nothing else

The window where I ran the command has:

{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}} 2024-01-13 21:27:01.784 qemu-system-aarch64[49110:2414358] This process won't be able to run remote lookup service

Waited 30 minutes and both seem to be stuck. No login prompt.

ricab commented 9 months ago

Hi @bschick, I was afraid of something like that. It looks like the image is messed up and the VM can no longer boot :frowning_face: Did you by any chance upgrade directly from 1.11 to 1.13? Or were those VMs last used in 1.11 when you tried them in 1.13?

Where did say that you saw virt-8.0 instead of virt-7.1 again? Does it help if you run the same command but with -machine virt-7.1?

QEMU was updated to 8.0 in Multipass 1.12. We found then that some users had trouble resuming from suspension snapshots created earlier. Some people were able to get past that by deleting the suspension snapshot, which is what I suggested. Unfortunately, that wasn't enough for your case. The way things are today, when suspending VMs, Multipass discards the running state of VMs after capturing the suspension snapshot, which may leave them in an corrupted or inconsistent state if they are not resumed from the snapshot. This could be what's happening to you. We're now considering a different mechanism, that would attempt to shutdown the VM in the background without making the user wait for that.

I am not very hopeful, but one other thing you could try is sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all "/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img", then try the command above again. Let me know what you find!

bschick commented 9 months ago

Hi @bschick, I was afraid of something like that. It looks like the image is messed up and the VM can no longer boot ☹️ Did you by any chance upgrade directly from 1.11 to 1.13? Or were those VMs last used in 1.11 when you tried them in 1.13?

I hit the problems while running 1.12 and upgrade to 1.13 hoping it might solve the problem.

Where did say that you saw virt-8.0 instead of virt-7.1 again? Does it help if you run the same command but with -machine virt-7.1?

In the value for "machine_type" in /var/root/Library/Application\ Support/multipassd/qemu/multipassd-vm-instances.json. I noticed that changed from 7.1 to 8.0 when I tried to start them with 1.12 Running the qemu-system-aarch64 command with that flag didn't change anything.

QEMU was updated to 8.0 in Multipass 1.12. We found then that some users had trouble resuming from suspension snapshots created earlier. Some people were able to get past that by deleting the suspension snapshot, which is what I suggested. Unfortunately, that wasn't enough for your case. The way things are today, when suspending VMs, Multipass discards the running state of VMs after capturing the suspension snapshot, which may leave them in an corrupted or inconsistent state if they are not resumed from the snapshot. This could be what's happening to you. We're now considering a different mechanism, that would attempt to shutdown the VM in the background without making the user wait for that.

Keep in mind that I have multiple VMs with this problem. I only saw one of then reported to be suspended, but its possible they were both suspended at one point and that multipass reset state during a failed start (and I just never noticed the suspended state).

I am not very hopeful, but one other thing you could try is sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all "/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img", then try the command above again. Let me know what you find!

I did an image check while troubleshooting this on my own. It did find many things to correct but exited without an error. When I run it now, all looks fine. Trying the qemu-system-aarch64 command again results in the same hang.

No errors were found on the image.
117316/163840 = 71.60% allocated, 17.82% fragmented, 5.24% compressed clusters
Image end offset: 8930787328
ricab commented 9 months ago

Hmm, I see @bschick. Were you ever able to run the VMs that now fail in 1.12, or was it after upgrading from 1.11 to 1.12 that those problems started?

ricab commented 9 months ago

Hi @bschick, it looks like that parallel0 thing is a red herring. On my x86 mac, the analogous to the command line I gave you brings up a VGA screen that eventually gets to the login console. However, someone with an M1 tells me they don't see that, so it either doesn't open or isn't in focus by default with qemu-system-aarch64 for some reason. Could you try the following command instead?

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application\ Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 2048M -qmp stdio -cdrom /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso

Do you now see a boot sequence that eventually gets to a login prompt? Otherwise, please try to select "serial0" from the "View" menu. Let me know what you find!

bschick commented 9 months ago

Hmm, I see @bschick. Were you ever able to run the VMs that now fail in 1.12, or was it after upgrading from 1.11 to 1.12 that those problems started?

Were working then tried under 1.12 months later and they never started successfully again. Have not tried to revert to 1.11

Do you now see a boot sequence that eventually gets to a login prompt? Otherwise, please try to select "serial0" from the "View" menu. Let me know what you find!

Yes, looking at serial0 shows the boot sequence. Looks like the idea of disk corruption is correct. Haven't yet figure out how to dump the output text so took screenshots.

Stage 1: Something flashed by that is too fast to capture

Stage 2: See attached "stage2" image

stage2

Stage 3: See attached "stage3" image (doesn't get past Btrfs loaded)

stage3
ricab commented 9 months ago

Hmm, OK. Looking at a successful instance, the ext4 filesystem in sda1 is mounted shortly after that last btrfs line. What about the other VMs, do you see them failing in the exact same way @bschick?

QEMU might be unable to mount the filesystem because it is corrupted, but I wouldn't expect multiple images to get corrupted the same way. They could be corrupted in different ways that all prevent the filesystem from being mounted, but still... It feels like a systemic problem. Have you tried using virt-7.1 now that you have the correct command line? If that doesn't work, can you please try reverting to 1.11?

bschick commented 8 months ago

Yes, I tried with -machine virt-7.1,gic-version=3 and it did not make a difference. The other VM now causes multipassd to hang so I've removed it.

But I now have a newer VM created in the past week that is in the same state. It was working fine, then this morning I started getting EINVAL disk read errors, now when I try to restart the VM is hangs right after the "Btrfs loaded" line as the previous VM.

Anything you'd like me to check before I revert to 1.11?

bschick commented 8 months ago

I reverted to 1.11.1 and it cannot start any of the broken VMs either. My suspicious is that the VMs end up in a bad state when left running and my laptop sleeps. I don't remember that happening in the past so perhaps 1.12 or 1.13 changed something to make that more likely or perhaps something has changed on my laptop.

I will create yet another new VM and see if it ends up corrupted while running in version 1.11.1.

ricab commented 8 months ago

Hi @bschick, thanks for sticking with us.

But I now have a newer VM created in the past week that is in the same state. It was working fine, then this morning I started getting EINVAL disk read errors, now when I try to restart the VM is hangs right after the "Btrfs loaded" line as the previous VM.

Intriguing. I don't suppose you're running out of disk space?

I will create yet another new VM and see if it ends up corrupted while running in version 1.11.1.

OK, that would be useful to know. If you could identify when a VM stops working and send us the log, that would be great too.

bschick commented 8 months ago

Intriguing. I don't suppose you're running out of disk space?

Not out of disk space.

I will create yet another new VM and see if it ends up corrupted while running in version 1.11.1.

OK, that would be useful to know. If you could identify when a VM stops working and send us the log, that would be great too.

No problems so far under 1.11, but last time it took a week. If another VM does get into a bad state, which logs do you want? /Library/Logs/Multipass didn't have much info last time.

Would it help you to have the most recent corrupted VM to test locally? The only confidential data on it is a private key used for git signing that I could replace on github.

ricab commented 8 months ago

Hi @bschick

No problems so far under 1.11, but last time it took a week. If another VM does get into a bad state, which logs do you want? /Library/Logs/Multipass didn't have much info last time.

There should be a multipassd.log in there. That's the one I'm after.

Would it help you to have the most recent corrupted VM to test locally? The only confidential data on it is a private key used for git signing that I could replace on github.

No need for the time being. Let us have a look at logs first. If that is still inconclusive, then we may ask for an image. Thanks!