canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

After upgrade, one instance fails to start #3600

Open PrayanSen opened 1 month ago

PrayanSen commented 1 month ago

Describe the bug

I upgraded Multipass to 1.14.0-mac to make use of the --force option in multipass stop. I use a MacBook with macOS Monterey 12.6. I have a total of 5 instances; one of them had hung and was unreachable, so I had to use multipass stop --force to stop it. Now when I try to start it, I get this error:

start failed: The following errors occurred:
qemu-system-aarch64: -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/Thesis/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda: qcow2: Image is corrupt; cannot be opened read/write
Thesis: shutdown called while starting

I tried to cd into the directory to inspect it myself, but I get this error: Library/Application Support/multipassd: No such file or directory (I tried with sudo su as the root user as well).

My other instances seem to work fine: I tested 2 of the remaining ones and they powered on successfully.

Expected behavior

I want to know my options. Recovery is the ideal case, but if that doesn't work: the instance I have a problem with holds important data that I do not want to lose by deleting the instance and making a new one. If I uninstall and reinstall Multipass, for example, do I lose the data?

Logs

I do not see the Multipass directory in the specified location:

Library/Logs % ls
Assistant                  Microsoft Teams Helper (Renderer)
Baseband                   SMSMigrator
DiagnosticReports          ZoomPhone
DiscRecording.log          com.apple.AMPLibraryAgent
Homebrew                   zoom.us
JetBrains

I cannot seem to find the logs; I can provide them if I get some guidance here.

Name:           CBDP
State:          Stopped
Snapshots:      0
IPv4:           --
Release:        --
Image hash:     5167c1b13cb3 (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           GR
State:          Stopped
Snapshots:      0
IPv4:           --
Release:        --
Image hash:     5167c1b13cb3 (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           Praktikum
State:          Stopped
Snapshots:      0
IPv4:           --
Release:        --
Image hash:     5167c1b13cb3 (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           PratikUbuntu
State:          Deleted
Snapshots:      0
IPv4:           --
Release:        --
Image hash:     4fd777cb9295 (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           TUMThesis
State:          Stopped
Snapshots:      0
IPv4:           --
Release:        --
Image hash:     f6bf7305207a (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           Thesis
State:          Deleted
Snapshots:      0
IPv4:           --
Release:        --
Image hash:     c841bac00925 (Ubuntu 24.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

sharder996 commented 1 month ago

Hi @PrayanSen,

It looks like one of your instances was corrupted when Multipass suspended it and you upgraded Multipass.

The reason you can't see the directory is that it is owned by root. You can navigate to it by entering a root shell: sudo -s. That being said, the best one can do to recover the image is:

$ sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
$ sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/Thesis/ubuntu-24.04-server-cloudimg-arm64.img
$ sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist

If you can't find the image directory, just dump the contents of the parent directory; ls -al /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/.
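
The three commands above can also be wrapped in a small script, so that the daemon is reloaded even if the check reports errors. This is only a sketch: the paths are the ones quoted in this thread, while DRY_RUN, run and repair_instance are hypothetical names introduced for illustration. With DRY_RUN=1 it only prints what it would run.

```shell
# Paths as quoted in this thread (assumed unchanged on your machine).
PLIST="/Library/LaunchDaemons/com.canonical.multipassd.plist"
QEMU_IMG="/Library/Application Support/com.canonical.multipass/bin/qemu-img"
INSTANCES="/var/root/Library/Application Support/multipassd/qemu/vault/instances"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
# The printed form is for reading only; it does not re-quote spaces.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: $1 is the instance name, $2 the image file name.
repair_instance() {
  run sudo launchctl unload "$PLIST"
  run sudo "$QEMU_IMG" check -r all "$INSTANCES/$1/$2"
  run sudo launchctl load "$PLIST"
}
```

For a dry run: DRY_RUN=1 repair_instance Thesis ubuntu-24.04-server-cloudimg-arm64.img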

PrayanSen commented 1 month ago

Hi @sharder996 ,

Thank you for your response, I did try entering the root shell and locating the directory, but it seems it cannot find it:

(base) root@Pratiks-MBP ~ # sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
(base) root@Pratiks-MBP ~ # sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all var/root/Library/Application\ Support/multipassd/qemu/vault/instances/TUMhesis/ubuntu-24.04-server-cloudimg-arm64.img
qemu-img: Could not open 'var/root/Library/Application Support/multipassd/qemu/vault/instances/TUMhesis/ubuntu-24.04-server-cloudimg-arm64.img': Could not open 'var/root/Library/Application Support/multipassd/qemu/vault/instances/Thesis/ubuntu-24.04-server-cloudimg-arm64.img': No such file or directory
(base) root@Pratiks-MBP ~ # sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist
(base) root@Pratiks-MBP ~ # ls -al var/root/Library/Application\ Support/multipassd/qemu/vault/instances/
ls: var/root/Library/Application Support/multipassd/qemu/vault/instances/: No such file or directory

In the directory Library/Application Support, I ran ls, if that helps:

Application Support # ls -altrh | grep multi*
grep: multipass-gui: Is a directory

Please let me know what I am missing here 😞
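
A side note on the grep call above: because multi* is not quoted, the shell most likely expanded it against the current directory before grep ran, so grep received one matching name as its pattern and multipass-gui as a file to search, hence "Is a directory". Quoting the pattern avoids that. A minimal sketch, where list_matching is a hypothetical helper:

```shell
# Hypothetical helper: list entries of directory $1 whose names contain the
# fixed string $2. The pattern is quoted, so the shell never expands it.
list_matching() {
  ls -a "$1" | grep -F -- "$2"
}

# Usage: list_matching . multi
```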

sharder996 commented 1 month ago

You are missing a leading slash in the directory path

(base) root@Pratiks-MBP ~ # ls -al var/root/Library/Application\ Support/multipassd/qemu/vault/instances/

should be

(base) root@Pratiks-MBP ~ # ls -al /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/

I made the same mistake in my above comment. Try checking the image again with qemu-img, but with the leading slash in the path.
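
The difference matters because a path without a leading slash is resolved relative to the current directory (here root's home, ~), not from the filesystem root. A small guard like the following sketch can catch that before calling qemu-img; is_absolute is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the path starts with a slash.
is_absolute() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# The path as typed in the failing command (no leading slash).
img='var/root/Library/Application Support/multipassd/qemu/vault/instances/'
if ! is_absolute "$img"; then
  echo "relative path, resolved against $(pwd): $img" >&2
fi
```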

PrayanSen commented 1 month ago

Thanks a lot for pointing that out, you were right. I have now run the check -r command, and I hope it worked:

    1603560 leaked clusters
    1476555 corruptions

Double checking the fixed image now...
No errors were found on the image.
1482709/3276800 = 45.25% allocated, 2.62% fragmented, 0.41% compressed clusters
Image end offset: 107388928000

I ran launchctl unload before and launchctl load after, but Multipass now seems to have hung. The GUI is stuck at 'Waiting for daemon'. If I unload and load again, it works for some time, but then gets stuck at 'Waiting for daemon' again. I guess I must have messed something up. Is there any way to force-restart the service? Quitting and retrying does not work.

sharder996 commented 1 month ago

Can you supply the logs from around the time of unloading and loading the multipass service?

Also, try just restarting your host machine. The GUI reporting "Waiting for daemon" does not necessarily mean that the daemon is hanging. It is possible that it's not even running. On my own Mac, I've noticed that the Multipass daemon sometimes does not start up as expected when using launchctl.
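
To tell "not running" apart from "hanging", one option is to look for the daemon process directly. This sketch assumes the process is named multipassd (as the plist name and data directory in this thread suggest) and that pgrep is available; daemon_running is a hypothetical helper.

```shell
# Hypothetical helper: succeed if a process with exactly this name exists.
daemon_running() {
  # pgrep -x matches the exact process name and exits non-zero if none is found.
  pgrep -x "$1" >/dev/null 2>&1
}

if daemon_running multipassd; then
  echo "multipassd is running"
else
  echo "multipassd is not running"
fi
```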

PrayanSen commented 1 month ago

While inspecting the logs, I could see that another instance had hung, so I force-stopped it. Now the Multipass daemon seems to be stable (the GUI does not keep crashing). But when I try to start my instance (the one that had the original issue and for which I had run the check -r command), the process times out. The error in the GUI: 'TUMThesis: timed out waiting for response'

Here are the logs for the event:

[2024-07-24T23:26:14.449] [debug] [TUMThesis] process working dir ''
[2024-07-24T23:26:14.449] [info] [TUMThesis] process program 'qemu-system-aarch64'
[2024-07-24T23:26:14.450] [info] [TUMThesis] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:ee:ff:74, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/TUMThesis/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 8, -m, 16384M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/TUMThesis/cloud-init-config.iso'
[2024-07-24T23:26:14.455] [debug] [qemu-system-aarch64] [1173] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.EnGdtO
[2024-07-24T23:26:14.642] [info] [TUMThesis] process state changed to Starting
[2024-07-24T23:26:14.644] [info] [TUMThesis] process state changed to Running
[2024-07-24T23:26:14.644] [debug] [qemu-system-aarch64] [1174] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:ee:ff:74 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/TUMThesis/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 8 -m 16384M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/TUMThesis/cloud-init-config.iso
[2024-07-24T23:26:14.644] [info] [TUMThesis] process started
[2024-07-24T23:26:14.645] [debug] [TUMThesis] Waiting for SSH to be up
[2024-07-24T23:26:15.019] [debug] [TUMThesis] QMP: {"QMP": {"version": {"qemu": {"micro": 1, "minor": 2, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2024-07-24T23:26:15.144] [debug] [TUMThesis] QMP: {"return": {}}

[2024-07-24T23:29:08.101] [debug] [daemon] Returning setting local.TUMThesis.memory=16.0GiB
[2024-07-24T23:29:08.101] [debug] [daemon] Returning setting local.bridged-network=
[2024-07-24T23:29:08.109] [debug] [ifconfig] [1200] started: ifconfig 
[2024-07-24T23:29:08.123] [debug] [networksetup] [1201] started: networksetup -listallhardwareports
[2024-07-24T23:29:08.162] [debug] [daemon] Returning setting local.TUMThesis.disk=200.0GiB
[2024-07-24T23:29:08.163] [debug] [daemon] Returning setting local.driver=qemu
[2024-07-24T23:29:08.163] [debug] [daemon] Returning setting local.TUMThesis.cpus=8
[2024-07-24T23:29:08.188] [debug] [ifconfig] [1203] started: ifconfig 
[2024-07-24T23:29:08.193] [debug] [networksetup] [1204] started: networksetup -listallhardwareports
[2024-07-24T23:38:51.189] [debug] [async task] fetch manifest periodically
[2024-07-24T23:38:51.273] [warning] [Qt] Execution of PAC script at "http://wpad/wpad.dat" failed: The operation couldn’t be completed. (kCFErrorDomainCFNetwork error 308.)
[2024-07-24T23:38:51.274] [warning] [Qt] Execution of PAC script at "http://wpad/wpad.dat" failed: The operation couldn’t be completed. (kCFErrorDomainCFNetwork error 308.)
[2024-07-24T23:38:51.275] [warning] [Qt] Execution of PAC script at "http://wpad/wpad.dat" failed: The operation couldn’t be completed. (kCFErrorDomainCFNetwork error 308.)
[2024-07-24T23:38:51.276] [warning] [Qt] Execution of PAC script at "http://wpad/wpad.dat" failed: The operation couldn’t be completed. (kCFErrorDomainCFNetwork error 308.)
[2024-07-24T23:38:51.277] [info] [VMImageHost] Did not find any supported products in "appliance"
[2024-07-24T23:38:51.480] [warning] [Qt] Execution of PAC script at "http://wpad/wpad.dat" failed: The operation couldn’t be completed. (kCFErrorDomainCFNetwork error 308.)
[2024-07-24T23:38:51.484] [warning] [Qt] Execution of PAC script at "http://wpad/wpad.dat" failed: The operation couldn’t be completed. (kCFErrorDomainCFNetwork error 308.)

The multipass list looks like:

(base) pratik@Pratiks-MBP Multipass % multipass list
Name                    State             IPv4             Image
primary                 Stopped           --               Ubuntu 22.04 LTS
CBDP                    Stopped           --               Ubuntu 22.04 LTS
GR                      Stopped           --               Ubuntu 22.04 LTS
Praktikum               Stopped           --               Ubuntu 22.04 LTS
PratikUbuntu            Deleted           --               Ubuntu 22.04 LTS
TUMThesis               Unknown           --               Ubuntu 22.04 LTS --> this is the VM that had the issue
Thesis                  Deleted           --               Ubuntu 24.04 LTS
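
With several instances, the ones in an unexpected state can be picked out of the multipass list table with a short filter. A sketch, assuming the column layout shown above (name first, state second); instances_in_state is a hypothetical helper that reads the table on stdin.

```shell
# Hypothetical helper: $1 is the state to look for, e.g. Unknown.
instances_in_state() {
  # Skip the header row, then print the name of every row whose second
  # column equals the requested state.
  awk -v state="$1" 'NR > 1 && $2 == state { print $1 }'
}

# Usage: multipass list | instances_in_state Unknown
```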

georgeliao commented 1 month ago

@PrayanSen It looks like the instance is corrupted and qemu-img check -r could not repair it. qemu-img check -r all may be something you can try. If that does not work, the last resort is to mount the image and retrieve the data.
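
For the last-resort route, one common approach is to copy the image to a Linux machine and expose it read-only through qemu-nbd. The sketch below only prints the commands so they can be reviewed first. Everything here is an assumption rather than something confirmed in this thread: a Linux host with the nbd kernel module and qemu-nbd installed, /dev/nbd0 being free, the root filesystem sitting on the first partition, and the hypothetical mount point /mnt/recovered. print_recovery_steps is likewise a hypothetical helper.

```shell
# Hypothetical helper: print (do not run) the steps for a read-only mount.
# $1: path to a COPY of the instance image on the Linux machine.
print_recovery_steps() {
  cat <<EOF
sudo modprobe nbd max_part=16
sudo qemu-nbd --read-only --format=qcow2 --connect=/dev/nbd0 "$1"
sudo mkdir -p /mnt/recovered
sudo mount -o ro /dev/nbd0p1 /mnt/recovered
# ... copy the data out of /mnt/recovered, then clean up:
sudo umount /mnt/recovered
sudo qemu-nbd --disconnect /dev/nbd0
EOF
}

print_recovery_steps ./ubuntu-22.04-server-cloudimg-arm64.img
```

Working on a copy and mounting read-only keeps the original image untouched if a step goes wrong.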