Open ricab opened 1 year ago
Hey @ricab,
You say a 10.15 macOS VM, so you mean like one we have on our Mini and not directly on the Mac host itself? Also, I wonder if this can just be closed since we are dropping 10.15 support in the upcoming 1.13 release???
Hmm, I believe this happened when I was working on that dedicated testflinger VM. I don't know whether the macOS version makes a difference. There could have been some sort of race that manifested because things were slower. Idk...
I was suffering this issue on MacOS 13.5 (Ventura). I am running Multipass 1.12.2+mac
.
TL;DR : if your VM is out of disk space, that could be why.
==> /Library/Logs/Multipass/multipassd.log <==
[2024-01-12T06:03:30.648] [debug] [primary] QMP: {"return": {}}
[2024-01-12T06:03:30.650] [debug] [primary] QMP: {"timestamp": {"seconds": 1705057410, "microseconds": 647553}, "event": "POWERDOWN"}
[2024-01-12T06:03:30.650] [info] [primary] VM powering down
[2024-01-12T06:09:50.650] [debug] [base_vm] Error getting extra IP addresses: ssh connection failed: 'Timeout connecting to 192.168.64.6'
[2024-01-12T06:10:00.903] [info] [daemon] Cannot open ssh session on "primary" shutdown: ssh connection failed: 'Failed to connect: Host is down'
[2024-01-12T06:10:00.904] [debug] [primary] QMP: {"return": {}}
{"timestamp": {"seconds": 1705057800, "microseconds": 903987}, "event": "POWERDOWN"}
...then I tried to use the UI icon menu to exit multipass, but that only seems to exit the UI - I still have:
$ multipass list
Name State IPv4 Image
primary Running 192.168.64.6 Ubuntu 20.04 LTS
ubuntu-2004 Stopped -- Ubuntu 20.04 LTS
ubuntu-lts Stopped -- Ubuntu 20.04 LTS
Then I tried to shutdown the instance from within:
$ multipass shell primary
ubuntu@primary:~$ sudo -sH
root@primary:/home/ubuntu# sync;halt
(back out to the host shell)
$ multipass list
Name State IPv4 Image
primary Running 192.168.64.6 Ubuntu 20.04 LTS
ubuntu-2004 Stopped -- Ubuntu 20.04 LTS
ubuntu-lts Stopped -- Ubuntu 20.04 LTS
Still running! ...and then tried to get back in:
$ multipass shell primary
shell failed: ssh connection failed: 'Connection refused'
So halt
just killed the instances' sshd
without fully shutting down the instance.
So then I tried to stop/start the multipass daemon in a separate terminal:
sudo launchctl stop com.canonical.multipassd
...this caused the hanging multipass stop primary
to output:
$ multipass stop primary
stop failed: cannot connect to the multipass socket
...then I restarted the service daemon:
sudo launchctl start com.canonical.multipassd
Then I checked the state:
$ multipass list
Name State IPv4 Image
primary Running 192.168.64.6 Ubuntu 20.04 LTS
ubuntu-2004 Stopped -- Ubuntu 20.04 LTS
ubuntu-lts Stopped -- Ubuntu 20.04 LTS
Then I tried rebooting my Mac laptop - but that instance IS STILL RUNNING!! See: mp_reboot_log.txt
Finally I tried to see if I could find the VM's process and kill it:
# ps -ef | grep -i qemu
0 478 323 0 6:35AM ?? 0:13.86 /Library/Application Support/com.canonical.multipass/bin/qemu-system-x86_64 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-x86_64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:1a:0c:d1 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso -loadvm suspend -machine pc-i440fx-8.0
0 3264 2421 0 7:16AM ttys000 0:00.00 grep -i qemu
# kill -9 478
$ multipass list
Name State IPv4 Image
primary Stopped -- Ubuntu 20.04 LTS
ubuntu-2004 Stopped -- Ubuntu 20.04 LTS
ubuntu-lts Stopped -- Ubuntu 20.04 LTS
Ok, killing the process stopped the VM. However, upon attempting to restart the VM, it failed to restart, but restarted on the 2nd attempt:
$ multipass shell primary
shell failed: [ssh client] channel creation failed: ''
$ multipass shell primary
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-121-generic x86_64)
[...]
Last login: Fri Jan 12 06:27:00 2024 from 192.168.64.1
/home/ubuntu/.pyenv/libexec/pyenv-init: line 165: cannot create temp file for here-document: No space left on device
I think the out-of-space issue may have contributed to the hanging in the first place.
And it hung again when running multipass stop primary
, so I had to kill the process again. Next, I attempted to resize the virtual disk, as suggested in #62 :
$ multipass set local.primary.disk=20G
/Library/Logs/Multipass/multipassd.log
:
[...]
Rebuilding refcount structure
Repairing cluster 2916 refcount=1 reference=0
Repairing cluster 2917 refcount=1 reference=0
Repairing cluster 32904 refcount=1 reference=0
Repairing cluster 66009 refcount=1 reference=0
[2024-01-12T07:33:36.940] [debug] [daemon] Succeeded setting local.primary.disk=20G
Then I had to go back in and grow the partition to the new size, per: https://manpages.ubuntu.com/manpages/xenial/en/man1/growpart.1.html
Now when I do multipass info primary
I see:
Disk usage: 4.6GiB out of 19.3GiB
...and inside the VM:
/dev/sda1 20G 4.6G 15G 24% /
Now I'm able to start/stop the VM normally.
I was facing the same issue with one instance only but on Ubuntu:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.10
Release: 23.10
Codename: mantic
multipass version
multipass 1.13.0
multipassd 1.13.0
Error from the logs:
journalctl --unit 'snap.multipass*'
nothing unusual except: Error getting extra IP addresses: ssh connection failed: 'No route to host'Attempts:
Solution:
I went to this file /var/snap/multipass/common/data/multipassd/multipassd-vm-instances.json
(hint was from this comment) and noticed that state
is set to 4. I just edited it to be 0, restart multipass snap, and the vm can now boot without issues.
I'm not into details of multipass but it seems it acts based on the content of that file and for some reason when I last stopped multipass it wasn't able to update state.
I experience something similar on jammy/amd64:
When one instance is non responsive (high load for a long time), multipass list
hangs, and multipass stop
doesn't stop the instance.
Describe the bug The stop command appears to hang when trying to stop a QEMU instance. After it returns, the instance is still seen as running by Multipass. The instance goes through shutdown, but the process does not quit. In addition,
multipass list
hangs for a little, while multipass tries to get extra IPs from the instance via SSH.I can reproduce this reliably in this setting. It is unclear whether it is specific to 10.15, nesting, or something else.
To Reproduce How, and what happened?
multipass start
multipass stop
Expected behavior The instance would stop and the process would quit. Failing that, the multipass daemon would log an error or warning message saying that the process failed to quit.
Example session
Logs
From the end of the launch:
Additional info
multipass version
: A recent development version in my case: 1.11.0-dev.340.pr2766+g18e1a33e.macmultipass get local.driver
:qemu