canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.83k stars 650 forks source link

Host suspending causing an array of problems #1365

Open relaxdiego opened 4 years ago

relaxdiego commented 4 years ago

This happens intermittently for me so I'm filing this issue in the hopes that somebody can suggest some steps I can take to share any diagnostics or logs that can help the team debug the problem when it recurs since I have no idea where to look.

In the meantime, here are the symptoms:

$ multipass restart dev
restart failed: ssh connection failed: 'Timeout connecting to 192.168.64.2'
$ multipass stop dev
Stopping dev |[2020-02-17T10:11:32.643] [error] [dev] process error occurred Crashed
$ multipass start dev
$ multipass shell dev
Initialising new SSH agent...
succeeded

Let me know how I can help debug this issue. My MacBook specs:

Model Name: MacBook Pro
  Model Identifier: MacBookPro16,1
  Processor Name:   6-Core Intel Core i7
  Processor Speed:  2.6 GHz
  Number of Processors: 1
  Total Number of Cores:    6
  L2 Cache (per Core):  256 KB
  L3 Cache: 12 MB
  Hyper-Threading Technology:   Enabled
  Memory:   16 GB
  Boot ROM Version: 1037.80.53.0.0 (iBridge: 17.16.13050.0.0,0)
  Activation Lock Status:   Enabled

System Software Overview:
  System Version:   macOS 10.15.3 (19D76)
  Kernel Version:   Darwin 19.3.0
  Secure Virtual Memory:    Enabled
  System Integrity Protection:  Enabled
  Time since boot:  6 days 4:43
Saviq commented 4 years ago

Hi @relaxdiego can you please attach your Multipass log from /Library/Logs/Multipass/multipassd.log?

relaxdiego commented 4 years ago

Hi @relaxdiego can you please attach your Multipass log from /Library/Logs/Multipass/multipassd.log?

Hi @Saviq. Here you go: https://gist.github.com/relaxdiego/e785351603c39a62d644f1e930185cad

Saviq commented 4 years ago

Thanks @relaxdiego there's one worrying bit there:

hyperkit: [ERROR] connect /var/root/Library/Application Support/multipassd/vault/instances/dev/ubuntu-18.04-server-cloudimg-amd64.img: failed to open file

Any idea what could be wrong with that file? I can see that it booted fine the next day, but it's unclear whether it's the same instance or another one named the same?

relaxdiego commented 4 years ago

Hi @Saviq. Unfortunately, I can no longer recall what exactly happened in 2020-02-10 when that log was written. :-( I can only recall events from 2020-02-17 at this point.

relaxdiego commented 4 years ago

@Saviq happened again today. Latest logs: https://gist.github.com/relaxdiego/7ea9d6441422b4cf3b8e3b4ac9b0ebcd

Note that at 2020-02-19T03:52:06.205 I was still asleep and my MBP was supposed to be too (lid was closed and tucked comfortably in its bag)

relaxdiego commented 4 years ago

In addition:

sudo ls -l /var/root/Library/Application\ Support/multipassd/vault/instances/dev/ubuntu-18.04-server-cloudimg-amd64.img
-rw-r--r--  1 root  wheel  10011410432 Feb 19 04:03 /var/root/Library/Application Support/multipassd/vault/instances/dev/ubuntu-18.04-server-cloudimg-amd64.img
relaxdiego commented 4 years ago

Sorry for the piecemeal updates. I tried stopping the VM and even that wouldn't work anymore:

$ multipass stop -v --all
$ multipass list
Name                    State             IPv4             Image
dev                     Running           192.168.64.2     Ubuntu 18.04 LTS

Let me know what else I can do to help debug. I'll leave this VM running in the meantime.

Saviq commented 4 years ago

Hey @relaxdiego sleeping may be an important point here.

This:

[2020-02-19T00:37:58.295] [info] [dev] hyperkit: [INFO] resize: adding available clusters (Node ((x 152250) (y 152761) (l Empty) (r Empty) (h 1) (cardinal 512)))
[2020-02-19T03:52:05.598] [error] [dev] hyperkit: [ERROR] write sector = 6384736 length = 4096: I/O deadline exceeded

It looks like it went to sleep at 00:37 and then woke up at 03:52 (some macOS maintenance maybe?) and multipassd got started, remembering that dev was running and tried to start it again. But that failed due to the file error for some reason.

Is the instance actually running still? Can you multipass sh dev successfully? And if you multipass -vvv stop dev, does it stop, or print anything useful?

relaxdiego commented 4 years ago

Hi @Saviq I tried multiple times earlier today to multipass stop dev but it didn't output anything and the machine stayed running. The logs showed these for every time I attempted stop:

[2020-02-19T07:12:47.050] [info] [daemon] Cannot open ssh session on "dev" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.2'
[2020-02-19T07:16:37.178] [info] [daemon] Cannot open ssh session on "dev" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.2'
[2020-02-19T07:18:13.264] [info] [daemon] Cannot open ssh session on "dev" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.2'
[2020-02-19T07:20:14.442] [info] [daemon] Cannot open ssh session on "dev" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.2'
[2020-02-19T07:22:20.506] [info] [daemon] Cannot open ssh session on "dev" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.2'

So I could neither stop, restart, nor shell to it at all. I resorted to restarting my MBP instead which finally did it (thank god!). Here's the log from the point I restarted it up to now: https://gist.github.com/relaxdiego/0c16ec9630a5f9cbcd4db7264d63180c

Saviq commented 4 years ago

Right, so Multipass thought the instance was accessible, but it wasn't, in fact, running.

Thanks @relaxdiego I think I now understand at least that part of the problem. Why it failed to start overnight (and why did it even try?) is still to be understood.

relaxdiego commented 4 years ago

Good luck @Saviq and let me know how else I can help debug it. I’m not at all blocked by this as I have other means do my work. So take as much time as you need and don’t hesitate to reach out for more info since I still have the VM here.

eegrok commented 2 years ago

I have basically the same problem after running: sudo shutdown now from within my instance -- multipass thinks it's still running, but appears to try to manage it through ssh commands. I can't see any way to force multipass to recognize that it is turned off. I'm guessing a reboot of my computer will fix the issue, as it did for @relaxdiego