canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.72k stars 641 forks source link

VM created for `snapcraft` won't start (timeout) using multipass #3159

Closed FirelightFlagboy closed 1 year ago

FirelightFlagboy commented 1 year ago

Describe the bug

When I try to build a snap package using snapcraft that use multipass to start a VM.

To Reproduce

  1. snapcraft snap

Expected behavior

That multipass didn't timeout when start / stopping VM used for snapcraft.

If that could help: I didn't have that issue on the date of 2023-06-05

Logs

$ snapcraft snap
Launching a VM.
Build environment is in unknown state, cleaning first.
info failed: The following errors occurred:
instance "snapcraft-[REDACTED]" does not exist
launch failed: The following errors occurred:
snapcraft-[REDACTED]: timed out waiting for response
An error occurred with the instance when trying to launch with 'multipass': returned exit code 2.
Ensure that 'multipass' is setup correctly and try again.
$ sudo journalctl --unit 'snap.multipass*' 
Failed to add filter for units: No data available

I've provided the log below without filtering by unit

Additional info

Additional context

Both start & stopping the VM created by snapcraft timeout. But VMs created by multipass do not :thinking:

After the timeout start, qemu is still running and use 100% of a CPU core ad vitam aeternam

ricab commented 1 year ago

Hi @FirelightFlagboy, sorry for your troubles. I had a look at your logs and did not find any obvious reason for the time out. The only thing I saw out of the ordinary is that AppArmor is disabled, but that shouldn't be a problem.

Both start & stopping the VM created by snapcraft timeout. But VMs created by multipass do not :thinking:

Is that something you observed just once or twice, or consistently? Even with different VMs created via snapcraft? I wonder if there is something specific to snapcraft that provokes this behavior or if it is just a coincidence.

Unfortunately we've been observing these timeouts from time to time. It can be caused by image corruption, often following a bad shutdown. In any case, VMs that time out once often keep timing out. Have you tried purging the VM and starting from scratch? I know that's not ideal, but it might unblock you. A snap restart multipass may also help.

FWIW, we've recently added code (#3114) to improve this a little and give us a better view into the causes. If you're willing, you could snap refresh --edge multipass and let us know how it goes for you.

FirelightFlagboy commented 1 year ago

Is that something you observed just once or twice, or consistently?

I've tried yesterday and today, after restart also on another project

snapcraft vm timeout but vm created by multipass (via multipass launch) do not

It can be caused by image corruption

Is it possible to verify them ? or the remove them to re-download them ?

Have you tried purging the VM and starting from scratch

Yes, I do multipass delete --all --purge

we've recently added code (https://github.com/canonical/multipass/pull/3114) to improve this a little and give us a better view into the causes

I will try that

FirelightFlagboy commented 1 year ago

A snap restart multipass

I've tried that previously but is will hang-out because it is waiting for the VMs to shutdown (which timeout :sweat:)

ricab commented 1 year ago

OK, what you can do to get out fo that situation if to kill -SIGKILL the qemu-system... process(es) that multipass is waiting on, as well as the dnsmasq process if it gets left behind.

ricab commented 1 year ago

Is it possible to verify them ? or the remove them to re-download them ?

The original downloaded image remains untouched by VMs, so it is very unlikely that one gets corrupted and you don't need to re-download that. The images that can get corrupted by VMs are particular to them. So you can just launch a new VM and it will have a fresh image.

If you really want to try a completely fresh setup (including fresh images) you can snap remove --purge multipass and then install again.

FirelightFlagboy commented 1 year ago

I've install multipass using the channel edge using snap refresh --channel=edge multipass

I've also restart multipass using snap restart multipass.

Start snapd & multipass service ```shell $ sudo systemctl start snapd.apparmor.service snapd.service snap.multipass.multipassd.service snapd.socket $ sudo systemctl status snapd.apparmor.service snapd.service snap.multipass.multipassd.service snapd.socket ○ snapd.apparmor.service - Load AppArmor profiles managed internally by snapd Loaded: loaded (/usr/lib/systemd/system/snapd.apparmor.service; disabled; preset: disabled) Active: inactive (dead) Condition: start condition failed at Thu 2023-07-13 13:58:55 CEST; 8s ago └─ ConditionSecurity=apparmor was not met Jul 13 08:49:17 geode systemd[1]: Load AppArmor profiles managed internally by snapd was skipped becaus> Jul 13 13:58:55 geode systemd[1]: Load AppArmor profiles managed internally by snapd was skipped becaus> ● snapd.service - Snap Daemon Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; preset: disabled) Active: active (running) since Thu 2023-07-13 08:49:17 CEST; 5h 9min ago TriggeredBy: ● snapd.socket Main PID: 2355 (snapd) Tasks: 25 (limit: 38349) Memory: 239.9M CPU: 3.938s CGroup: /system.slice/snapd.service └─2355 /usr/lib/snapd/snapd Jul 13 08:49:17 geode snapd[2355]: overlord.go:277: Acquired state lock file Jul 13 08:49:17 geode snapd[2355]: daemon.go:247: started snapd/2.60-1 (series 16; classic; devmode) ar> Jul 13 08:49:17 geode snapd[2355]: daemon.go:340: adjusting startup timeout by 55s (pessimistic estimat> Jul 13 08:49:17 geode snapd[2355]: backends.go:58: AppArmor status: apparmor not enabled Jul 13 08:49:17 geode systemd[1]: Started Snap Daemon. Jul 13 08:49:18 geode snapd[2355]: hotplug.go:200: hotplug device add event ignored, enable experimenta> Jul 13 08:59:17 geode snapd[2355]: devicemgr.go:2361: no NTP sync after 10m0s, trying auto-refresh anyw> Jul 13 08:59:18 geode snapd[2355]: storehelpers.go:773: cannot refresh: snap has no updates available: > Jul 13 13:56:55 geode snapd[2355]: services.go:1103: RemoveSnapServices - disabling snap.multipass.mult> Jul 13 13:56:57 geode snapd[2355]: storehelpers.go:773: cannot refresh snap "multipass": snap has no up> ● snap.multipass.multipassd.service - Service for snap application multipass.multipassd Loaded: loaded (/etc/systemd/system/snap.multipass.multipassd.service; disabled; preset: disabled) Active: active (running) since Thu 2023-07-13 13:58:55 CEST; 8s ago Main PID: 76339 (multipassd) Tasks: 29 (limit: 38349) Memory: 63.0M CPU: 472ms CGroup: /system.slice/snap.multipass.multipassd.service ├─76339 /snap/multipass/10159/bin/multipassd --verbosity debug --logger platform └─76436 /snap/multipass/10159/usr/sbin/dnsmasq --keep-in-foreground --strict-order --bind-> Jul 13 13:58:56 geode multipassd[76339]: Loading "ros2-humble" v1 Jul 13 13:58:56 geode multipassd[76339]: gRPC listening on unix:/var/snap/multipass/common/multipass_so> Jul 13 13:58:56 geode multipassd[76339]: [76546] started: qemu-img snapshot -l /var/snap/multipass/comm> Jul 13 13:58:57 geode multipassd[76339]: Starting Multipass 1.13.0-dev.233+g82495d00 Jul 13 13:58:57 geode multipassd[76339]: Daemon arguments: /snap/multipass/10159/bin/multipassd --verbo> Jul 13 13:59:01 geode dnsmasq[76436]: reading /etc/resolv.conf Jul 13 13:59:01 geode dnsmasq[76436]: using nameserver 1.1.1.1#53 Jul 13 13:59:01 geode dnsmasq[76436]: using nameserver 192.168.1.254#53 Jul 13 13:59:01 geode dnsmasq[76436]: using nameserver fd0f:ee:b0::1#53 Jul 13 13:59:01 geode dnsmasq[76436]: using only locally-known addresses for multipass ● snapd.socket - Socket activation for snappy daemon Loaded: loaded (/usr/lib/systemd/system/snapd.socket; disabled; preset: disabled) Active: active (running) since Thu 2023-07-13 08:49:17 CEST; 5h 9min ago Triggers: ● snapd.service Listen: /run/snapd.socket (Stream) /run/snapd-snap.socket (Stream) Tasks: 0 (limit: 38349) Memory: 0B CPU: 207us CGroup: /system.slice/snapd.socket Jul 13 08:49:17 geode systemd[1]: Starting Socket activation for snappy daemon... Jul 13 08:49:17 geode systemd[1]: Listening on Socket activation for snappy daemon. ```
Multipass cleanup ```shell $ multipass delete --all --purge $ multipass info --all ```
$ snapcraft build
Launching a VM.
Build environment is in unknown state, cleaning first.
info failed: The following errors occurred:
instance "snapcraft-[REDACTED]" does not exist
launch failed: The following errors occurred:
snapcraft-[REDACTED]: timed out waiting for response
An error occurred with the instance when trying to launch with 'multipass': returned exit code 2.
Ensure that 'multipass' is setup correctly and try again.

I can do multipass list

$ multipass list
Name                     State             IPv4             Image
snapcraft-[REDACTED]     Delayed Shutdown  N/A              Ubuntu 20.04 LTS

multipass info --all timeout after 2min:

$ multipass info --all
info failed: Could not determine IP address within 120000ms

Using htop, I see 2 qemu-system process using 100% cpu and not stopping until killed.

journalctl log

ricab commented 1 year ago

Hi @FirelightFlagboy, it looks like there is a problem with the latest Focal (20.04) image for snapcraft. There's already a bug report for that and hopefully there will be a fix before long.

In the meantime, if you can by any chance move to 22.04, there shouldn't be any problems with that. If you can't change the base in the snapcraft.yaml, your only alternative would be to build with LXD.

ricab commented 1 year ago

@FirelightFlagboy, actually there is one other way you can work around this. After having encontered this problem, killed and deleted the VM, do:

  1. snap disable multipass
  2. download an older image from https://cloud-images.ubuntu.com/buildd/daily/focal/20230706/focal-server-cloudimg-amd64-disk1.img (note that this won't be available for long and I believe the problem starts on the following day)
  3. sudo mv focal-server-cloudimg-amd64-disk1.img /var/snap/multipass/common/cache/multipassd/vault/images/focal-20230711 (the goal is to overwrite the faulty image, so that should be the existing directory for that image. You may need to replace the date in the directory name as appropriate)
  4. snap enable multipass
  5. try snapcraft again.
FirelightFlagboy commented 1 year ago

Thanks you for your precious help

ricab commented 1 year ago

Hi @FirelightFlagboy, that focal image should be fixed now and working again normally. I will close this now, but let us know how it goes for you and feel free to reopen.

ecruzolivera commented 2 months ago

Hello, im having this same issue:

snap info multipass
name:      multipass
summary:   Instant Ubuntu VMs
publisher: Canonical✓
store-url: https://snapcraft.io/multipass
contact:   https://github.com/canonical/multipass/issues/new
license:   GPL-3.0
description: |
  Multipass is a tool to launch and manage VMs on Windows, Mac and Linux that simulates a cloud
  environment with support for cloud-init. Get Ubuntu on-demand with clean integration to your IDE
  and version control on your native platform.
................
commands:
  - multipass.gui
  - multipass
services:
  multipass.multipassd: simple, enabled, active
snap-id:      mA11087v6dR3IEcQLgICQVjuvhUUBUKM
tracking:     latest/stable
refresh-date: 12 days ago, at 07:57 CST
channels:
  latest/stable:    1.13.1                    2024-06-28 (12829) 122MB -
  latest/candidate: 1.13.1                    2024-07-10 (12924) 122MB -
  latest/beta:      1.14.0                    2024-07-11 (12940)  64MB -
  latest/edge:      1.15.0-dev.2328+g6ee0b4ed 2024-07-10 (12931)  64MB -
installed:          1.13.1                               (12829) 122MB -

when running snapcraft i get:

 snapcraft
Launching a VM.
launch failed: failed to download from 'https://cloud-images.ubuntu.com/buildd/daily/focal/20240624/focal-server-cloudimg-amd64-disk1.img': Error opening https://cloud-images.ubuntu.com/buildd/daily/focal/20240624/focal-server-cloudimg-amd64-disk1.img
An error occurred with the instance when trying to launch with 'multipass': returned exit code 2.
Ensure that 'multipass' is setup correctly and try again.

multipass is searching for the ubuntu image 2024/06/24 but it seems that in the mirror all images are from july.

https://cloud-images.ubuntu.com/buildd/daily/focal/

image

ecruzolivera commented 2 months ago

the

@FirelightFlagboy, actually there is one other way you can work around this. After having encontered this problem, killed and deleted the VM, do:

  1. snap disable multipass
  2. download an older image from https://cloud-images.ubuntu.com/buildd/daily/focal/20230706/focal-server-cloudimg-amd64-disk1.img (note that this won't be available for long and I believe the problem starts on the following day)
  3. sudo mv focal-server-cloudimg-amd64-disk1.img /var/snap/multipass/common/cache/multipassd/vault/images/focal-20230711 (the goal is to overwrite the faulty image, so that should be the existing directory for that image. You may need to replace the date in the directory name as appropriate)
  4. snap enable multipass
  5. try snapcraft again.

this solution doesnt work for me, after I enable multipass it seems that it deletes the /var/snap/multipass/common/cache/multipassd/vault/images/focal-20240624 directory

ecruzolivera commented 2 months ago

multipass fails for every ubuntu image

 multipass launch 20.04
launch failed: failed to download from 'https://cloud-images.ubuntu.com/releases/server/releases/focal/release-20240612/ubuntu-20.04-server-cloudimg-amd64.img': Error opening https://cloud-images.ubuntu.com/releases/server/releases/focal/release-20240612/ubuntu-20.04-server-cloudimg-amd64.img
multipass launch 22.04
launch failed: failed to download from 'https://cloud-images.ubuntu.com/releases/server/releases/jammy/release-20240614/ubuntu-22.04-server-cloudimg-amd64.img': Error opening https://cloud-images.ubuntu.com/releases/server/releases/jammy/release-20240614/ubuntu-22.04-server-cloudimg-amd64.img
 multipass launch lts
launch failed: failed to download from 'https://cloud-images.ubuntu.com/releases/server/releases/noble/release-20240622/ubuntu-24.04-server-cloudimg-amd64.img': Error opening https://cloud-images.ubuntu.com/releases/server/releases/noble/release-20240622/ubuntu-24.04-server-cloudimg-amd64.img
multipass launch devel
launch failed: failed to download from 'https://cloud-images.ubuntu.com/daily/server/oracular/20240622/oracular-server-cloudimg-amd64.img': Error opening https://cloud-images.ubuntu.com/daily/server/oracular/20240622/oracular-server-cloudimg-amd64.img
ricab commented 2 months ago

Hi @ecruzolivera, let's follow up in #3577