canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.76k stars 642 forks source link

launch failed: Remote "" is unknown or unreachable ā€” then `multipass launch` magically starts working 15min after boot! #3464

Closed holta closed 4 months ago

holta commented 6 months ago

Describe the bug

~Since December/January, several people have confirmed that the Edge channel version of multipass can no longer run multipass launch after every boot.~ multipass launch fails when run within 15min of boot. Several have confirmed this, and we each assumed the bug would go away. But 4+ months later it has not (hence this bug report!)

The error message is: launch failed: Remote "" is unknown or unreachable.

One workaround (annoying, but it works) is to run: snap restart multipass

Additional info

Host PC details:

# cat /etc/issue
Ubuntu 22.04.4 LTS \n \l

# multipass version
multipass   1.14.0-dev.1600+g2701b66b
multipassd  1.14.0-dev.1600+g2701b66b

# multipass get local.driver
lxd

# multipass get local.bridged-network
enp1s0

# lxd --version
5.20

# snap info lxd
...
installed:          5.20-f3dd836              (27049) 155MB -

# uname -a
Linux 126-u2204-desk 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Additional context

ASIDE: Will we be able test this on pre-release builds of Multipass on Windows in future, as hinted at... ?

Likely Related:

townsend2010 commented 6 months ago

Hi @holta!

We have not seen this issue. We occasionally get reports of this same symptom on releases versions of Multipass, but that is usually specific to the user's host system and/or network.

To help us better diagnose the issue, could you please provide a full journal from boot time to when you see this symptom? Thanks!

holta commented 6 months ago

full journal from boot time

multipass launch fails every time after booting:

# multipass launch
launch failed: Remote "" is unknown or unreachable.

# journalctl -b -u 'snap.multipass*'
Apr 05 16:16:37 126-u2204-desk systemd[1]: Started Service for snap application multipass.multipassd.
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Error getting https://codeload.github.com/canonical/multipass-blueprints/zip/refs/heads/main: Host codeload.github.com not found - trying cache.
Apr 05 16:16:39 126-u2204-desk multipassd[714]: gRPC listening on unix:/var/snap/multipass/common/multipass_socket
Apr 05 16:16:39 126-u2204-desk multipassd[714]: fetch manifest periodically
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Cannot retrieve headers for https://cdimage.ubuntu.com/ubuntu-core/16/stable/current/ubuntu-core-16-amd64.img.xz: Host cdimage.ubuntu.com not found
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Cannot retrieve headers for https://cdimage.ubuntu.com/ubuntu-core/20/stable/current/ubuntu-core-20-amd64.img.xz: Host cdimage.ubuntu.com not found
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Cannot retrieve headers for https://cdimage.ubuntu.com/ubuntu-core/22/stable/current/ubuntu-core-22-amd64.img.xz: Host cdimage.ubuntu.com not found
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Cannot retrieve headers for https://cdimage.ubuntu.com/ubuntu-core/18/stable/current/ubuntu-core-18-amd64.img.xz: Host cdimage.ubuntu.com not found
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Error getting https://cloud-images.ubuntu.com/buildd/daily/streams/v1/index.json: Host cloud-images.ubuntu.com not found - trying cache.
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Error getting https://cloud-images.ubuntu.com/releases/streams/v1/index.json: Host cloud-images.ubuntu.com not found - trying cache.
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Error getting https://cloud-images.ubuntu.com/daily/streams/v1/index.json: Host cloud-images.ubuntu.com not found - trying cache.
Apr 05 16:16:39 126-u2204-desk multipassd[714]: Could not update manifest: failed to download from 'https://cdimage.ubuntu.com/ubuntu-core/16/stable/current/ubuntu-core-16-amd64.img.xz': Host cdimage.ubuntu.com not found
Apr 05 16:16:46 126-u2204-desk multipassd[714]: Using the 'multipass' storage pool.
Apr 05 16:16:46 126-u2204-desk multipassd[714]: Starting Multipass 1.14.0-dev.1600+g2701b66b
Apr 05 16:16:46 126-u2204-desk multipassd[714]: Daemon arguments: /snap/multipass/12063/bin/multipassd --verbosity debug --logger platform
Apr 05 16:20:50 126-u2204-desk systemd[1]: Started snap.multipass.multipass-13d6fc67-581c-4732-8249-0189fd0a91b2.scope.
Apr 05 16:20:51 126-u2204-desk systemd[1]: snap.multipass.multipass-13d6fc67-581c-4732-8249-0189fd0a91b2.scope: Deactivated successfully.

Further logs show applying of the workaround:

# snap restart multipass
Restarted.

# multipass launch
Launched: humorous-poodle

# journalctl -b -u 'snap.multipass*'
...
Apr 05 16:31:40 126-u2204-desk multipassd[714]: fetch manifest periodically
Apr 05 16:38:05 126-u2204-desk multipassd[714]: Received signal 15 (Terminated)
Apr 05 16:38:05 126-u2204-desk multipassd[714]: Goodbye!
Apr 05 16:38:05 126-u2204-desk systemd[1]: Stopping Service for snap application multipass.multipassd...
Apr 05 16:38:06 126-u2204-desk systemd[1]: snap.multipass.multipassd.service: Deactivated successfully.
Apr 05 16:38:06 126-u2204-desk systemd[1]: Stopped Service for snap application multipass.multipassd.
Apr 05 16:38:06 126-u2204-desk systemd[1]: snap.multipass.multipassd.service: Consumed 2.567s CPU time.
Apr 05 16:38:06 126-u2204-desk systemd[1]: Started Service for snap application multipass.multipassd.
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "anbox-cloud-appliance" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "charm-dev" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "docker" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "jellyfin" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "minikube" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "ros-noetic" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Loading "ros2-humble" v1
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: gRPC listening on unix:/var/snap/multipass/common/multipass_socket
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: fetch manifest periodically
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Using the 'multipass' storage pool.
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Starting Multipass 1.14.0-dev.1600+g2701b66b
Apr 05 16:38:06 126-u2204-desk multipassd[1928]: Daemon arguments: /snap/multipass/12063/bin/multipassd --verbosity debug --logger platform
Apr 05 16:39:01 126-u2204-desk systemd[1]: Started snap.multipass.multipass-8d10f7fb-0ee1-4262-ba05-d262e920627a.scope.
Apr 05 16:39:01 126-u2204-desk multipassd[1928]: Using the 'multipass' storage pool.
Apr 05 16:39:01 126-u2204-desk multipassd[1928]: Creating instance with image id: da10d667adf5588f2540afcbbaf0bb91c529719003f53fdc1f4d589fb407d0aa
Apr 05 16:39:07 126-u2204-desk multipassd[1928]: Waiting for SSH to be up
Apr 05 16:39:29 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:32 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:33 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:34 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:36 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:37 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:38 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:39 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:40 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:42 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:43 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:44 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:45 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:46 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:47 126-u2204-desk multipassd[1928]: Executing '[ -e /var/lib/cloud/instance/boot-finished ]'
Apr 05 16:39:47 126-u2204-desk systemd[1]: snap.multipass.multipass-8d10f7fb-0ee1-4262-ba05-d262e920627a.scope: Deactivated successfully.

Just FYI: Multipass's "How to access logs" instructions at https://multipass.run/docs/accessing-logs say to run journalctl --unit 'snap.multipass*' but this turns out to be problematic and unhelpful in cases like this. That command is projected to take an hour-or-more to complete (any idea why it seems to be unpacking / unzipping old Multipass log files from the past ~12 months painfully slowly, in what appears to be a very inefficient manner?) On the bright side, people who happen to know about journalctl -b (i.e. journal from boot time) can avoid most of this nightmare. šŸ™

holta commented 5 months ago

@townsend2010 are the above logs sufficient, to help you diagnose?

townsend2010 commented 5 months ago

Hi @holta!

I have a hunch of what the problem is, but I really need to examine full logs of the boot including the multipass log entries. By full log, I mean everything that happens during the boot.

My suspicion is that there is a race between all of the host networking coming up and Multipass starting. It looks to me that at the very least, the host does not have DNS running yet when Multipass starts and the way Qt networking works is that it caches the networking info (such as DNS), so if DNS is not configured yet, then Qt won't be either.

holta commented 5 months ago

full logs of the boot including the multipass log entries. By full log, I mean everything that happens during the boot.

Got it, and thanks for explaining what may be happening. šŸ’Æ

Here are 2344 lines from boot (journalctl -b) posted to a pastebin:

Lines 2343 and 2344 are a result of multipass launch failing to run. (Just LMK if you need anything else!)

townsend2010 commented 5 months ago

Hi @holta!

Thank you for the detailed logs. This is helpful. It does look like there is a potential race between the networking being fully up (such as DNS not configured yet) and Multipass starting.

Upon some initial investigation, it looks like Qt caches the current network settings and will use that for the duration of the QNetworkAccessManager's lifetime. We instantiate this when the daemon starts. We have not made any changes in this area in the time frame you began to see this problem, so it's possible something is causing your systems to take a little longer for networking to be completely up and fully configured.

Qt networking is not very robust when it comes to changes in networking after it has been instantiated. We will have to think about if there is anything we can do to help mitigate this problem, but I don't have an answer at this time.

holta commented 5 months ago

Qt networking is not very robust when it comes to changes in networking after it has been instantiated.

@townsend2010 any chance upgrading to the imminent Ubuntu 24.04 might help? Or reveal clues?

(Just FYI this not only occurs on this generic/vanilla Ubuntu 22.04 PC ā€” but the same exact problem also began in December/January on a very different brand / different manufacturer PC.)

townsend2010 commented 5 months ago

Hi @holta,

I'm not sure if upgrading to 22.04 will help or not. Regarding a very different brand PC, is it running the same version of Ubuntu or something else?

In the meantime, as an experiment that hopefully will prove my theory, please try the following: $ sudo systemctl edit snap.multipass.multipassd.service Add the following to the override file:

[Unit]
After=
After=snap-multipass-12153.mount network.target snapd.apparmor.service systemd-resolved.service

Ensure the number contained in snap-multipass-xxxxx.mount matches what's list below in the file for the After= line.

Reboot the machine and see if this helps. This is supposed to ensure multipass starts after systemd-resolved is started which should help with making sure DNS is set up on the host first.

zchem commented 5 months ago

Doesn't work for me.

$ sudo systemctl edit snap.multipass.multipassd.service

### Editing /etc/systemd/system/snap.multipass.multipassd.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Unit]
After=
After=snap-multipass-12146.mount network.target snapd.apparmor.service systemd-resolved.service

### Edits below this comment will be discarded

### /etc/systemd/system/snap.multipass.multipassd.service
# [Unit]
# # Auto-generated, DO NOT EDIT
# Description=Service for snap application multipass.multipassd
# Requires=snap-multipass-12146.mount
# Wants=network.target
# After=snap-multipass-12146.mount network.target snapd.apparmor.service
# X-Snappy=yes
# 
# [Service]
# EnvironmentFile=-/etc/environment
# ExecStart=/usr/bin/snap run multipass.multipassd
# SyslogIdentifier=multipass.multipassd
# Restart=on-failure
# WorkingDirectory=/var/snap/multipass/12146
# TimeoutStopSec=300
# Type=simple
# 
# [Install]
# WantedBy=multi-user.target

$ multipass launch -n vm1
launch failed: Remote "" is unknown or unreachable.

$ ping -c 1 cloud-images.ubuntu.com
PING cloud-images.ubuntu.com (185.125.190.40) 56(84) bytes of data.
64 bytes from https-services.aerodent.canonical.com (185.125.190.40): icmp_seq=1 ttl=53 time=240 ms

--- cloud-images.ubuntu.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 239.776/239.776/239.776/0.000 ms

$ ping -c 1 cdimage.ubuntu.com
PING cdimage.ubuntu.com (185.125.190.37) 56(84) bytes of data.
64 bytes from https-services.actiontoad.canonical.com (185.125.190.37): icmp_seq=1 ttl=49 time=205 ms

--- cdimage.ubuntu.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 205.023/205.023/205.023/0.000 ms

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 23.10"
NAME="Ubuntu"
VERSION_ID="23.10"
VERSION="23.10 (Mantic Minotaur)"
VERSION_CODENAME=mantic
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=mantic
LOGO=ubuntu-logo

Attaching multipass logs. multipass.txt

zchem commented 5 months ago

I'm trying to compare the strace output of when it's not working vs when it's working. Seems like multipass is writing some info to /run/user/1000/bus and trying it fails when it try to receive info from it later? strace-diff

holta commented 5 months ago

Doesn't work for me.

  1. Likewise šŸ˜æ. Thanks @zchem for confirming!

  2. Logs:

  3. Curiously multipass launch suddenly starts working "exactly ~15 min" after boot.

    It begins to work immediately after fetch manifest periodically appears in logs like journalctl -b -u snap.multipass.multipassd.service

    All of which reminds me I had the same problem prior to Dec 2023, and thought the problem was almost randomly intermittent, when I likely didn't realize that multipass launch magically comes to life 15min after boot (-:

    @townsend2010 probably already knew this (still, does confirmation provide any new hints?!)

  4. Just FYI for @townsend2010, systemd's override.conf (not that it solved the problem ;) can in fact be made shorter:

    # cat /etc/systemd/system/snap.multipass.multipassd.service.d/override.conf
    [Unit]
    After=systemd-resolved.service
    
    # systemctl daemon-reload
    
    # systemctl show snap.multipass.multipassd.service | grep After=
    After=network.target systemd-journald.socket system.slice -.mount snap-multipass-12063.mount sysinit.target 
    snapd.apparmor.service systemd-resolved.service basic.target

    The reason is that After= is a systemd "dependency" ā€” meaning that it cannot be reset to an empty list ā€” only additional values can be added: https://unix.stackexchange.com/questions/398540/how-to-override-systemd-unit-file-settings/468067#468067

holta commented 5 months ago

We now know this problem occurs when the Host PC is Ubuntu Desktop 22.04, 23.10 and presumably also 24.04

Clarifying Question: Are we sure this problem does NOT also occur with Ubuntu Server?

townsend2010 commented 5 months ago

I think the 15 minute data point is important and not one that I was aware of. It does seem to be a race between Multipass trying to retrieve image info the first time and networking being fully configured. Why this race suddenly appeared around the December/January time frame is currently unknown to me. I think it may have to do with a change in Ubuntu around that time, but that is only speculation by me.

At any rate, I think we can be better about handling the error and try again much sooner when an error occurs. Perhaps a sliding timeout on errors, so the first error occurs, we try again in 5 seconds. If that errors, try again in 15 seconds. If that errors, try in 30 seconds. So on and so forth with a maximum timeout of the current 15 minutes.

Since we all develop Multipass on our regular Ubuntu machines, we are not really rebooting much and trying Multipass right away which may explain why the development team does not see this. We will work on a recovery solution before the upcoming 1.14 release.

Thanks for your patience and help on this!

townsend2010 commented 5 months ago

Also, not that this is a long term solution, but running: $ multipass find --force-update will force the update of the image info and if the networking is up, it should unblock launching.

holta commented 5 months ago

Why this race suddenly appeared around the December/January time frame is currently unknown to me. I think it may have to do with a change in Ubuntu around that time, but that is only speculation by me.

I apologize if it turns out I was wrong: e.g. the problem might have pre-dated Dec 2023 / Jan 2024.

(My perception was that the problem became far more severe ~4 months ago, definitely happening right after every boot. But I'm now wondering if I might have misinterpreted the problem prior to Dec 2023, if the very same thing was happening then, and I erroneously interpreted the "fixes itself 15 minute after boot" reality as "random" intermittent failures back then? :-)

townsend2010 commented 5 months ago

No worries about the timing of when this seemed to begin to occur- I don't think that part is important anymore. The good news is that we understand the problem now and will work on the proper solution. Thanks again!

holta commented 5 months ago

We now know this problem occurs when the Host PC is Ubuntu Desktop 22.04, 23.10 and presumably also 24.04

Clarifying Question: Are we sure this problem does NOT also occur with Ubuntu Server?

FYI a friend tested Multipass 1.14.x (edge channel) on Ubuntu Server 24.04 (latest pre-release, installed just today) and multipass launch does work right after boot.

Strangely the multipass set local.driver=lxd command does not work for him, even after snap install lxd and rebooting multiple times, so I'd like him to open a separate ticket to investigate? The error he gets is:

set failed: You have to `multipass set local.bridged-network=<name>` to use the bridged shortcut.

[ New ticket is: ]

holta commented 5 months ago

PS the very separate error launch failed: Downloaded image hash does not match occurs quite a lot recently, both on Windows 11 and Ubuntu Server 24.04:

(No idea why these errors are rather suddenly appearing in the last 2 weeks ā€” but since it's happening repeatedly/regularly, I thought I'd mention it!)

Possibly related: