canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

Cannot reboot instances on Windows 10 or 11 if `/etc/hostname` changes (within the instance) #3346

Closed holta closed 10 months ago

holta commented 10 months ago

Multipass instances work great initially on Windows! Until we try to reboot them!

Logs often show:

[u22] Error getting extra IP addresses: ssh connection failed: 'Failed to resolve hostname u22.mshome.net (No such host is known. )'
[u22] Waiting for SSH to be up

Thank you for helping us to try to resolve this — as it's affecting everyone who's attempted to use Multipass on Windows 10 and Windows 11 so far:

Possibly related: Internet Connection Sharing file C:\WINDOWS\System32\drivers\etc\hosts.ics appears to be accidentally-but-regularly "damaged" by Multipass — which somehow inserts stray numbers as follows:

192.168.179.12 box.mshome.net # 2024 1 1 8 23 53 18 185
192.168.176.1 DESKTOP-XXXXXXX.mshome.net # 2028 12 6 30 23 53 18 185
192.168.179.226 primary.mshome.net # 2024 1 1 8 22 3 21 851
0 54 30 497

67
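
For anyone inspecting hosts.ics by hand, the lines above can be read with a short Python sketch. It assumes (this is not confirmed anywhere in this thread) that the eight numbers after # follow the Windows SYSTEMTIME field order: year, month, day of week, day, hour, minute, second, millisecond. The two complete lines are consistent with that reading, since 8 January 2024 was a Monday (1) and 30 December 2028 is a Saturday (6). IcsEntry and parse_ics_line are names made up for this illustration.

```python
import ipaddress
from datetime import datetime
from typing import NamedTuple, Optional


class IcsEntry(NamedTuple):
    ip: str
    hostname: str
    timestamp: Optional[datetime]  # None when the comment is missing or malformed


def parse_ics_line(line: str) -> Optional[IcsEntry]:
    """Parse one hosts.ics line; return None for anything that is not 'IP hostname'."""
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) != 2:
        return None
    ip, hostname = fields
    try:
        ipaddress.ip_address(ip)  # reject fragments such as "0" or "67"
    except ValueError:
        return None

    timestamp = None
    nums = comment.split()
    if len(nums) == 8 and all(n.isdigit() for n in nums):
        # ASSUMPTION: SYSTEMTIME order; the day-of-week field is read but not used.
        year, month, _dow, day, hour, minute, second, ms = map(int, nums)
        try:
            timestamp = datetime(year, month, day, hour, minute, second, ms * 1000)
        except ValueError:
            timestamp = None
    return IcsEntry(ip, hostname, timestamp)


entry = parse_ics_line("192.168.179.12 box.mshome.net # 2024 1 1 8 23 53 18 185")
print(entry.hostname, entry.timestamp)
```

Under that reading, the fragments 0 54 30 497 and 67 look like leftover pieces of a timestamp rather than host entries, and the sketch skips them.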

The https://multipass.run/docs/troubleshoot-networking Troubleshooting Doc seemed very promising. But then all 3 tips listed there (at the bottom, for Windows) were attempted, and so far do not resolve the problem.

RECAP: All such Multipass instances work great until we try to reboot them. No matter how we reboot them, they fail to reboot. Example:

C:\Users\XXXX>multipass restart u22
restart failed: ssh connection failed: 'Failed to resolve hostname u22.mshome.net (No such host is known. )'

C:\Users\XXXX>multipass shell u22
shell failed: ssh connection failed: 'Failed to resolve hostname u22.mshome.net (No such host is known. )'

C:\Users\XXXX>multipass list
Name                    State             IPv4             Image
u22                     Running           N/A              Ubuntu 22.04 LTS

C:\Users\XXXX>multipass stop u22

C:\Users\XXXX>multipass list
Name                    State             IPv4             Image
u22                     Stopped           --               Ubuntu 22.04 LTS

C:\Users\XXXX>multipass start u22
start failed: The following errors occurred:
u22: timed out waiting for response
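
The symptom in the transcript above (State is Running but IPv4 is N/A) can also be checked from a script. The following Python sketch parses the output of multipass list --format json. It assumes that output is an object with a top-level "list" array whose items carry "name", "state" and "ipv4" keys; the sample document is hypothetical and the function names are made up for illustration.

```python
import json
import subprocess
from typing import List


def running_without_ip(list_json: str) -> List[str]:
    """Return names of instances reported as Running but with no IPv4 address."""
    data = json.loads(list_json)
    return [
        inst["name"]
        for inst in data.get("list", [])
        if inst.get("state") == "Running" and not inst.get("ipv4")
    ]


def multipass_list_json() -> str:
    """Ask the Multipass CLI for its instance list (needs multipass on PATH)."""
    result = subprocess.run(
        ["multipass", "list", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample, shaped like the ASSUMED JSON output described above.
SAMPLE = json.dumps({
    "list": [
        {"name": "u22", "state": "Running", "ipv4": []},
        {"name": "primary", "state": "Running", "ipv4": ["192.168.179.226"]},
        {"name": "old", "state": "Stopped", "ipv4": []},
    ]
})

print(running_without_ip(SAMPLE))  # prints ['u22']
```

On a real host, running_without_ip(multipass_list_json()) would apply the same check to the live instance list.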

To Reproduce

  1. Install Multipass 1.12.2 or 1.13.0 RC onto Windows 10 or 11, wiping all prior Multipass settings and instances to be extra sure.
  2. Run multipass launch 22.04 -m 2G -d 20G --cloud-init omg.yml -n u22 as specified in the Internet-in-a-Box instructions and notice that everything works great until you reboot!
  3. Try to reboot the instance 🤔

Logs

Additional info

C:\Users\XXXX>multipass version
multipass   1.13.0-rc.1308+g240e6cae1.win
multipassd  1.13.0-rc.1308+g240e6cae1.win

C:\Users\XXXX>multipass info --all
Warning: the `--all` flag for the `info` command is deprecated. Please use `info` with no positional arguments for the same effect.
info failed: ssh connection failed: 'Failed to resolve hostname u22.mshome.net (No such host is known. )'

C:\Users\XXXX>multipass info
info failed: ssh connection failed: 'Failed to resolve hostname u22.mshome.net (No such host is known. )'

C:\Users\XXXX>multipass get local.driver
hyperv

georgeliao commented 10 months ago

@holta Thanks for the detailed report and I am able to reproduce this.

To narrow this down further, I also tried launching a virtual machine without the cloud-init configuration file: multipass launch 22.04 -m 2G -d 20G -n u22. That instance, u22, restarts normally. I also tried multipass launch 22.04 -m 2G -d 20G --cloud-init omg.yml -n u22 on Linux, and that works fine as well. So my speculation is that the issue lies in Hyper-V launching with a cloud-init file.

holta commented 10 months ago

@georgeliao profound thanks:

georgeliao commented 10 months ago

@holta Well, in my case the omg.yml file line calibreweb_enabled: True is the one that caused it. If that is set to False, then the instance can successfully restart after launch. Other cases, like launching without omg.yml and restarting, also work.

Furthermore, once one virtual machine has restarted successfully after being launched without calibreweb, the IP-to-hostname mapping seems to be set up properly. After that, if you delete the virtual machine (multipass delete <vm_name> --purge) and launch it again under the same name with the original omg.yml, it seems to work. I am not sure I fully understand how Hyper-V and Windows handle this.

ricab commented 10 months ago

Possibly related: Internet Connection Sharing file C:\WINDOWS\System32\drivers\etc\hosts.ics appears to be accidentally-but-regularly "damaged" by Multipass

Just want to clarify a couple of things:

That said, it would be great to get a better understanding of what exactly triggers the problem and if there is a way to work around it. That "calibreweb" find by @georgeliao is very interesting. @holta What is that line doing? Enabling an e-book server inside the instance? Any idea how that could be causing the hosts.ics to be rewritten?

holta commented 10 months ago

omg.yml file line calibreweb_enabled: True is the one that caused it

Just FYI I tried many times and cannot reproduce @georgeliao's above claim:

1) Every Internet-in-a-Box instance fails to reboot (on a Windows 11 Host PC) regardless of whether omg.yml variables calibreweb_install and calibreweb_enabled are set to True or False.

2) Identical Internet-in-a-Box instances succeed in rebooting when the Host PC is Linux.

3) So I'll keep bisecting to try to get closer to the root cause 🥏

holta commented 10 months ago
  1. Simple commands like multipass start and multipass shell fail every time: (on Windows 11 Host PC, even after I restart "C:\Program Files\Multipass\bin\multipassd.exe" /svc --verbosity debug using Start > Run > services.msc which can take fully ~5 minutes!)
C:\Users\XXXX>multipass start
launch failed: Failed to resize instance image - error executing powershell command. Detail: Resize-VHD : Failed to resize the virtual disk.
The system failed to resize
'C:\ProgramData\Multipass\data\vault\instances\primary\ubuntu-22.04-server-cloudimg-amd64.vhdx'.
Failed to resize the virtual disk.
The system failed to resize
'C:\ProgramData\Multipass\data\vault\instances\primary\ubuntu-22.04-server-cloudimg-amd64.vhdx': The process cannot
access the file because it is being used by another process. (0x80070020).
At line:1 char:1
+ Resize-VHD -Path C:/ProgramData/Multipass/data/vault/instances/primar ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceBusy: (:) [Resize-VHD], VirtualizationException
    + FullyQualifiedErrorId : ObjectInUse,Microsoft.Vhd.PowerShell.Cmdlets.ResizeVhd
C:\Users\XXXX>multipass shell
launch failed: Failed to resize instance image - error executing powershell command. Detail: Resize-VHD : Failed to resize the virtual disk.
The system failed to resize
'C:\ProgramData\Multipass\data\vault\instances\primary\ubuntu-22.04-server-cloudimg-amd64.vhdx'.
Failed to resize the virtual disk.
The system failed to resize
'C:\ProgramData\Multipass\data\vault\instances\primary\ubuntu-22.04-server-cloudimg-amd64.vhdx': The process cannot
access the file because it is being used by another process. (0x80070020).
At line:1 char:1
+ Resize-VHD -Path C:/ProgramData/Multipass/data/vault/instances/primar ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceBusy: (:) [Resize-VHD], VirtualizationException
    + FullyQualifiedErrorId : ObjectInUse,Microsoft.Vhd.PowerShell.Cmdlets.ResizeVhd
  2. However, running multipass launch followed by multipass shell proper-crappie and multipass shell determined-weasel works every time, creating minimal instances that are indeed rebootable.

  3. Multipass has Far-Better-than-AI abilities when it poetically names instances! 😆

holta commented 10 months ago
  1. Here's the secret to making a Multipass instance self-destruct on Windows — simply change /etc/hostname — this works every time:
C:\Users\XXXX>multipass launch -n u22
Launched: u22

C:\Users\XXXX>multipass shell u22

...

root@u22:/home/ubuntu# cat /etc/hostname
u22
root@u22:/home/ubuntu# cat > /etc/hostname
caught-a-live-one
root@u22:/home/ubuntu# reboot

C:\Users\XXXX>multipass list
Name                    State             IPv4             Image
u22                     Running           N/A              Ubuntu 22.04 LTS

C:\Users\XXXX>multipass shell u22
shell failed: ssh connection failed: 'Timeout connecting to u22.mshome.net'

        WHEREAS: Changing /etc/hostname inside an instance works when the Host PC is Linux!

        SIDE QUESTION: Does changing /etc/hostname work when the Host PC is macOS ?

  2. ADVANCED QUESTION: Why are multipass launch 23.10 and multipass launch 24.04 blocked on Windows and macOS, while recent OS's can be used when the Host PC is Linux? Can Multipass please consider fixing this?
C:\Users\XXXX>multipass launch 23.10
launch failed: '23.10' is not a supported alias. Please use `multipass find` for supported image aliases.

C:\Users\XXXX>multipass launch 24.04
launch failed: '24.04' is not a supported alias. Please use `multipass find` for supported image aliases.

C:\Users\XXXX>multipass find
Image                       Aliases           Version          Description
core                        core16            20200818         Ubuntu Core 16
core18                                        20211124         Ubuntu Core 18
core20                                        20230119         Ubuntu Core 20
core22                                        20230717         Ubuntu Core 22
20.04                       focal             20231129         Ubuntu 20.04 LTS
22.04                       jammy,lts         20231211         Ubuntu 22.04 LTS
23.04                       lunar             20231205         Ubuntu 23.04
snapcraft:24.04             noble,devel,core24 20240103        Ubuntu 24.04 LTS
appliance:adguard-home                        20200812         Ubuntu AdGuard Home Appliance
appliance:mosquitto                           20200812         Ubuntu Mosquitto Appliance
appliance:nextcloud                           20200812         Ubuntu Nextcloud Appliance
appliance:openhab                             20200812         Ubuntu openHAB Home Appliance
appliance:plexmediaserver                     20200812         Ubuntu Plex Media Server Appliance

Blueprint                   Aliases           Version          Description
anbox-cloud-appliance                         latest           Anbox Cloud Appliance
charm-dev                                     latest           A development and testing environment for charmers
docker                                        0.4              A Docker environment with Portainer and related tools
jellyfin                                      latest           Jellyfin is a Free Software Media System that puts you in control of managing and streaming your media.
minikube                                      latest           minikube is local Kubernetes
ros-noetic                                    0.1              A development and testing environment for ROS Noetic.
ros2-humble                                   0.1              A development and testing environment for ROS 2 Humble.

        WHEREAS: On Linux below, recent vintage instance OS's like 23.10 and 24.04 pre-releases work:

root@box:~# multipass find
Image                       Aliases           Version          Description
core                        core16            20200818         Ubuntu Core 16
core18                                        20211124         Ubuntu Core 18
core20                                        20230119         Ubuntu Core 20
core22                                        20230717         Ubuntu Core 22
20.04                       focal             20231129         Ubuntu 20.04 LTS
22.04                       jammy,lts         20231211         Ubuntu 22.04 LTS
23.04                       lunar             20231205         Ubuntu 23.04
23.10                       mantic            20231220         Ubuntu 23.10
daily:24.04                 noble,devel       20240101         Ubuntu 24.04 LTS
appliance:adguard-home                        20200812         Ubuntu AdGuard Home Appliance
appliance:mosquitto                           20200812         Ubuntu Mosquitto Appliance
appliance:nextcloud                           20200812         Ubuntu Nextcloud Appliance
appliance:openhab                             20200812         Ubuntu openHAB Home Appliance
appliance:plexmediaserver                     20200812         Ubuntu Plex Media Server Appliance

Blueprint                   Aliases           Version          Description
anbox-cloud-appliance                         latest           Anbox Cloud Appliance
charm-dev                                     latest           A development and testing environment for charmers
docker                                        0.4              A Docker environment with Portainer and related tools
jellyfin                                      latest           Jellyfin is a Free Software Media System that puts you in control of managing and streaming your media.
minikube                                      latest           minikube is local Kubernetes
ros-noetic                                    0.1              A development and testing environment for ROS Noetic.
ros2-humble                                   0.1              A development and testing environment for ROS 2 Humble.

Related:

holta commented 10 months ago

great to get a better understanding of what exactly triggers the problem and if there is a way to work around it

  1. @ricab @georgeliao I agree: a workaround would be truly incredible — e.g. shout if you happen to come up with any interim tricks for instances that in fact need to modify /etc/hostname on Windows... and also on macOS possibly!?

      💯

georgeliao commented 10 months ago

Just FYI I tried many times and cannot reproduce @georgeliao's above claim:

Every Internet-in-a-Box instance fails to reboot (on a Windows 11 Host PC) regardless of whether omg.yml variables calibreweb_install and calibreweb_enabled are set to True or False.

Hmm, interesting, that is different from my case. However, based on your later comments, I assume launching without --cloud-init omg.yml and rebooting work, right?

Simple commands like multipass start and multipass shell fail every time: (on Windows 11 Host PC, even after I restart "C:\Program Files\Multipass\bin\multipassd.exe" /svc --verbosity debug using Start > Run > services.msc which can take fully ~5 minutes!)

This is surprising, what multipass start does is just launch a primary instance first and start it. The error message says that another process is using the ubuntu-22.04-server-cloudimg-amd64.vhdx file. This is a different error from the IP address one. Not sure what Hyper-V is doing. By the way, this error does not occur on my machine.

georgeliao commented 10 months ago

WHEREAS: Changing /etc/hostname inside an instance works when the Host PC is Linux!

This is the same as the behavior on my machines, both Windows and Linux. I think this comes down to how the particular backend handles this.

SIDE QUESTION: Does changing /etc/hostname work when the Host PC is macOS ?

On macOS, the behavior is the same as Linux when the backend is qemu.

holta commented 10 months ago

This is surprising, what multipass start does is just launch a primary instance first and start it.

That's what we all assumed 😄

Reality is obviously different: https://github.com/canonical/multipass/issues/3346#issuecomment-1876042308

I assume launching without --cloud-init omg.yml and rebooting work, right?

Until you modify /etc/hostname in the instance.

After that you can never reboot the instance again.

On macOS, the behavior is the same as Linux when the backend is qemu.

Thanks for clarifying:

In summary it's only Windows Host PCs where Multipass instances ( effectively 😢 ) self-destruct when /etc/hostname is modified.

(A workaround would be priceless if others can suggest something! Given that modifying /etc/hostname e.g. using hostnamectl is routine systems administration on Linux.)

townsend2010 commented 10 months ago

I will add some words about changing /etc/hostname. We use <instance-name>.mshome.net to resolve the address of the instance. Windows' built-in DHCP/DNS for Hyper-V uses ICS, which allows resolving names this way. We chose this approach because the subnet that ICS picks can change between host reboots and there is no way to probe which IP address a virtual machine is on, hence the use of .mshome.net to connect to the instance reliably. However, if /etc/hostname is changed within the instance, the instance registers a new hostname with ICS. Multipass has no way of knowing about that change, so it can no longer connect to the instance.

georgeliao commented 10 months ago

This is surprising, what multipass start does is just launch a primary instance first and start it. That's what we all assumed 😄 Reality is obviously different: https://github.com/canonical/multipass/issues/3346#issuecomment-1876042308

Btw, regarding this particular error, @holta , can you move that error text into another issue? So it will be easier for us to track.

ADVANCED QUESTION: Why are multipass launch 23.10 and multipass launch 24.04 blocked on Windows and macOS, while recent OS's can be used when the Host PC is Linux? Can Multipass please consider fixing this?

You are using multipass 1.12.2 and 1.13.0 based on your title description. I think 1.13.0 fixed that already.

Until you modify /etc/hostname in the instance.

Changing /etc/hostname will have a consequence as Townsend mentioned. By the way, I am missing the thought process here. You might be looking for a workaround, but why did it lead to changing that file?

townsend2010 commented 10 months ago

I'm changing the title to this issue since it's not Multipass that damages the hosts.ics file.

holta commented 10 months ago

This is surprising, what multipass start does is just launch a primary instance first and start it. That's what we all assumed 😄 Reality is obviously different: #3346 (comment)

Btw, regarding this particular error, @holta , can you move that error text into another issue? So it will be easier for us to track.

Done:

ADVANCED QUESTION: Why are multipass launch 23.10 and multipass launch 24.04 blocked on Windows and macOS, while recent OS's can be used when the Host PC is Linux? Can Multipass please consider fixing this?

You are using multipass 1.12.2 and 1.13.0 based on your title description. I think 1.13.0 fixed that already.

That would be great if it starts working in coming weeks, as a result of #3274?

Just FYI it's not yet working with 1.13.0 RC on Windows as documented here: https://github.com/canonical/multipass/issues/3346#issuecomment-1876127606

By the way, I am missing the thought process here. You might be looking for a workaround, but why did it lead to changing that file?

Internet-in-a-Box instances always change /etc/hostname as that's what the schools around the world need.

So thanks to @townsend2010's very helpful clarifications it seems we need to find a way to alert Windows of the instance's new internal name e.g. /etc/hostname whenever it changes (often!)

Even if it's a kludgy hack initially 😆 (in the hope that Multipass on Windows will in future allow everyone to rename instances, ideally to correspond to their "internal name"?) 🙏

townsend2010 commented 10 months ago

it seems we need to find a way to alert Windows of the instance's new internal name e.g. /etc/hostname whenever it changes (often!)

To clarify a bit further, Windows is certainly aware of the hostname change, but Multipass is not, which is what breaks this.

Say, for example, an instance was created with the name foo. foo is then initially set to the hostname and when the instance is started, Multipass tries connecting to foo.mshome.net and when the instance is up and running, everything is good.

Then let's say /etc/hostname is changed to bar. When the instance is next started, the instance will ask the Windows ICS for an IP address with hostname bar. At this point bar.mshome.net would resolve correctly, but since Multipass is completely unaware of any changes done inside instances, it will still try to connect to foo.mshome.net because foo is technically still the name of the instance.
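
The foo/bar scenario above can be sketched as a small diagnostic in Python. It only illustrates the lookup mismatch described here and is not how Multipass itself is implemented; the function names are made up, and the resolver is injectable so the logic can be exercised without a Hyper-V host.

```python
import socket
from typing import Callable, Optional

ICS_DOMAIN = "mshome.net"  # domain suffix used for instance lookups, per this thread


def ics_name(name: str) -> str:
    """Build the name that would be looked up for a given instance or hostname."""
    return f"{name}.{ICS_DOMAIN}"


def try_resolve(fqdn: str, resolver: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the resolved address, or None if the name does not resolve."""
    try:
        return resolver(fqdn)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def diagnose(instance_name: str, guest_hostname: str,
             resolver: Callable[[str], str] = socket.gethostbyname) -> str:
    """Compare the name Multipass connects to with the name the guest registered."""
    by_instance = try_resolve(ics_name(instance_name), resolver)
    if by_instance:
        return f"ok: {ics_name(instance_name)} -> {by_instance}"
    by_hostname = try_resolve(ics_name(guest_hostname), resolver)
    if by_hostname:
        return (f"renamed: guest answers as {ics_name(guest_hostname)} ({by_hostname}), "
                f"but Multipass still looks up {ics_name(instance_name)}")
    return f"unresolved: neither {ics_name(instance_name)} nor {ics_name(guest_hostname)} resolves"


def fake_resolver(fqdn: str) -> str:
    """Stand-in for ICS name resolution after /etc/hostname was changed to bar."""
    table = {"bar.mshome.net": "192.168.179.12"}  # hypothetical address
    if fqdn not in table:
        raise socket.gaierror("No such host is known")
    return table[fqdn]


print(diagnose("foo", "bar", fake_resolver))
```

On a real Windows host, calling diagnose("foo", "bar") without the fake resolver would use the system resolver instead.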

The only answer here is to fix things to allow changing instance names through Multipass, which we definitely want to do and can hopefully fix this year. The work currently being done on the clone feature to rename a cloned instance should also be usable to rename an existing instance. The problem up to this point is that we made a bad decision in the early days of Multipass to key everything off the name of the instance and made many assumptions around that. Had we been smarter, we would have used a UUID. All that said, for changes to /etc/hostname to work with Multipass, one would need to issue the rename command through the client rather than modify the name inside the instance, or else we will still be in the same situation.

Regarding the RC not fixing the image issue, I uploaded a new RC yesterday that should include the fix for that. Could you please confirm you've downloaded and installed the latest 1.13.0 RC? Thanks!

holta commented 10 months ago

answer here is to fix things to allow changing instance names through Multipass, which we definitely want to do and can hopefully fix this year

Fantastic news: please consider offering experimental pre-releases on Windows in 2024 if you can!

Regarding the RC not fixing the image issue, I uploaded a new RC yesterday that should include the fix for that. Could you please confirm you've downloaded and installed the latest 1.13.0 RC? Thanks!

Just FYI it doesn't yet work:

C:\Users\XXXX>multipass launch 24.04
launch failed: '24.04' is not a supported alias. Please use `multipass find` for supported image aliases.

C:\Users\XXXX>multipass version
multipass   1.13.0-rc.1313+g66f773628.win
multipassd  1.13.0-rc.1313+g66f773628.win

C:\Users\XXXX>multipass find
Image                       Aliases           Version          Description
core                        core16            20200818         Ubuntu Core 16
core18                                        20211124         Ubuntu Core 18
core20                                        20230119         Ubuntu Core 20
core22                                        20230717         Ubuntu Core 22
20.04                       focal             20231129         Ubuntu 20.04 LTS
22.04                       jammy,lts         20231211         Ubuntu 22.04 LTS
23.10                       mantic            20231220         Ubuntu 23.10
appliance:adguard-home                        20200812         Ubuntu AdGuard Home Appliance
appliance:mosquitto                           20200812         Ubuntu Mosquitto Appliance
appliance:nextcloud                           20200812         Ubuntu Nextcloud Appliance
appliance:openhab                             20200812         Ubuntu openHAB Home Appliance
appliance:plexmediaserver                     20200812         Ubuntu Plex Media Server Appliance

Blueprint                   Aliases           Version          Description
anbox-cloud-appliance                         latest           Anbox Cloud Appliance
charm-dev                                     latest           A development and testing environment for charmers
docker                                        0.4              A Docker environment with Portainer and related tools
jellyfin                                      latest           Jellyfin is a Free Software Media System that puts you in control of managing and streaming your media.
minikube                                      latest           minikube is local Kubernetes
ros-noetic                                    0.1              A development and testing environment for ROS Noetic.
ros2-humble                                   0.1              A development and testing environment for ROS 2 Humble.

QUESTION: How does one find pre-releases of Multipass for Windows like the one you uploaded yesterday?

Are these pre-releases perhaps accidentally hidden? So far I do not see how to find these pre-releases even ~24h later using:

townsend2010 commented 10 months ago

Oh, right, 24.04 is a development release and is only available on Linux ATM. On Windows and macOS, we've never made the current development release available. This restriction is not a decision of the Multipass team and comes from higher up the decision chain since day 1 of Multipass.

Regarding pre-releases, those will be in https://github.com/canonical/multipass/releases when we make them available. You will see a heading called "Assets" near the bottom of the particular release and if you click to expand that, you will see the packages you can download. The fact that it is "hidden" there is a GitHub thing.

holta commented 10 months ago

Oh, right, 24.04 is a development release and is only available on Linux ATM. On Windows and macOS, we've never made the current development release available. This restriction is not a decision of the Multipass team and comes from higher up the decision chain since day 1 of Multipass.

Ah, very regrettable but ok!

Regarding pre-releases, those will be in https://github.com/canonical/multipass/releases when we make them available. You will see a heading called "Assets" near the bottom of the particular release and if you click to expand that, you will see the packages you can download. The fact that it is "hidden" there is a GitHub thing.

Finding the "Assets" heading was not the issue just FYI: what made this very confusing was that https://github.com/canonical/multipass/releases shows "Dec 7, 2023" prominently. Now I understand that date is a misleading trap:

[Image: screenshot, not preserved]

(For others, maybe consider posting images like rc.1308 and rc.1313 more prominently as RC1, RC2 or some such in future!)

townsend2010 commented 10 months ago

Ah, ok, I see, the date is not updated to reflect recent changes. That's a good idea about versioning the RC's. I will consider that for the next time.

townsend2010 commented 10 months ago

The issue tracking the renaming of instances is #255.

Aside from changing the hostname inside the instance and the fallout from that, are there any other issues in here that are still unresolved?

holta commented 10 months ago

any other issues in here that are still unresolved?

All set and thanks for your help!

Oh, right, 24.04 is a development release and is only available on Linux ATM

PS Thank you for explaining the larger context. Higher ups are deterring some valuable community testing of Ubuntu 24.04. Let us hope they one day reconsider this unhealthy decision.

Whether or not 2024 is officially now the "Year of the Linux Desktop" 😄

townsend2010 commented 10 months ago

Higher ups are deterring some valuable community testing of Ubuntu 24.04. Let us hope they one day reconsider this unhealthy decision.

I will bring this to their attention soon 🙂

Whether or not 2024 is officially now the "Year of the Linux Desktop" 😄

🚀

holta commented 10 months ago

That's a good idea about versioning the RC's. I will consider that for the next time.

@townsend2010

Would a weekly-or-similar CI/CD pipeline creating all 3 Multipass images {Windows, macOS, Linux} potentially be plausible in future years?

(Just an idea, that I fully realize takes a lot of work to get right & nurture along the way, whatever pacing makes most sense! ;-)

townsend2010 commented 10 months ago

Hey @holta,

Would a weekly-or-similar CI/CD pipeline creating all 3 Multipass images {Windows, macOS, Linux} potentially be plausible in future years?

We already do this for Linux. The edge Snap channel follows the tip of main. For macOS and Windows, this is already kind of done, but the problem is how to make the packages more publicly visible. We publish them to an S3 bucket now, but I need to think of a way to annotate something automatically so that end users such as yourself can easily download them. I will have to give this part some more thought.