canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.88k stars 651 forks

Upgraded to 1.13.1 on macOS Sonoma and can no longer get extra IP addresses - lost entire dev environment #3486

Closed BainMcKay closed 6 months ago

BainMcKay commented 7 months ago

I upgraded to Multipass 1.13.1. I had several VM instances. None of the existing VMs can get extra IP addresses, so they won't start. My entire product development environment is locked up.

I tried turning off the firewall. I edited the duplicates out of dhcpd_leases.

I added bootpd and com.apple.Virtualization.VirtualMachine as firewall exceptions so they aren't blocked.
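
For reference, duplicate entries in the lease file can be spotted before editing it by hand. The following is only a sketch: the default path /var/db/dhcpd_leases and the "name=" key are assumptions about how the macOS lease file is laid out, and find_duplicate_leases is a hypothetical helper name.

```shell
# List values that appear more than once for a given key in a DHCP lease
# file. The default path and the "name=" key are assumptions about the
# macOS lease file layout; adjust them if your file differs.
find_duplicate_leases() {
    leases_file="${1:-/var/db/dhcpd_leases}"
    key="${2:-name}"
    # Keep only "key=value" lines, strip leading whitespace, and print
    # each value that occurs more than once.
    grep -E "^[[:space:]]*${key}=" "$leases_file" \
        | sed -E 's/^[[:space:]]*//' \
        | sort \
        | uniq -d
}
```

Running find_duplicate_leases with no arguments checks instance names; passing a second argument such as hw_address checks MAC addresses instead. Reading the real file may need sudo.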

Here is ../multipassd.log

[2024-04-14T00:24:26.738] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2024-04-14T00:24:26.738] [info] [update] A New Multipass release is available: 1.13.1
[2024-04-14T00:24:26.738] [warning] [RAG] qemu-system-aarch64: -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda: qcow2: Image is corrupt; cannot be opened read/write
[2024-04-14T00:24:26.738] [warning] [qemu-system-aarch64]
[2024-04-14T00:24:26.738] [info] [RAG] process state changed to NotRunning
[2024-04-14T00:24:26.738] [info] [RAG] process finished with exit code 1
[2024-04-14T00:25:05.652] [debug] [base_vm] Error getting extra IP addresses: ssh connection failed: 'Timeout connecting to 192.168.64.55'

I have nothing else to try. I don't think the image is corrupt; I see the same issue with the backup image.

Can anyone provide some guidance?

andrei-toterman commented 7 months ago

Hi, @BainMcKay! I'm sorry that you're having all of this trouble with Multipass. The logs do say that the image is corrupt, so maybe it's worth trying to repair it. Try running the following command:

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/RAG/ubuntu-22.04-server-cloudimg-arm64.img
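
Since -r all rewrites the image in place, it may be worth copying the image aside and running a report-only check first. This is a minimal sketch, not part of Multipass: backup_and_check is a hypothetical helper, and QEMU_IMG defaulting to a qemu-img on PATH is an assumption (the bundled binary from the command above can be passed instead).

```shell
# Path to qemu-img. Defaulting to the one on PATH is an assumption; the
# Multipass-bundled binary can be supplied via the QEMU_IMG variable.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# Copy a qcow2 image aside, then run a report-only consistency check.
backup_and_check() {
    image="$1"
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    backup="${image}.bak"
    cp -p "$image" "$backup" || return 1
    echo "backup written to $backup"
    # Without -r, qemu-img check only reports problems and changes nothing.
    "$QEMU_IMG" check "$image"
}
```

Images under /var/root are only readable by root, so in practice this would be run from a root shell. If the report-only check confirms problems, the repair command above can then be run with the backup in place.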
BainMcKay commented 7 months ago

Thank you for this. I ran the corruption cleanup, and here is the tail end of the console log.

THE COMMAND

bainmckay@BainsMacStudio / % sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/RAG/ubuntu-22.04-server-cloudimg-arm64.img

...

TAIL OF THE CONSOLE LOG

The following inconsistencies were found and repaired:

78771 leaked clusters
54533 corruptions

Double checking the fixed image now...
No errors were found on the image.
777313/1310720 = 59.30% allocated, 4.87% fragmented, 0.34% compressed clusters
Image end offset: 52967178240

=====

Before this, any instance that I created with the new release worked, but instances I had created with the previous version did not. They could not get IP addresses for their extra network interfaces. The instances default to 192.XXX.XX.XX, but my local DNS is 10.0.1.XXX, so 10.0.1.XXX (en0) is the gateway IP.

    "extra_interfaces": [
        {
            "auto_mode": true,
            "id": "en0",
            "mac_address": "52:54:00:18:a7:0f"
        },
        {
            "auto_mode": true,
            "id": "en1",
            "mac_address": "52:54:00:54:2f:2a"
        }
    ],

I had created an instance called [test] with the new version and, in trying to get it to work, I added a bridged network. That caused [test] to suffer the same inability to get an IP for the extra network interface. However, after I ran your de-corruption command (above) and removed the extra interface from the JSON above, [test] once again became available, without the extra IP address. I had removed it from the extra interfaces JSON config because it was getting stuck on that IP. But that did not work for the instances created with the prior Multipass version, which hold my dev code. I tried everything, including turning off automatic firewall exceptions and forcing them with commands:

MANUAL REGISTER AND UNBLOCK EXCEPTIONS TO MAC FIREWALL

bootpd
/usr/libexec/ApplicationFirewall/socketfilterfw --add /usr/libexec/bootpd
/usr/libexec/ApplicationFirewall/socketfilterfw --unblock /usr/libexec/bootpd

launchd
cd /
sudo find . -name "launchd"
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add ./sbin/launchd
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock ./sbin/launchd

multipassd
sudo find . -name "multipassd"
/usr/libexec/ApplicationFirewall/socketfilterfw --add ./System/Volumes/Data/private/var/root/Library/Application Support/multipassd
/usr/libexec/ApplicationFirewall/socketfilterfw --unblock ./System/Volumes/Data/private/var/root/Library/Application Support/multipassd

That did not seem to have an effect.

In the instance that matters [RAG-WEB4], where the code I am missing is in development, it complains about SSH not being up. The support docs suggest this is a network compatibility issue. I'm not sure why, or what that means.

The other event that happened, in addition to upgrading to the latest Multipass version, is that the host machine [M1 Apple Studio Ultra] had a firmware update. That may have had some effect as well. I have copies of all VMs in a backup site (ISO, IMG), so I can copy them back in to resolve any corruptions that happen.

So that's where I am. It appears that anything from the prior release has a network compatibility issue, now that the corruptions appear to be fixed.

Finally, here is the multipass log from me trying to start the critical instance [RAG-WEB4]. Maybe you can see something in the log that will help me recover the instance and get it working again.

=========================== MULTIPASSD LOG (TAIL)
[2024-04-16T08:55:09.860] [debug] [RAG-WEB4] process working dir ''
[2024-04-16T08:55:09.860] [info] [RAG-WEB4] process program 'qemu-system-aarch64'
[2024-04-16T08:55:09.860] [info] [RAG-WEB4] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:e3:11:70, -nic, vmnet-bridged,ifname=en0,model=virtio-net-pci,mac=52:54:00:18:a7:0f, -nic, vmnet-bridged,ifname=en1,model=virtio-net-pci,mac=52:54:00:54:2f:2a, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG-WEB4/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 4, -m, 8192M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG-WEB4/cloud-init-config.iso'
[2024-04-16T08:55:09.867] [debug] [qemu-system-aarch64] [1600] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.cTWZRg
[2024-04-16T08:55:09.919] [info] [RAG-WEB4] process state changed to Starting
[2024-04-16T08:55:09.920] [info] [RAG-WEB4] process state changed to Running
[2024-04-16T08:55:09.920] [debug] [qemu-system-aarch64] [1601] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:e3:11:70 -nic vmnet-bridged,ifname=en0,model=virtio-net-pci,mac=52:54:00:18:a7:0f -nic vmnet-bridged,ifname=en1,model=virtio-net-pci,mac=52:54:00:54:2f:2a -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG-WEB4/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 4 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG-WEB4/cloud-init-config.iso
[2024-04-16T08:55:09.921] [info] [RAG-WEB4] process started
[2024-04-16T08:55:09.922] [debug] [RAG-WEB4] Waiting for SSH to be up
[2024-04-16T08:55:10.528] [debug] [RAG-WEB4] QMP: {"QMP": {"version": {"qemu": {"micro": 1, "minor": 2, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2024-04-16T08:55:10.549] [debug] [RAG-WEB4] QMP: {"return": {}}

[2024-04-16T08:55:55.720] [debug] [RAG] Resetting the network
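
When sharing daemon logs like the ones above, it can help to narrow them to one instance plus any warnings. A rough sketch, assuming the log has one entry per line in the "[timestamp] [level] [tag] message" layout shown here; filter_instance_log is a hypothetical helper, and the log location on your platform is described in the Multipass "accessing logs" docs.

```shell
# Print warning/error entries, plus every entry tagged with the given
# instance name, from a multipassd log with one entry per line.
filter_instance_log() {
    log_file="$1"
    instance="$2"
    grep -E "\[(warning|error)\]|\[${instance}\]" "$log_file"
}
```

For example, filter_instance_log multipassd.log RAG-WEB4 would keep the "Waiting for SSH to be up" entry and any qemu warnings, while dropping debug entries that belong to other instances.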

andrei-toterman commented 6 months ago

Hey, @BainMcKay! One thing you could try is to start the VM manually using the qemu CLI. Try this:

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:e3:11:70 -nic vmnet-bridged,ifname=en0,model=virtio-net-pci,mac=52:54:00:18:a7:0f -nic vmnet-bridged,ifname=en1,model=virtio-net-pci,mac=52:54:00:54:2f:2a -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG-WEB4/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 4 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/RAG-WEB4/cloud-init-config.iso

This should bring up a window in which you can see the VM boot process. Does it get stuck? Does it arrive at a login screen?

One possible source of these corruptions could be macOS itself. When you shut down your machine, macOS does not wait for Multipass to properly suspend the VMs; it just kills them immediately.

In the future, you could use the new Snapshots feature in Multipass to help mitigate the risk of losing access to the VM.
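
As a rough illustration of both points, instances can be stopped explicitly before the host shuts down, and a snapshot taken while the instance is stopped. This is a sketch only: stop_and_snapshot is a hypothetical helper, the snapshot name is whatever label you choose, and it assumes a Multipass version that has the snapshot command.

```shell
# Multipass client to call; overridable so the commands can be previewed.
MULTIPASS="${MULTIPASS:-multipass}"

# Stop an instance cleanly, then snapshot it. Snapshots are taken of
# stopped instances, hence the explicit stop first.
stop_and_snapshot() {
    instance="$1"
    snapshot_name="$2"   # hypothetical label chosen by the caller
    "$MULTIPASS" stop "$instance" || return 1
    "$MULTIPASS" snapshot --name "$snapshot_name" "$instance"
}
```

For example, stop_and_snapshot RAG-WEB4 before-upgrade would leave a restore point that multipass restore RAG-WEB4.before-upgrade should be able to roll back to. Setting MULTIPASS=echo prints the commands instead of running them.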

BainMcKay commented 6 months ago

Thank you Andrei

Could not open '/Library/Application': No such file or directory

And it doesn’t.

I need to step away from my desk for 1 hour, I’ll look at the command more in-depth when I return, to map it to my Sonoma implementation, should there be a difference.

Please feel free to make additional suggestions.

Bain

andrei-toterman commented 6 months ago

Make sure you have escaped the space properly. The directory name is Application Support, so when trying to run a command from it, you need to either escape the space: Application\ Support; or quote the path: "Application Support".
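
To see why the unescaped form fails, here is a small demonstration of how the shell splits each spelling of the path into arguments. count_args is a throwaway helper for illustration only.

```shell
# Print how many arguments the shell passed to this function.
count_args() { echo "$#"; }

# Unescaped: the space splits the path into two separate arguments.
unquoted=$(count_args /Library/Application Support/com.canonical.multipass/bin)
# Backslash-escaped space: one argument.
escaped=$(count_args /Library/Application\ Support/com.canonical.multipass/bin)
# Quoted path: one argument.
quoted=$(count_args "/Library/Application Support/com.canonical.multipass/bin")

echo "unquoted: $unquoted argument(s)"
echo "escaped:  $escaped argument(s)"
echo "quoted:   $quoted argument(s)"
```

In the qemu command above, the same treatment applies to every path containing Application Support, that is, both -drive file= values and the -cdrom path, not only the path to the binary.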

BainMcKay commented 6 months ago

yes - there were 3 escapes required. Tx

On submitting and logging in, I get the following screenshot

Pasted Graphic
andrei-toterman commented 6 months ago

Hey, @BainMcKay! That window should have a 'View' menu from which you should be able to select VGA instead of parallel0. Could you please try again and make sure VGA is selected?

BainMcKay commented 6 months ago

Hi Andrei,

I just lost the gateway IP on my main Multipass dev instance again.

I am able to copy the code out because I am still logged in. But my VSCode edit connection is gone, so I have no access to the code from there, and no 10.X IP to log in again. The IP is not available inside the VM (ip a) or outside with multipass list.

So there is still something going on with the gateway IP disappearing and not coming back.

Hopefully we can figure it out.

Bain

On Apr 22, 2024, at 7:52 AM, Bain McKay @.***> wrote:

Hi Andrei

Only Parallel is available (see picture below)

BainMcKay commented 6 months ago

I rebooted and got the gateway IPs back for all VMs created in the current version of Multipass. I still can't open VMs created with the previous version of Multipass. I now know when it's about to drop the gateway IP: VSCode keeps reconnecting. So there is a progressive instability taking place with the gateway IP. I look forward to further guidance.

BainMcKay commented 6 months ago

Andrei,

Is there a way we can continue the above debug thread, where I did not get the VGA menu, but only the parallel0 menu? My agent code remains locked up on that VM. I need to find a way to recover it.

Bain

andrei-toterman commented 6 months ago

Unfortunately I can't understand why you keep losing access to the VM. Do you absolutely need to be able to shell into that VM or do you only need access to the files inside it? If it's the latter, it should be possible to open the image file and browse its contents and retrieve the files that you want without actually running that VM. I can provide you with instructions on how to do it. Would that suffice for your case?

BainMcKay commented 6 months ago

Hi Andrei,

To clarify, I have not regained access to any of my Multipass VMs from the previous Multipass version. So I don't keep losing access to them; I never got it back, other than the black screen with the parallel0 menu item that I opened with your long command line technique.

That said, I did lose access to the VMs in this version. I rebooted and got them back.

But yes, I just need to get inside the image. I don't need to run it.

Finally, given my total commitment to Multipass, I'd love to know as much as I can, so I can recover if there is a next time. Is there a reason why the VGA menu item was missing in the UI I was able to open for the previous-version VM?

Much appreciate all your help.

Bain

andrei-toterman commented 6 months ago

Ok, so in order to mount the qcow2 image and retrieve the files inside it, you need access to a Linux machine. This could also just be another functional Multipass instance. It just needs to have enough disk space so that you can copy the ubuntu-22.04-server-cloudimg-arm64.img inside it.

Now, copy that image file (IIUC this was for the RAG-WEB4 instance) to your Linux machine. In case you're using another Multipass instance, you can use multipass transfer ubuntu-22.04-server-cloudimg-arm64.img your-instance:.

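Then get on a terminal on that Linux machine and do the following:

  1. install some qemu tools: sudo apt install qemu-utils
  2. enable NBD support: sudo modprobe nbd max_part=16
  3. connect the image you copied to an nbd device: sudo qemu-nbd --connect=/dev/nbd0 ubuntu-22.04-server-cloudimg-arm64.img
  4. make a directory and mount the partition from that NBD device to that directory: mkdir instance-files && sudo mount /dev/nbd0p1 instance-files

Now inside that directory you should be able to access all the files that were on the broken VM.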
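
The NBD mount procedure for a copied qcow2 image can also be wrapped in a pair of small functions, with a cleanup step for when you are done. This is a sketch under assumptions: the helper names are hypothetical, /dev/nbd0 is assumed to be free, the first partition (p1) is assumed to hold the root filesystem, and qemu-utils is assumed to be installed already.

```shell
# Run a command, or only print it when DRY_RUN=1 (handy for reviewing
# what would be executed before touching any devices).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Attach a qcow2 image over NBD and mount its first partition.
mount_image() {
    image="$1"
    mount_dir="${2:-instance-files}"
    nbd_dev="${3:-/dev/nbd0}"      # assumes this NBD device is not in use
    run sudo modprobe nbd max_part=16 || return 1
    run sudo qemu-nbd --connect="$nbd_dev" "$image" || return 1
    run mkdir -p "$mount_dir" || return 1
    # p1 is assumed to be the root filesystem partition of the image.
    run sudo mount "${nbd_dev}p1" "$mount_dir"
}

# Undo mount_image once the files have been copied out.
unmount_image() {
    mount_dir="${1:-instance-files}"
    nbd_dev="${2:-/dev/nbd0}"
    run sudo umount "$mount_dir"
    run sudo qemu-nbd --disconnect "$nbd_dev"
}
```

Calling DRY_RUN=1 mount_image ubuntu-22.04-server-cloudimg-arm64.img first prints the commands without running them. Working on a copy of the image, as described above, keeps the original untouched if anything goes wrong.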

BainMcKay commented 6 months ago

Very cool. Much appreciated Andrei

I used scp to copy the image file into a new multipass instance I created for the purpose.

I installed qemu-utils and ran modprobe

I created the instance folder, and mounted /dev/nbd0p1 to instance-files

cd ../instance-files, and there were all my agent projects.

Beautiful.

Much appreciate all your time, effort and support on this.

I am fully committed to Multipass. A big fan, from my usage experience to date. I do AI product dev, and IAAS2 autogeneration of instances (no technical debt). It's been rocky to this point, but impossible to go back to clunky VMware et al. I was always able to get through it, knowing it was just a matter of time before Multipass matured and had the resilience I needed. I drank the Kool-Aid. Happy the knife edge has been significantly blunted with this procedure. Ubuntu Cloud on my local dev Mac network is awesome. I hate all the infrastructure you need to drag around for VMware and other VM tools, even Docker. I love the bare-metal Multipass light footprint, the speed, the agility, the power and the performance. And with snapshots on the latest version, hopefully the resilience.

Keep it going. Would love to be able to transfer images from M[1,2,3] to Intel Macs, and to transfer them to DigitalOcean, Amazon AWS, Azure and Google Cloud, as hybrid cloud. With IAAS2 automation, all the technical debt of building and tearing down VMs in hybrid cloud goes away, with the ability to track 100s of VMs using custom Grafana dashboards.

Bain

On Apr 24, 2024, at 8:24 AM, Andrei Toterman @.***> wrote:

Ok, so in order to mount the qcow2 image and retrieve the files inside it, you need access to a Linux machine. This could also just be another functional Multipass instance. It just needs to have enough disk space so that you can copy the ubuntu-22.04-server-cloudimg-arm64.img inside it.

Now, copy that image file (IIUC this was for the RAG-WEB4 instance) to your Linux machine. In case you're using another Multipass instance, you can use multipass transfer ubuntu-22.04-server-cloudimg-arm64.img your-instance:.

Then get on a terminal on that Linux machine and do the following:

  1. install some qemu tools: sudo apt install qemu-utils
  2. enable NBD support: sudo modprobe nbd max_part=16
  3. connect the image you copied to an nbd device: sudo qemu-nbd --connect=/dev/nbd0 ubuntu-22.04-server-cloudimg-arm64.img
  4. make a directory and mount the partition from that NBD device to that directory: mkdir instance-files && sudo mount /dev/nbd0p1 instance-files

Now inside that directory you should be able to access all the files that were on the broken VM.

andrei-toterman commented 6 months ago

I'm glad that it worked! Again, I'm sorry for all the trouble and for not being able to figure out what the problem was with your original instance. Happy to hear that Multipass is such a useful tool for you! Let me add that in the future, if it fits your workflow, you can also use multipass mount to make a local directory available in your instance, so that way you'll never lose access to the files in case an instance breaks. Don't hesitate to ask for help again!
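
A minimal sketch of that setup, assuming the project lives in a directory on the host; share_project_dir is a hypothetical helper and the default target path inside the instance is just an example.

```shell
# Multipass client to call; overridable so the command can be previewed.
MULTIPASS="${MULTIPASS:-multipass}"

# Share a host directory with an instance so the working copy stays on
# the host. The default target path is a hypothetical choice.
share_project_dir() {
    host_dir="$1"
    instance="$2"
    target="${3:-/home/ubuntu/projects}"
    "$MULTIPASS" mount "$host_dir" "${instance}:${target}"
}
```

For example, share_project_dir "$HOME/projects" RAG-WEB4 would keep the code on the Mac while making it visible inside the instance, so a broken instance no longer takes the files with it.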

BainMcKay commented 6 months ago

Thank you Andrei,

Yes - I've recorded the procedure in a support ticket, so I can use it for VM recovery should it come to that again.

Enjoy your day...

Bain
