canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

ssh connection failed: 'Connection refused' - NIC_RX_FILTER_CHANGED Cannot access instance #3071

Open JackieTreeh0rn opened 1 year ago

JackieTreeh0rn commented 1 year ago

Describe the bug

Cannot connect to the instance. It was working fine until today. Now the instance takes forever to start, and Multipass doesn't seem to register the state of the VM; the menu always shows it as running. The issue is networking: the instance only shows the internal interface, 192.168.x. It really sucks that networking is so fragile with Multipass, as I truly love the small footprint and flexibility.

MacOS Firewall OFF

Have tried resetting local.driver to qemu, no difference.

To Reproduce How, and what happened?

  1. I install multipass with the following network flag, --network name=en0,mac="52:54:00:32:c3:b4", to associate it with my wireless NIC. The MAC address being passed is for the virtual NIC in the VM instance, in this case enp0s2. I am forcing this MAC address on the instance because I have a static DHCP reservation configured for it.
  2. I am using the Docker | Portainer blueprint
  3. I also install avahi-daemon avahi-discover avahi-utils libnss-mdns mdns-scan for .local resolution.

Expected behavior

This worked fine for weeks; now it doesn't.

Logs

Screenshot 2023-05-12 at 2 48 38 AM

[debug] [GALAXY-DOCK] QMP: {"timestamp": {"seconds": 1683851504, "microseconds": 962322}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[13]/virtio-backend"}}

[2023-05-11T20:36:24.608] [debug] [GALAXY-DOCK] Resetting the network

[2023-05-11T20:36:24.610] [debug] [GALAXY-DOCK] QMP: {"return": {}}

[2023-05-11T20:36:24.621] [debug] [GALAXY-DOCK] QMP: {"return": {}}

[2023-05-11T22:44:25.178] [debug] [base_vm] Error getting extra IP addresses: ssh connection failed: 'Connection refused'

[2023-05-11T22:50:42.393] [debug] [base_vm] Error getting extra IP addresses: ssh connection failed: 'Connection refused'

Additional info

Additional context

I recently connected to a new LG monitor and dock that has an Ethernet port. I haven't yet tested without the dock, as I'd like to understand the issue first. Given how fragile networking seems from one use case to another, I would like to make my config as portable as possible, without having to wipe and redeploy so often.

andrei-toterman commented 1 year ago

Hey, @fsck66! The fact that you get a Connection refused should mean that the instance is starting, but something prevents sshd from accepting connections. Did you install any additional software right before this issue occurred, or make any other configuration changes that might have affected it? If not, please provide the output of running the following command:

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application\ Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:2f:7b:9f -nic vmnet-bridged,ifname=en0,model=virtio-net-pci,mac=52:54:00:2f:d1:14 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application\ Support/multipassd/qemu/vault/instances/GALAXY-DOCK/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 8 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -cdrom /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/GALAXY-DOCK/cloud-init-config.iso -loadvm suspend -machine virt-7.1

which should open up a window with a console into the instance.

JackieTreeh0rn commented 1 year ago

Unfortunately, I have by now redeployed the instance and Multipass on the problem system. What will this qemu command do? Does it launch a session into the VM no matter what? Does it change anything?

Looking back, I believe this occurred after upgrading UTM, which I rarely use but also have on my ARM Apple silicon system to emulate x86 architecture for a VM that doesn't have an ARM version. I know that also uses QEMU, but I already had it loaded at the time and used it in conjunction with Multipass. Would that have anything to do with it, or do these solutions, including Multipass, maintain their own QEMU? Thanks.

townsend2010 commented 1 year ago

Hi @fsck66!

As @andrei-toterman mentioned, we would like to see the output of the console that pops up when running the qemu command. This is to help us diagnose whether the instance is fully booting or is perhaps getting stuck loading a system service. Looking at the command, though, you need to add /Library/Application\ Support/com.canonical.multipass/bin before the qemu command. Running it will not change anything.

In essence, what the Connection refused tells us is that the instance has an IP address, but Multipass's attempt to connect to the instance via ssh is being refused by the instance. That leads to the question: why is it refusing access? A few things can cause that: the ssh config in the instance has been modified and now refuses access, sshd is trying to start and is stuck, or some software in the instance, such as a firewall, is blocking access to port 22. We're hoping the console output will rule some of these out.
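The distinction drawn above (the instance answers but refuses the connection, versus not answering at all) can be checked from the host with a plain TCP probe. The following is a minimal sketch, not part of Multipass; the function name `probe_ssh` and the example address in the comment are hypothetical, and the probe only tests the TCP handshake on port 22, not ssh authentication.

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns "open" if the handshake succeeds, "refused" if the peer
    actively rejects it (host is up, nothing accepts on that port),
    "timeout" if no answer arrives in time, or "error: ..." otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example (hypothetical instance address; take the real one from `multipass list`):
#   print(probe_ssh("192.168.64.2"))
```

A "refused" result matches the symptom in this issue: the instance is reachable on the network, but nothing is accepting connections on port 22. A "timeout" would instead point at routing, bridging, or packet filtering between host and instance.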

Given these symptoms, it's highly unlikely that UTM's qemu is causing issues with Multipass; the timing seems to be just a coincidence.

JackieTreeh0rn commented 1 year ago

Hi,

I am not sure I can help track my problem anymore, since I've redeployed and reconfigured my instance via netplan, so my config is different now. I did run the command anyway, just for the sake of it (I changed the MAC addresses in the command to reflect the new ones; everything else is the same), and it returns:

{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 7}, "package": ""}, "capabilities": ["oob"]}}
qemu-system-aarch64: Snapshot 'suspend' does not exist in one or more devices

upngo commented 1 year ago

I ran into a similar issue when upgrading to 1.12.0. For a while it was stuck at Waiting for SSH to be up. After reading about all the firewall issues, I tried adding everything in the Multipass bin directory to the firewall's 'accept incoming' list, stopping sshd with sudo launchctl stop com.openssh.sshd, and restarting. This got me to a similar state as here: the start and launch commands both end up timing out, and the last thing in the logs is:

[2023-07-03T08:24:10.952] [debug] [new-sunbird] QMP: {"timestamp": {"seconds": 1688329450, "microseconds": 952046}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}

I uninstalled again and reverted to 1.11.1, and things are working, so perhaps downgrading would work for someone in the same position.

IknowJoseph commented 9 months ago

I'm suffering the same issue on my M1 Mac running OS 14.2.1 and Multipass 1.12.0.

I am not aware of any changes to host or guests, but one day the guest stopped booting. I ended up removing Multipass and reinstalling, then creating a new VM, but I get the same error. VMs are always in an unknown state, and the last log entry reads NIC_RX_FILTER_CHANGED.

I have uninstalled again and installed the 1.13 RC but get the same. Interestingly, the issue was originally reported on 1.11 and the poster above suggests the same happens on 1.12, but was fixed by reverting to a previous version. I have not gone back, but the fact that OP suffered on 1.11 suggests this may not be a reliable fix.

How does the issue survive an uninstall / reinstall? I'm purging config when prompted by the uninstall script.

Log entry of 1.13 attempt:

`[2024-01-03T15:43:42.847] [debug] [update] Latest Multipass release available is version 1.12.2
[2024-01-03T15:43:43.059] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2024-01-03T15:43:43.061] [debug] [blueprint provider] Loading "charm-dev" v1
[2024-01-03T15:43:43.062] [debug] [blueprint provider] Loading "docker" v1
[2024-01-03T15:43:43.062] [debug] [blueprint provider] Loading "jellyfin" v1
[2024-01-03T15:43:43.063] [debug] [blueprint provider] Loading "minikube" v1
[2024-01-03T15:43:43.064] [debug] [blueprint provider] Loading "ros-noetic" v1
[2024-01-03T15:43:43.064] [debug] [blueprint provider] Loading "ros2-humble" v1
[2024-01-03T15:43:43.073] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket
[2024-01-03T15:43:43.073] [debug] [async task] fetch manifest periodically
[2024-01-03T15:43:43.075] [info] [daemon] Starting Multipass 1.13.0-rc.1308+g240e6cae1.mac
[2024-01-03T15:43:43.075] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2024-01-03T15:43:44.049] [info] [VMImageHost] Did not find any supported products in "appliance"
[2024-01-03T15:44:22.788] [debug] [qemu-system-aarch64] [11729] started: qemu-system-aarch64 --version
[2024-01-03T15:47:08.374] [debug] [image vault] Verifying hash "f885a8e8f62ab2c39ab0442ea182b69d49ccd990d24791acb4f1724573d8120f"
[2024-01-03T15:47:13.547] [debug] [qemu-img] [11754] started: qemu-img info --output=json /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20231211/ubuntu-22.04-server-cloudimg-arm64.img
[2024-01-03T15:47:13.835] [debug] [qemu-img] [11755] started: qemu-img amend -o compat=1.1 /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20231211/ubuntu-22.04-server-cloudimg-arm64.img
[2024-01-03T15:47:13.855] [debug] [qemu-img] [11756] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20231211/ubuntu-22.04-server-cloudimg-arm64.img
[2024-01-03T15:47:13.868] [debug] [qemu-img] [11757] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2024-01-03T15:47:13.882] [debug] [qemu-img] [11758] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/ubuntu-22.04-server-cloudimg-arm64.img
[2024-01-03T15:47:13.890] [debug] [qemu-img] [11759] started: qemu-img amend -o compat=1.1 /var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/ubuntu-22.04-server-cloudimg-arm64.img
[2024-01-03T15:47:13.896] [debug] [trim-katydid] process working dir ''
[2024-01-03T15:47:13.896] [info] [trim-katydid] process program 'qemu-system-aarch64'
[2024-01-03T15:47:13.896] [info] [trim-katydid] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:90:87:9c, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/cloud-init-config.iso'
[2024-01-03T15:47:13.898] [debug] [qemu-system-aarch64] [11760] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.IBhJap
[2024-01-03T15:47:13.934] [info] [trim-katydid] process state changed to Starting
[2024-01-03T15:47:13.936] [info] [trim-katydid] process state changed to Running
[2024-01-03T15:47:13.936] [debug] [qemu-system-aarch64] [11761] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:90:87:9c -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/trim-katydid/cloud-init-config.iso
[2024-01-03T15:47:13.936] [info] [trim-katydid] process started
[2024-01-03T15:47:13.936] [debug] [trim-katydid] Waiting for SSH to be up
[2024-01-03T15:47:14.260] [debug] [trim-katydid] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}
[2024-01-03T15:47:14.306] [debug] [trim-katydid] QMP: {"return": {}}
[2024-01-03T15:47:27.426] [debug] [trim-katydid] QMP: {"timestamp": {"seconds": 1704296847, "microseconds": 426481}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}`

IknowJoseph commented 9 months ago

I've got my install fixed now, although I had to delete & purge any installed VM. After reading:

https://github.com/canonical/multipass/issues/2387
https://github.com/canonical/multipass/issues/2853
https://github.com/canonical/multipass/issues/3003

I saw I had the same issue with an invalid /var/db/dhcpd_leases file - the file had multiple entries with the same name. I had to stop, delete & purge any running VM, delete the dhcpd_leases file and reboot. Everything worked fine on the freshly started host.

I guess this explains why the issue persisted across multiple Multipass versions and installs: it was an issue with macOS. I think I had a Multipass crash that left the system in an unstable state.

Perhaps multipass should check that the dhcpd_leases list is sane? This should be straightforward to test by editing the file to include duplicate entries.
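A minimal sketch of what such a sanity check could look like, for anyone who wants to inspect the file by hand before deleting it. This is not something Multipass does. The function name `duplicate_lease_names` is hypothetical, and the sketch assumes the lease file is a series of `{ ... }` blocks that each contain a `name=` line; verify that against your own /var/db/dhcpd_leases before relying on it.

```python
import re
from collections import Counter


def duplicate_lease_names(text: str) -> list[str]:
    """Return lease names that occur in more than one { ... } block.

    Assumes (not confirmed by this thread) that each lease is a brace
    delimited block with one `name=<hostname>` line inside it.
    """
    names = []
    for block in re.findall(r"\{(.*?)\}", text, flags=re.DOTALL):
        match = re.search(r"^\s*name=(.+?)\s*$", block, flags=re.MULTILINE)
        if match:
            names.append(match.group(1))
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Example usage (reading the file may need elevated permissions):
#   with open("/var/db/dhcpd_leases") as fh:
#       print(duplicate_lease_names(fh.read()))
```

The sketch only reports duplicates and never modifies the file, so any cleanup stays a manual decision.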

townsend2010 commented 9 months ago

Hi @IknowJoseph,

I'm glad you got it working!

Perhaps multipass should check that the dhcpd_leases list is sane?

Multipass doesn't manage this file, so I would be very reluctant to edit the file, but perhaps we could report that there are duplicate entries and let the user try to fix it.

dimaqq commented 5 months ago

Thank you @IknowJoseph, I was stuck too and the total of {sudo kill multipass*, whack dhcp leases, reboot} did the trick. If I get stuck again, I may attempt differential diagnosis.