canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

Multipass instances on M1 Mac cannot access the internet #2680

Open shefmarkh opened 2 years ago

shefmarkh commented 2 years ago

Hello,

I launched a Multipass instance with:

multipass launch -c 6 -m 10G -d 50G --name markTest

Then inside it, e.g. a git clone or an apt update fails:

git clone https://github.com/cvmfs/cvmfs.git
Cloning into 'cvmfs'...
fatal: unable to access 'https://github.com/cvmfs/cvmfs.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.

OR

sudo apt update
Err:1 http://ports.ubuntu.com/ubuntu-ports focal InRelease
  Connection failed [IP: 198.18.2.6 80]
Err:2 http://ports.ubuntu.com/ubuntu-ports focal-updates InRelease
  Connection failed [IP: 198.18.2.6 80]
Err:3 http://ports.ubuntu.com/ubuntu-ports focal-backports InRelease
  Connection failed [IP: 198.18.2.6 80]
Err:4 http://ports.ubuntu.com/ubuntu-ports focal-security InRelease
  Connection failed [IP: 198.18.2.6 80]
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal-updates/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal-backports/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal-security/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Some index files failed to download. They have been ignored, or old ones used instead.

I worked through:

https://multipass.run/docs/troubleshooting-networking-on-macos#heading--dns-problems

and seem to get all the expected output, so I am at a loss as to how to diagnose the issue further. Do you have any suggestions?

Here is the output I get from the suggested diagnostic tests inside the instance (accessed via "multipass shell markTest"):

ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=19.5 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=12.1 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=58 time=12.4 ms

--- 1.1.1.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9447ms
rtt min/avg/max/mdev = 12.128/15.356/19.502/3.169 ms

dig google.ie

; <<>> DiG 9.16.1-Ubuntu <<>> google.ie
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20802
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.ie.                    IN      A

;; ANSWER SECTION:
google.ie.             3600    IN      A       198.18.2.8

;; Query time: 12 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Jul 24 21:48:56 BST 2022
;; MSG SIZE  rcvd: 54

dig @1.1.1.1 google.ie

; <<>> DiG 9.16.1-Ubuntu <<>> @1.1.1.1 google.ie
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53071
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;google.ie.                    IN      A

;; ANSWER SECTION:
google.ie.             3600    IN      A       198.18.2.8

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Sun Jul 24 21:49:14 BST 2022
;; MSG SIZE  rcvd: 52

Then locally on my Mac, whilst the instance was running, I checked in another terminal:

sudo lsof -iTCP:53 -iUDP:53 -n -P
Password:

COMMAND   PID           USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
mDNSRespo 453 _mdnsresponder   57u  IPv4 0x893411bddad0c2c3      0t0  UDP *:53
mDNSRespo 453 _mdnsresponder   58u  IPv6 0x893411bddad0c5d3      0t0  UDP *:53
mDNSRespo 453 _mdnsresponder   60u  IPv4 0x893411cc3e3df8b3      0t0  TCP *:53 (LISTEN)
mDNSRespo 453 _mdnsresponder   61u  IPv6 0x893411cc3e43a203      0t0  TCP *:53 (LISTEN)

Inside the instance two of the suggested files to check/modify have these contents:

more /etc/resolv.conf

nameserver 127.0.0.53
nameserver 1.1.1.1
options edns0 trust-ad
search broadband

and

more /etc/systemd/resolved.conf

[Resolve]
DNS=127.0.0.53
FallbackDNS=8.8.8.8
Domains=
LLMNR=no
MulticastDNS=no
DNSSEC=no
DNSOverTLS=no
Cache=no-negative
DNSStubListener=yes
ReadEtcHosts=yes
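
One more check that might help confirm which upstream resolver systemd-resolved is actually forwarding to inside the instance (commands assumed for Ubuntu 20.04; this is not part of the linked troubleshooting guide):

# Inside the instance: show the upstream DNS servers systemd-resolved is using
systemd-resolve --status
# On newer Ubuntu releases the same information comes from:
resolvectl status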

Thanks,

Mark

shefmarkh commented 2 years ago

Update:

I failed to spot that the DNS resolution is going wrong. Every website tried resolves to an IP starting 198.18.2.* rather than the correct IP that the MacBook itself sees. So it's something to do with the DNS resolution inside the VM.

Mark

townsend2010 commented 2 years ago

Hello @shefmarkh,

Sorry you are having this issue. Unless you have overridden DNS resolution in the instances via cloud-init or manually after the instance launches, the DNS resolution is provided by macOS itself. Do you have a special network setup such as a VPN or proxy on your Mac host?
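
For reference, a minimal sketch of what such a cloud-init DNS override could look like (illustrative only; the file name, instance name, and DNS addresses below are placeholders, and this is not something Multipass does by default):

# Write a hypothetical cloud-init file that pins the instance's DNS servers
cat > dns-override.yaml <<'EOF'
#cloud-config
write_files:
  - path: /etc/systemd/resolved.conf.d/override.conf
    content: |
      [Resolve]
      DNS=1.1.1.1 8.8.8.8
runcmd:
  - systemctl restart systemd-resolved
EOF

# Launch a fresh instance with that user data
multipass launch --cloud-init dns-override.yaml --name dns-test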

shefmarkh commented 2 years ago

Hello,

I do have a VPN, but the issue persists whether it is enabled or disabled. Uninstalling it also did not help. I use the VPN FortiClient.

Is there any possibility it changes something that simply uninstalling does not revert?

I do have an older Intel Mac with the same VPN software installed and there Multipass works just fine. The current issues are seen on a 1 month old M1 MacBook Air.

I also have CrowdStrike Falcon installed. I tried to rule this out by uninstalling it as well and saw the issues persist (though again, perhaps there are some residual settings left in macOS files which mess things up?).

I don't have any web proxy running on the Mac.

Thanks,

Mark

shefmarkh commented 2 years ago

Hi @townsend2010

One other clue is that I can access the internet from inside docker containers just fine.

My (possibly wrong) understanding is that both Docker and Multipass communicate with something called qemu on the Mac, and it's qemu (or something local on the Mac that qemu uses) that resolves DNS for containers and VMs? If so, is there some clue in the fact that it works for Docker and not Multipass? DNS resolution also fails from inside a VM started via UTM.

I can see docker is running:

/Applications/Docker.app/Contents/MacOS/qemu-system-aarch64

whilst Multipass uses:

/Library/Application Support/com.canonical.multipass/bin/qemu-system-aarch64

Thanks,

Mark

townsend2010 commented 2 years ago

Hi @shefmarkh,

Would it be possible to post the full qemu-system-aarch64 Docker command with options, etc.? I suspect they are using user networking, which is not affected by random macOS networking issues. UTM and Multipass use qemu with the vmnet API, which is a much more robust solution when Apple doesn't mess things up in the firewall.

We have used Qemu user networking before and found that it is not a good long term solution.

shefmarkh commented 2 years ago

Hello,

Here is the full docker command:

/Applications/Docker.app/Contents/MacOS/qemu-system-aarch64 -accel hvf -cpu host -machine virt,highmem=off -m 4096 -smp 4 -kernel /Applications/Docker.app/Contents/Resources/linuxkit/kernel -append page_poison=1 vsyscall=emulate panic=1 nospec_store_bypass_disable noibrs noibpb no_stf_barrier mitigations=off linuxkit.unified_cgroup_hierarchy=1 vpnkit.connect=tcp+bootstrap+client://192.168.65.2:49643/b1ae8e45ca122b763153afc979b360a3d79aa96f840770031d4c899dd4fbdd8c vpnkit.disable=osxfs-data console=ttyAMA0 -initrd /Applications/Docker.app/Contents/Resources/linuxkit/initrd.img -serial pipe:/var/folders/_d/54hnhxg94n19_r4qzwyq3c3m0000gp/T/qemu-console3839880179/fifo -drive if=none,file=/Users/markhodgkinson/Library/Containers/com.docker.docker/Data/vms/0/data/Docker.raw,format=raw,id=hd0 -device virtio-blk-pci,drive=hd0,serial=dummyserial -netdev socket,id=net1,fd=3 -device virtio-net-device,netdev=net1,mac=02:50:00:00:00:01 -vga none -nographic -monitor none

Cheers,

Mark

townsend2010 commented 2 years ago

Hey @shefmarkh,

Ah, I see, they have integrated their vpnkit solution to use with qemu. So yeah, it's a more robust user-level networking solution than, say, qemu's user networking, but it is a standalone networking solution that doesn't rely on Apple's vmnet or any of their firewall shenanigans.

I will say the issue you are observing is specific to how the networking is set up on your system and interferes with the vmnet stuff, since both UTM and Multipass have the same issue.

shefmarkh commented 2 years ago

Thanks @townsend2010

I tried installing Parallels and the networking seems to work fine there (it looks like it also uses vpnkit), so right now that looks like the best option for running a Linux VM on my M1 Mac.

Would Multipass consider adding an option to use vpnkit in the future to avoid the issues with vmnet?

Cheers,

Mark

townsend2010 commented 2 years ago

Hi @shefmarkh!

We have an open request at https://github.com/canonical/multipass/issues/1614, but that is kind of specific to Hyperkit and we are going to be deprecating Hyperkit support soon since it's not really being maintained anymore by Moby (the Docker folks). We'd have to see how to integrate vpnkit with qemu since I don't think that is openly available.

It's a shame that there are so many issues with Apple's own vmnet by their own doing...

shefmarkh commented 2 years ago

Thanks for the help @townsend2010.

I also tried Lima, which uses vpnkit (I think - that would explain why it works and I can see a vpnkit process in "ps aux" when Lima is running), and that worked nicely for me.

I have a colleague with an M1 Mac who is able to use Multipass just fine. Next time I see him at a meeting in October, I will try sitting down with him and maybe we can spot what could be different in the setups on our Macs.

Cheers,

Mark

tchunwei commented 2 years ago

I am having the exact same issue, on an M1 Mac:

ubuntu@primary:~$ ping google.com
PING google.com (198.18.2.5) 56(84) bytes of data.
^C
--- google.com ping statistics ---
77 packets transmitted, 0 received, 100% packet loss, time 80445ms

ubuntu@primary:~$ ping google.ie
PING google.ie (198.18.2.7) 56(84) bytes of data.
^C
--- google.ie ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2074ms

ubuntu@primary:~$ ping yahoo.com
PING yahoo.com (198.18.2.12) 56(84) bytes of data.
^C
--- yahoo.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2130ms

eugenesamoilov commented 2 years ago

I have the same problem on macOS Monterey (Intel), multipass 1.10.1+mac, multipassd 1.10.1+mac.

tchunwei commented 2 years ago

After playing around, I ended up with the following solution. Edit /etc/pf.conf as suggested by https://github.com/canonical/multipass/issues/495#issuecomment-448461250, but instead use the following line:

nat on en0 from bridge100:network to any -> (en0)

However, note that after every system reboot I need to re-run sudo pfctl -f /etc/pf.conf and multipass exec primary -- sudo systemd-resolve --flush-cache. I created a shortcut and just run it once after every boot, then it works perfectly.
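
A small helper script along those lines (just a sketch of the two commands above; the instance name primary and the pf.conf edit are assumed to match the setup described here):

#!/bin/sh
# after-reboot.sh: re-apply the workaround after each macOS reboot
sudo pfctl -f /etc/pf.conf                                    # reload /etc/pf.conf, including the added nat rule
multipass exec primary -- sudo systemd-resolve --flush-cache  # drop the cached 198.18.2.* answers inside the VM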

I have limited knowledge of networking; please correct me if my solution is improper.

runhardr commented 2 years ago

I have been having the same issue but on an Intel Mac...

Problem: DNS resolution returns sequential "made-up" addresses in the 198.18.2.* range.

The nat suggestion seems to work!

A few more details/tips...

  1. To find the name of the interface, use ifconfig and look for ones with status: active. (In my case, because of multiple external port replicators, etc., I am up to en7.)

  2. When editing the /etc/pf.conf file, the nat line must occur directly after the nat-anchor line.

...
nat-anchor "com.apple/*"
nat on en7 from bridge100:network to any -> (en7)
...

  3. Running the pfctl command and reading the comments in pf.conf turn up a bunch of scary words about flushing the ruleset. I don't know if those are actually a concern.

$ sudo pfctl -f /etc/pf.conf
pfctl: Use of -f option, could result in flushing of rules
present in the main ruleset added by the system at startup.
See /etc/pf.conf for further details.

No ALTQ support in kernel
ALTQ related functions disabled

  4. ~I did not need to issue a~ sudo systemd-resolve --flush-cache is needed inside the VM to cause re-lookup of previously-cached (bad) names.

runhardr commented 2 years ago

I did some additional testing and digging... in my case I believe the issue is caused by Fortinet ~EDR (endpoint detection and response)~ software intercepting the DNS requests.

Here's how I figured that out...

From inside the vm, even if I specify an external server, I get bogus DNS answers:

ubuntu@primary:~$ dig +short www.google.com
198.18.2.7
ubuntu@primary:~$ dig +short www.microsoft.com
198.18.2.8

ubuntu@primary:~$ dig @1.1.1.1 +short www.google.com
198.18.2.7
ubuntu@primary:~$ dig @1.1.1.1 +short www.microsoft.com
198.18.2.8

On the MacOS host, nothing weird seems to be listening on port 53:

$ sudo lsof -iTCP:53 -iUDP:53 -n -P
COMMAND   PID           USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
mDNSRespo 255 _mdnsresponder   15u  IPv4 0x6f24b4ee75698a55      0t0  UDP *:53
mDNSRespo 255 _mdnsresponder   16u  IPv6 0x6f24b4ee75698d65      0t0  UDP *:53
mDNSRespo 255 _mdnsresponder   17u  IPv4 0x6f24b4f80dcf1bd5      0t0  TCP *:53 (LISTEN)
mDNSRespo 255 _mdnsresponder   18u  IPv6 0x6f24b4e9a8733075      0t0  TCP *:53 (LISTEN)

But there is an odd packet filter / nat rule in place!

$ sudo pfctl -s all
No ALTQ support in kernel
ALTQ related functions disabled
TRANSLATION RULES:
rdr pass inet proto udp from any to any port = 53 -> 127.0.0.1 port 53535
rdr pass log inet proto tcp from any to <dohhosts> -> 127.0.0.1 port 53535
rdr pass log inet proto tcp from any to <ztnahosts> -> 127.0.0.1 port 49252

FILTER RULES:

STATES:
...

Notice: port = 53 -> 127.0.0.1 port 53535

So what is listening on port 53535?

$ sudo lsof -iUDP:53535 -n -P         
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ztnafw  112 root    9u  IPv4 0x6f24b4ee74cf1695      0t0  UDP 127.0.0.1:53535

A google search indicates ztnafw is a Fortinet binary. Also, there is a LaunchDaemon entry for it:

$ ls /Library/LaunchDaemons/ | grep -i forti
com.fortiedr.collectord.plist
com.fortinet.forticlient.config.plist
com.fortinet.forticlient.macos.PrivilegedHelper.plist
com.fortinet.forticlient.servctl2.plist
com.fortinet.forticlient.vpn.plist
com.fortinet.forticlient.ztnafw.plist

Let's issue DNS queries to that binary (+notcp seems to be required here since this binary only responds to udp requests):

$ dig @127.0.0.1 -p 53535 +notcp +short www.microsoft.com
198.18.2.8

So from the mac itself I can reproduce the weird DNS behavior when I query that same local listener.

Now as far as adding a nat rule to solve the issue, I still have the question of "does the problem go away because we flush the offending pf rule?" or "does the problem go away because the new nat rule works around the problem?".
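
One possible way to tell the two apart (a suggestion only, using the same pfctl listing as above):

# Check whether the Fortinet redirect rule is still loaded before and after the reload
sudo pfctl -s all | grep 53535
sudo pfctl -f /etc/pf.conf
sudo pfctl -s all | grep 53535
# If the rdr to 127.0.0.1:53535 disappears after the reload, the flush itself is doing the work;
# if it is still listed while DNS works, the added nat rule is the workaround.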

I was able to make the problem go away temporarily by simply flushing the rules and making no other changes, because flushing the rules drops the ztnafw nat rule:

$ sudo pfctl -f /etc/pf.conf

After reloading the default rules (which also flushes any other rules present), from inside the vm we can get good DNS resolution:

ubuntu@primary:~$ sudo systemd-resolve --flush-cache

ubuntu@primary:~$ dig +short www.microsoft.com
www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
e13678.dscb.akamaiedge.net.
184.25.165.167

The comments in pf.conf describe this:

# This file contains the main ruleset, which gets automatically loaded
# at startup.  PF will not be automatically enabled, however.  Instead,
# each component which utilizes PF is responsible for enabling and disabling
# PF via -E and -X as documented in pfctl(8).  That will ensure that PF
# is disabled only when the last enable reference is released.
#
# Care must be taken to ensure that the main ruleset does not get flushed,
# as the nested anchors rely on the anchor point defined here. In addition,
# to the anchors loaded by this file, some system services would dynamically 
# insert anchors into the main ruleset. These anchors will be added only when
# the system service is used and would removed on termination of the service.

So finally, can we fix this via a nat rule that is permanent across reboots?

Possibly:

https://iyanmv.medium.com/setting-up-correctly-packet-filter-pf-firewall-on-any-macos-from-sierra-to-big-sur-47e70e062a0e

(Use /Library/LaunchDaemons instead of /System/Library/LaunchDaemons, which is reserved for the Apple system.)

https://superuser.com/a/1334488
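
A rough sketch of what that LaunchDaemon approach could look like (the label and file name are made up, and the same caveats about reloading the main ruleset apply):

# Create a LaunchDaemon that re-loads /etc/pf.conf at boot
sudo tee /Library/LaunchDaemons/local.pf.reload.plist >/dev/null <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>local.pf.reload</string>
    <key>ProgramArguments</key>
    <array>
        <string>/sbin/pfctl</string>
        <string>-f</string>
        <string>/etc/pf.conf</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF

# Load it once; it will then run on every boot
sudo launchctl load -w /Library/LaunchDaemons/local.pf.reload.plist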

Can multipass work within the pf framework to add a rule that improves DNS resolution reliability? Could multipassd add such a rule when it starts?

yunus-floo commented 2 years ago

Thanks @runhardr, now I can sleep well. My Laravel Valet setup can reach dnsmasq on 127.0.0.1.

paraita commented 1 year ago

I ran into the same problem with the following setup:

Forticlient is also installed but it is not being used.

I will have a look at what @runhardr suggested.

shu-ming commented 1 year ago

The issue happens on my Mac (Ventura). Multipass worked well for a few weeks, then suddenly failed to resolve DNS today. I have tried uninstalling and reinstalling.

ubuntu@docker-vm:~$ dig +short www.google.com
198.18.2.11

Like @runhardr, I run FortiClient.

victor-develop commented 1 year ago

This problem with FortiClient still persists.

pythoninthegrass commented 1 year ago

Based on #888, I hardcoded DNS to 8.8.8.8, 8.8.4.4, and my internal DNS server. I am now able to access Fortinet devices and start Multipass VMs as expected. Probably not feasible for everyone, but a decent workaround for now.

harssh commented 1 year ago

Following this guide resolved the issue for me: https://serverok.in/systemd-resolved

In case the link dies in the future: create the file:

sudo mkdir /etc/systemd/resolved.conf.d/
sudo nano /etc/systemd/resolved.conf.d/dns_servers.conf

Add the DNS servers in this file:

[Resolve]
DNS=8.8.8.8 1.1.1.1

Then restart systemd-resolved:

sudo systemctl restart systemd-resolved
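
The same steps as one copy-pasteable block, in case it helps (run inside the instance; pick DNS servers that suit your network):

# Create a systemd-resolved drop-in with explicit upstream DNS servers
sudo mkdir -p /etc/systemd/resolved.conf.d/
sudo tee /etc/systemd/resolved.conf.d/dns_servers.conf >/dev/null <<'EOF'
[Resolve]
DNS=8.8.8.8 1.1.1.1
EOF

# Apply the change
sudo systemctl restart systemd-resolved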

buger commented 5 months ago

Still had this issue with the latest Multipass, and the last comment from @harssh solved it! It also solved the issue with HTTPS calls. Thanks!

naijim commented 2 months ago

Having the same issue with multipass 1.14.0 on an M1 Mac.