Open shefmarkh opened 2 years ago
Update:
I initially failed to spot that DNS resolution is what is going wrong. Every website I try resolves to an IP starting 198.18.2.* rather than the correct IP that the MacBook itself sees, so it is something to do with the DNS resolution inside the VM.
Mark
Hello @shefmarkh,
Sorry you are having this issue. Unless you have overridden DNS resolution in the instances via cloud-init
or manually after the instance launches, the DNS resolution is provided by macOS itself. Do you have a special network setup such as a VPN or proxy on your Mac host?
Hello,
I do have a VPN, but the issue persists whether it is enabled or disabled. Uninstalling it also did not help. The VPN is FortiClient.
Is there any possibility it changes something that simply uninstalling does not revert?
I do have an older Intel Mac with the same VPN software installed, and Multipass works just fine there. The current issues are on a one-month-old M1 MacBook Air.
I also have CrowdStrike Falcon installed. I tried to rule this out by uninstalling it as well, and the issues persist (though again, perhaps some residual settings are left behind in macOS files that mess things up?).
I don't have any web proxy running on the Mac.
Thanks,
Mark
Hi @townsend2010
One other clue is that I can access the internet from inside Docker containers just fine.
My (possibly wrong) understanding is that both Docker and Multipass communicate with something called qemu on the Mac, and it's qemu (or something local on the Mac that qemu uses) that resolves DNS for the containers and VMs. If so, is there some clue in the fact that it works for Docker and not Multipass? DNS resolution also fails from inside a VM started via UTM.
I can see Docker is running:
/Applications/Docker.app/Contents/MacOS/qemu-system-aarch64
whilst Multipass uses:
/Library/Application Support/com.canonical.multipass/bin/qemu-system-aarch64
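If it helps for comparison, the full command lines of both processes can be pulled with ps (a small sketch; the bracketed pattern just stops grep from matching itself):
ps aux | grep '[q]emu-system-aarch64'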
Thanks,
Mark
Hi @shefmarkh,
Would it be possible to post the full qemu-system-aarch64 Docker command with options, etc.? I suspect they are using user networking, which is not affected by random macOS networking issues. UTM and Multipass use qemu with the vmnet API, which is a much more robust solution when Apple doesn't mess things up in the firewall.
We have used qemu user networking before and found that it is not a good long-term solution.
Hello,
Here is the full docker command:
/Applications/Docker.app/Contents/MacOS/qemu-system-aarch64 -accel hvf -cpu host -machine virt,highmem=off -m 4096 -smp 4 -kernel /Applications/Docker.app/Contents/Resources/linuxkit/kernel -append page_poison=1 vsyscall=emulate panic=1 nospec_store_bypass_disable noibrs noibpb no_stf_barrier mitigations=off linuxkit.unified_cgroup_hierarchy=1 vpnkit.connect=tcp+bootstrap+client://192.168.65.2:49643/b1ae8e45ca122b763153afc979b360a3d79aa96f840770031d4c899dd4fbdd8c vpnkit.disable=osxfs-data console=ttyAMA0 -initrd /Applications/Docker.app/Contents/Resources/linuxkit/initrd.img -serial pipe:/var/folders/_d/54hnhxg94n19_r4qzwyq3c3m0000gp/T/qemu-console3839880179/fifo -drive if=none,file=/Users/markhodgkinson/Library/Containers/com.docker.docker/Data/vms/0/data/Docker.raw,format=raw,id=hd0 -device virtio-blk-pci,drive=hd0,serial=dummyserial -netdev socket,id=net1,fd=3 -device virtio-net-device,netdev=net1,mac=02:50:00:00:00:01 -vga none -nographic -monitor none
Cheers,
Mark
Hey @shefmarkh,
Ah, I see, they have integrated their vpnkit solution to use with qemu. So yeah, it's a more robust user-level networking solution than, say, qemu's user networking, but it is a standalone networking solution that doesn't rely on Apple's vmnet or any of their firewall shenanigans.
I will say the issue you are observing is specific to how the networking is set up on your system and interferes with the vmnet stuff, since both UTM and Multipass have the same issue.
Thanks @townsend2010
I tried installing Parallels and its networking seems to work fine (it looks like it also uses vpnkit), so right now that looks like the best option for running Linux VMs on my M1 Mac.
Would Multipass consider adding an option to use vpnkit in the future to avoid the issues with vmnet?
Cheers,
Mark
Hi @shefmarkh!
We have an open request at https://github.com/canonical/multipass/issues/1614, but that is kind of specific to Hyperkit and we are going to be deprecating Hyperkit support soon since it's not really being maintained anymore by Moby (the Docker folks). We'd have to see how to integrate vpnkit with qemu since I don't think that is openly available.
It's a shame that there are so many issues with Apple's own vmnet by their own doing...
Thanks for the help @townsend2010.
I also tried Lima, which uses vpnkit (I think - that would explain why it works and I can see a vpnkit process in "ps aux" when Lima is running), and that worked nicely for me.
I have a colleague with an M1 Mac who is able to use Multipass just fine. Next time I see him at a meeting in October, I will try sitting down with him and maybe we can spot what could be different in the setups on our Macs.
Cheers,
Mark
I am having the exact same issue on an M1 Mac:
ubuntu@primary:~$ ping google.com
PING google.com (198.18.2.5) 56(84) bytes of data.
^C
--- google.com ping statistics ---
77 packets transmitted, 0 received, 100% packet loss, time 80445ms
ubuntu@primary:~$ ping google.ie
PING google.ie (198.18.2.7) 56(84) bytes of data.
^C
--- google.ie ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2074ms
ubuntu@primary:~$ ping yahoo.com
PING yahoo.com (198.18.2.12) 56(84) bytes of data.
^C
--- yahoo.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2130ms
I have the same problem on macOS Monterey (Intel), with multipass 1.10.1+mac / multipassd 1.10.1+mac.
After playing around, I ended up with the following solution.
Edit /etc/pf.conf as suggested by https://github.com/canonical/multipass/issues/495#issuecomment-448461250, but instead I use the following line:
nat on en0 from bridge100:network to any -> (en0)
However, note that after every system reboot I need to re-run sudo pfctl -f /etc/pf.conf and multipass exec primary -- sudo systemd-resolve --flush-cache. I created a shortcut and just run it once after every boot, then it works perfectly.
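For reference, such a post-boot shortcut could be as simple as the following sketch (it assumes the default primary instance and that en0 is the interface used in the nat rule):
#!/bin/sh
# Reload the pf ruleset so the custom nat rule is active again
sudo pfctl -f /etc/pf.conf
# Drop any bad 198.18.2.* answers the instance has already cached
multipass exec primary -- sudo systemd-resolve --flush-cache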
I have limited knowledge of networking, so please correct me if my solution is improper.
I have been having the same issue but on an Intel Mac...
Problem: DNS resolution returns sequential "made-up" addresses in the 198.18.2.* range.
The nat suggestion seems to work!
A few more details/tips...
To find the name of the interface, use ifconfig and look for ones with status: active. (In my case, because of multiple external port replicators, etc., I am up to en7.)
When editing the /etc/pf.conf file, the nat line must occur directly after the nat-anchor line.
...
nat-anchor "com.apple/*"
nat on en7 from bridge100:network to any -> (en7)
...
When reloading pf.conf, there are a bunch of scary words about flushing the ruleset. I don't know if those are actually a concern.
$ sudo pfctl -f /etc/pf.conf
pfctl: Use of -f option, could result in flushing of rules
present in the main ruleset added by the system at startup.
See /etc/pf.conf for further details.
No ALTQ support in kernel
ALTQ related functions disabled
sudo systemd-resolve --flush-cache is needed inside the VM to cause a re-lookup of previously cached (bad) names.
I did some additional testing and digging... in my case I believe the issue is caused by Fortinet ~EDR (endpoint detection and response)~ software intercepting the DNS requests.
Here's how I figured that out...
From inside the vm, even if I specify an external server, I get bogus DNS answers:
ubuntu@primary:~$ dig +short www.google.com
198.18.2.7
ubuntu@primary:~$ dig +short www.microsoft.com
198.18.2.8
ubuntu@primary:~$ dig @1.1.1.1 +short www.google.com
198.18.2.7
ubuntu@primary:~$ dig @1.1.1.1 +short www.microsoft.com
198.18.2.8
On the macOS host, nothing weird seems to be listening on port 53:
$ sudo lsof -iTCP:53 -iUDP:53 -n -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mDNSRespo 255 _mdnsresponder 15u IPv4 0x6f24b4ee75698a55 0t0 UDP *:53
mDNSRespo 255 _mdnsresponder 16u IPv6 0x6f24b4ee75698d65 0t0 UDP *:53
mDNSRespo 255 _mdnsresponder 17u IPv4 0x6f24b4f80dcf1bd5 0t0 TCP *:53 (LISTEN)
mDNSRespo 255 _mdnsresponder 18u IPv6 0x6f24b4e9a8733075 0t0 TCP *:53 (LISTEN)
But there is an odd packet filter / nat rule in place!
$ sudo pfctl -s all
No ALTQ support in kernel
ALTQ related functions disabled
TRANSLATION RULES:
rdr pass inet proto udp from any to any port = 53 -> 127.0.0.1 port 53535
rdr pass log inet proto tcp from any to <dohhosts> -> 127.0.0.1 port 53535
rdr pass log inet proto tcp from any to <ztnahosts> -> 127.0.0.1 port 49252
FILTER RULES:
STATES:
...
Notice: port = 53 -> 127.0.0.1 port 53535
So what is listening on port 53535?
$ sudo lsof -iUDP:53535 -n -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ztnafw 112 root 9u IPv4 0x6f24b4ee74cf1695 0t0 UDP 127.0.0.1:53535
A Google search indicates ztnafw is a Fortinet binary. Also, there is a LaunchDaemon entry for it:
$ ls /Library/LaunchDaemons/ | grep -i forti
com.fortiedr.collectord.plist
com.fortinet.forticlient.config.plist
com.fortinet.forticlient.macos.PrivilegedHelper.plist
com.fortinet.forticlient.servctl2.plist
com.fortinet.forticlient.vpn.plist
com.fortinet.forticlient.ztnafw.plist
Let's issue DNS queries to that binary (+notcp seems to be required here since this binary only responds to UDP requests):
$ dig @127.0.0.1 -p 53535 +notcp +short www.microsoft.com
198.18.2.8
So from the Mac itself I can reproduce the weird DNS behavior when I query that same local listener.
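The checks above boil down to something like this host-side sketch (the 53535 port is what I see on my machine; adjust it to whatever the rdr rule on yours points at):
# Look for pf rdr rules hijacking DNS traffic on port 53
sudo pfctl -s nat | grep 'port = 53'
# See which process owns the redirect target
sudo lsof -iUDP:53535 -n -P
# Query the interceptor directly to reproduce the bogus answers
dig @127.0.0.1 -p 53535 +notcp +short www.microsoft.com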
Now as far as adding a nat rule to solve the issue, I still have the question of "does the problem go away because we flush the offending pf rule?" or "does the problem go away because the new nat rule works around the problem?".
I was able to make the problem go away temporarily by simply flushing the rules and making no other changes, because flushing the rules drops the ztnafw nat rule:
$ sudo pfctl -f /etc/pf.conf
After reloading the default rules (which also flushes any other rules present), from inside the vm we can get good DNS resolution:
ubuntu@primary:~$ sudo systemd-resolve --flush-cache
ubuntu@primary:~$ dig +short www.microsoft.com
www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
e13678.dscb.akamaiedge.net.
184.25.165.167
The comments in pf.conf describe this:
# This file contains the main ruleset, which gets automatically loaded
# at startup. PF will not be automatically enabled, however. Instead,
# each component which utilizes PF is responsible for enabling and disabling
# PF via -E and -X as documented in pfctl(8). That will ensure that PF
# is disabled only when the last enable reference is released.
#
# Care must be taken to ensure that the main ruleset does not get flushed,
# as the nested anchors rely on the anchor point defined here. In addition,
# to the anchors loaded by this file, some system services would dynamically
# insert anchors into the main ruleset. These anchors will be added only when
# the system service is used and would removed on termination of the service.
So finally, can we fix this via a nat rule that is permanent across reboots?
Possibly:
(Use /Library/LaunchDaemons instead of /System/Library/LaunchDaemons, which is reserved for the Apple system.)
https://superuser.com/a/1334488
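As a rough sketch (untested; the label and filename are my own made-up names), such a LaunchDaemon could be created like this and would re-apply /etc/pf.conf on every boot:
sudo tee /Library/LaunchDaemons/local.pf-reload.plist > /dev/null <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>local.pf-reload</string>
  <key>ProgramArguments</key>
  <array>
    <string>/sbin/pfctl</string>
    <string>-f</string>
    <string>/etc/pf.conf</string>
  </array>
  <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
sudo launchctl load /Library/LaunchDaemons/local.pf-reload.plist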
Can multipass work within the pf framework to add a rule that improves DNS resolution reliability? Could multipassd add such a rule when it starts?
Thanks @runhardr, now I can sleep well. My Laravel Valet setup can reach dnsmasq at 127.0.0.1.
I ran into the same problem with the following setup:
FortiClient is also installed, but it is not being used.
I will have a look at what @runhardr suggested.
The issue happens on my Mac (Ventura). Multipass worked well for a few weeks but suddenly failed to resolve DNS today. I have tried uninstalling and reinstalling.
ubuntu@docker-vm:~$ dig +short www.google.com
198.18.2.11
Like @runhardr, I run FortiClient.
This problem with FortiClient still persists.
Based on #888, I hardcoded DNS to 8.8.8.8, 8.8.4.4, and my internal DNS. Now I am able to access Fortinet devices and start Multipass VMs as expected. Probably not feasible for everyone, but a decent workaround for now.
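If you would rather bake the servers in at launch time instead of editing the instance afterwards, a cloud-init sketch along these lines should do the same thing (the file name and server list are just placeholders):
cat > dns.yaml <<'EOF'
#cloud-config
# Drop in hard-coded DNS servers for systemd-resolved on first boot
write_files:
  - path: /etc/systemd/resolved.conf.d/dns_servers.conf
    content: |
      [Resolve]
      DNS=8.8.8.8 8.8.4.4
runcmd:
  - systemctl restart systemd-resolved
EOF
multipass launch --name dns-test --cloud-init dns.yaml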
Following this guide resolved the issue for me: https://serverok.in/systemd-resolved
In case the link dies in the future, create a file:
sudo mkdir /etc/systemd/resolved.conf.d/
sudo nano /etc/systemd/resolved.conf.d/dns_servers.conf
Add the DNS servers in this file:
[Resolve]
DNS=8.8.8.8 1.1.1.1
Then restart systemd-resolved:
sudo systemctl restart systemd-resolved
Still had this issue with the latest Multipass, and the last comment from @harssh solved it! It also fixed HTTPS calls. Thanks!
Having the same issue with Multipass 1.14.0 on an M1 Mac.
Hello,
I launched a Multipass instance with:
multipass launch -c 6 -m 10G -d 50G --name markTest
Then inside it, e.g., a git clone or an apt update fails:
git clone https://github.com/cvmfs/cvmfs.git
Cloning into 'cvmfs'...
fatal: unable to access 'https://github.com/cvmfs/cvmfs.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
OR
sudo apt update
Err:1 http://ports.ubuntu.com/ubuntu-ports focal InRelease
  Connection failed [IP: 198.18.2.6 80]
Err:2 http://ports.ubuntu.com/ubuntu-ports focal-updates InRelease
  Connection failed [IP: 198.18.2.6 80]
Err:3 http://ports.ubuntu.com/ubuntu-ports focal-backports InRelease
  Connection failed [IP: 198.18.2.6 80]
Err:4 http://ports.ubuntu.com/ubuntu-ports focal-security InRelease
  Connection failed [IP: 198.18.2.6 80]
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal-updates/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal-backports/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/focal-security/InRelease  Connection failed [IP: 198.18.2.6 80]
W: Some index files failed to download. They have been ignored, or old ones used instead.
I worked through:
https://multipass.run/docs/troubleshooting-networking-on-macos#heading--dns-problems
and seem to get all the expected output, so I am at a loss as to how to further diagnose the issue. Do you have any suggestions?
Here is the output I get from the suggested diagnostic tests inside the instance (accessed via "multipass shell markTest")
ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=19.5 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=12.1 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=58 time=12.4 ms
--- 1.1.1.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9447ms
rtt min/avg/max/mdev = 12.128/15.356/19.502/3.169 ms
dig google.ie
; <<>> DiG 9.16.1-Ubuntu <<>> google.ie
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20802
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.ie.			IN	A

;; ANSWER SECTION:
google.ie.		3600	IN	A	198.18.2.8

;; Query time: 12 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Jul 24 21:48:56 BST 2022
;; MSG SIZE  rcvd: 54
dig @1.1.1.1 google.ie
; <<>> DiG 9.16.1-Ubuntu <<>> @1.1.1.1 google.ie
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53071
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;google.ie.			IN	A

;; ANSWER SECTION:
google.ie.		3600	IN	A	198.18.2.8

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Sun Jul 24 21:49:14 BST 2022
;; MSG SIZE  rcvd: 52
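To rule the host's own resolver in or out, it also helps to compare the instance's answer with what the Mac itself resolves (a small sketch; the first command runs inside the instance, the second on the macOS host):
dig +short google.ie                    # inside the instance: returns 198.18.2.x here
dscacheutil -q host -a name google.ie   # on the Mac: should show the address the host actually sees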
Then locally on my Mac I checked whilst the instance is running in another terminal:
sudo lsof -iTCP:53 -iUDP:53 -n -P
Password:
COMMAND   PID           USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
mDNSRespo 453 _mdnsresponder   57u  IPv4 0x893411bddad0c2c3      0t0  UDP *:53
mDNSRespo 453 _mdnsresponder   58u  IPv6 0x893411bddad0c5d3      0t0  UDP *:53
mDNSRespo 453 _mdnsresponder   60u  IPv4 0x893411cc3e3df8b3      0t0  TCP *:53 (LISTEN)
mDNSRespo 453 _mdnsresponder   61u  IPv6 0x893411cc3e43a203      0t0  TCP *:53 (LISTEN)
Inside the instance two of the suggested files to check/modify have these contents:
more /etc/resolv.conf
nameserver 127.0.0.53
nameserver 1.1.1.1
options edns0 trust-ad
search broadband
and
more /etc/systemd/resolved.conf
[Resolve]
DNS=127.0.0.53
FallbackDNS=8.8.8.8
Domains=
LLMNR=no
MulticastDNS=no
DNSSEC=no
DNSOverTLS=no
Cache=no-negative
DNSStubListener=yes
ReadEtcHosts=yes
Thanks,
Mark