AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0
24.56k stars 1.78k forks source link

BSD: AGH responds with a wrong src addr when it listens to 0.0.0.0 #3015

Open pmhausen opened 3 years ago

pmhausen commented 3 years ago

Have a question or an idea? Please search it on our forum to make sure it was not yet asked. If you cannot find what you had in mind, please submit it here.

Prerequisites

Please answer the following questions for yourself before submitting an issue. YOU MAY DELETE THE PREREQUISITES SECTION.

Issue Details

Expected Behavior

When dealing with any kind of alias IP addresses, be it CARP or static alias entries, AdGuard Home should send replies from the address that was queried.

Actual Behavior

The replies are always sent from the same IP address to the client system, regardless which one was queried.

Screenshots

Server:

lagg0_vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:0d:b9:57:27:90
    inet6 fe80::20d:b9ff:fe57:2790%lagg0_vlan2 prefixlen 64 scopeid 0xb
    inet6 2003:a:d59:3880:20d:b9ff:fe57:2790 prefixlen 64
    inet 192.168.2.4 netmask 0xffffff00 broadcast 192.168.2.255
    inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
[...]

Client - working:

$ dig www.heise.de @192.168.2.4
[...]
;; ANSWER SECTION:
www.heise.de.       85855   IN  A   193.99.144.85

Client - failing:

$ dig www.heise.de @192.168.2.1
;; reply from unexpected source: 192.168.2.4#53, expected 192.168.2.1#53
;; reply from unexpected source: 192.168.2.4#53, expected 192.168.2.1#53
^C$

Additional Information

mimugmail commented 3 years ago

Additional context: The plugin uses the precompiled binary from your release tree. I updated to 0.106.1 yesterday so you can try the latest version but I'd guess it's something related to go itself. There are similar issues with Wireguard in it's go implementation and alias IPs

pmhausen commented 3 years ago

I doubt it's golang, but rather a sloppy (sorry) implementation. With UDP sockets you need to either actively keep track of the queried address or bind to each address individually. The latter is what BIND does. The downside to this is that BIND won't serve addresses that are created after it started.

If you bind to INADDR_ANY (0.0.0.0) the observed behaviour with AdGuard is exactly what is supposed to happen. Each sent packet uses the "primary" address on that particular interface, whatever the underlying OS defines as primary.

I only have an IPv6 example for BIND listening on "any", but here it is for reference.

Config:

listen-on-v6    { any; };

Result:

% netstat -na | grep 'udp6.*53'
udp6       0      0 fe80::1%lo0.53         *.*                    
udp6       0      0 ::1.53                 *.*                    
udp6       0      0 2a00:b580:8000:8.53    *.*                    
udp6       0      0 fe80::20c:29ff:f.53    *.*                    

So on startup it scans for all interfaces and then binds to each address individually.

Compare with AdGuard configured to listen on all interfaces:

root@opnsense:~ # netstat -na | grep 'udp.*53'
udp46      0      0 *.53                   *.*                    

* is equivalent to INADDR_ANY.

pmhausen commented 3 years ago

Compare with a TCP socket where you don't have this particular problem. You can listen() on INADDR_ANY, but the moment the server does accept(), a quadruple of source port, source address, destination port, destination address is assigned to the socket and the packets are always sent from the correct address.

UDP:

$ dig @192.168.2.1 www.heise.de
;; reply from unexpected source: 192.168.2.4#53, expected 192.168.2.1#53
[...]
12:46:38.248923 IP 192.168.2.128.62575 > 192.168.2.1.53: 32821+ [1au] A? www.heise.de. (41)
12:46:38.249695 IP 192.168.2.4.53 > 192.168.2.128.62575: 32821 1/0/1 A 193.99.144.85 (57)

TCP:

$ dig +tcp @192.168.2.1 www.heise.de
[...]
;; ANSWER SECTION:
www.heise.de.       44750   IN  A   193.99.144.85
12:47:08.366525 IP 192.168.2.128.56785 > 192.168.2.1.53: Flags [S], seq 1313613588, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 683292325 ecr 0,sackOK,eol], length 0
12:47:08.366747 IP 192.168.2.1.53 > 192.168.2.128.56785: Flags [S.], seq 580421076, ack 1313613589, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 194459837 ecr 683292325], length 0
12:47:08.366808 IP 192.168.2.128.56785 > 192.168.2.1.53: Flags [.], ack 1, win 2058, options [nop,nop,TS val 683292325 ecr 194459837], length 0
12:47:08.366872 IP 192.168.2.128.56785 > 192.168.2.1.53: Flags [P.], seq 1:44, ack 1, win 2058, options [nop,nop,TS val 683292325 ecr 194459837], length 43 48618+ [1au] A? www.heise.de. (41)
12:47:08.367037 IP 192.168.2.1.53 > 192.168.2.128.56785: Flags [.], ack 44, win 512, options [nop,nop,TS val 194459837 ecr 683292325], length 0
12:47:08.367351 IP 192.168.2.1.53 > 192.168.2.128.56785: Flags [P.], seq 1:60, ack 44, win 513, options [nop,nop,TS val 194459838 ecr 683292325], length 59 48618 1/0/1 A 193.99.144.85 (57)
[...]
ameshkov commented 3 years ago

The replies are always sent from the same IP address to the client system, regardless which one was queried.

That's strange. This is an old bug that was fixed quite some time ago.

We keep track of the client IP: https://github.com/AdguardTeam/dnsproxy/blob/master/proxy/server_udp.go#L65 And then use it when the response is written: https://github.com/AdguardTeam/dnsproxy/blob/master/proxy/server_udp.go#L125

Are you sure there're no iptables rules or whatever that could mess with it?

pmhausen commented 3 years ago

We keep track of the client IP

It's the server's reply source IP that is wrong.

Are you sure there're no iptables rules or whatever that could mess with it?

100%. 1st this is FreeBSD. 2nd this is a local/private interface with just "permit all" set in the FreeBSD "pf" firewall and no NAT.

ameshkov commented 3 years ago

Okay, let me rephrase, not the client IP, but the net interface IP the packet was sent to (dst addr) from the socket's OOB data: https://github.com/AdguardTeam/dnsproxy/blob/1163404e605c3dfbeab360fc3540fc290f61a321/proxyutil/udp_unix.go#L47

I wonder if it's broken on FreeBSD somehow?

pmhausen commented 3 years ago

Or the API is different? I don't know but I'll go ask on the FreeBSD-net mailing list later today and report back.

pmhausen commented 3 years ago

https://lists.freebsd.org/pipermail/freebsd-net/2021-May/058303.html

ameshkov commented 3 years ago

From what I read it seems that it should work on FreeBSD, it uses IP_RECVDSTADDR internally as it's supposed to.

We could do a quick test and see how it goes, but I'll need your help with this since I don't have FreeBSD.

  1. Clone https://github.com/AdguardTeam/dnsproxy, branch testbsd: https://github.com/AdguardTeam/dnsproxy/tree/testbsd
  2. Build it: go build (you'll need to use go 1.15 or newer)
  3. Run it: ./dnsproxy -l 0.0.0.0 -p 53 -u 8.8.8.8
  4. Do some DNS queries to it and check what it prints to stdout. Post it here (it'll print the raw oob data)
pmhausen commented 3 years ago

I'll try your code today or tomorrow. Thanks. Peter Jeremy wrote on the freebsd-net list:

So, they say that they retrieve "the net interface IP the packet was sent to (dst addr) from the socket's OOB data" and I agree that's what the referenced code does. I hadn't heard of that behaviour before and went digging...

FreeBSD ip(4) documents that setting IPPROTO_IP.IP_RECVDSTADDR on a SOCK_DGRAM socket will allow recvmsg(2) to retrieve the local dst address as a control message. The code to retrieve that seems to be present, and there's code that supports setting IP_RECVDSTADDR, though I haven't dug through to find the relevant setsockopt(2) call.

I'm not sure how to debug that further without tracing through the Go code. You could try ktrace'ing the executable to verify that the relevant setsockopt() calls (though I don't think ktrace will report the actual flag state).

pmhausen commented 3 years ago

A query to the primary IP address of the interface:

dig @172.16.0.2 www.punkt.de
[...]
;; ANSWER SECTION:
www.punkt.de.       299 IN  A   217.29.40.141
2021/05/13 07:57:42 [info] Received OOB data
2021/05/13 07:57:42 [info] [36 0 0 0 41 0 0 0 46 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 255 255 172 16 0 2 2 0 0 0 0 0 0 0]
2021/05/13 07:57:42 [info] Parsed dst IP: 172.16.0.2

A query to an alias IP address:

$ dig @172.16.0.3 www.punkt.de
;; reply from unexpected source: 172.16.0.2#53, expected 172.16.0.3#53
;; reply from unexpected source: 172.16.0.2#53, expected 172.16.0.3#53
2021/05/13 07:57:46 [info] Received OOB data
2021/05/13 07:57:46 [info] [36 0 0 0 41 0 0 0 46 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 255 255 172 16 0 3 2 0 0 0 0 0 0 0]
2021/05/13 07:57:46 [info] Parsed dst IP: 172.16.0.3

I made 100% sure there are is no NAT interfering by globally disabling the firewall completely.

pmhausen commented 3 years ago

Some more comments from Peter:

As I see it, the possibilities boil down to: 1) The Go code isn't enabling IPPROTO_IP.IP_RECVDSTADDR on the socket. 2) There's a FreeBSD kernel bug that mean setting IP_RECVDSTADDR isn't being correctly reflected into the recvmsg control message. 3) The control message isn't being correctly plumbed through from recvmsg(2) to the Go RecvMsg() return.

Note that a lot of the relevant Go library code is BSD- or FreeBSD- specific so it's also possible that there is a bug in the Go library code.

ameshkov commented 3 years ago

@pmhausen could you please do the same test again? I've made some changes to that branch, added a little bit more logging and used a different way to correct the source address.

pmhausen commented 3 years ago
$ dig @172.16.0.2 www.punkt.de
[...]
www.punkt.de.       299 IN  A   217.29.40.141
[...]
$ dig @172.16.0.3 www.punkt.de
;; reply from unexpected source: 172.16.0.2#53, expected 172.16.0.3#53
;; reply from unexpected source: 172.16.0.2#53, expected 172.16.0.3#53
$ sudo ./dnsproxy -l 0.0.0.0 -p 53 -u 8.8.8.8
2021/05/14 15:39:54 [info] Starting the DNS proxy
2021/05/14 15:39:54 [info] Starting the DNS proxy server
2021/05/14 15:39:54 [info] Creating the UDP server socket
2021/05/14 15:39:54 [info] Listening to udp://[::]:53
2021/05/14 15:39:54 [info] Creating a TCP server socket
2021/05/14 15:39:54 [info] Listening to tcp://[::]:53
2021/05/14 15:39:54 [info] Entering the UDP listener loop on [::]:53
2021/05/14 15:39:54 [info] Entering the tcp listener loop on [::]:53
2021/05/14 15:40:03 [info] read from 172.16.0.1:65063
2021/05/14 15:40:03 [info] write to 172.16.0.1:65063
2021/05/14 15:40:08 [info] read from 172.16.0.1:60054
2021/05/14 15:40:08 [info] write to 172.16.0.1:60054
2021/05/14 15:40:13 [info] read from 172.16.0.1:60054
2021/05/14 15:40:13 [info] write to 172.16.0.1:60054
ameshkov commented 3 years ago

Hm, this looks like a golang issue after all, relevant discussion: https://github.com/golang/go/issues/8329

pmhausen commented 3 years ago

I don't quite get it. I observe:

root@OPNsense:~ # netstat -na|grep 53
tcp46      0      0 *.53                   *.*                    LISTEN     
udp46      0      0 *.53                   *.*                    

But why is that the case when passing -l 0.0.0.0 - semantically that is clearly an IPv4 INADDR_ANY, not v6. I would expect any tool to require an additional -l :: ...

pmhausen commented 3 years ago

I see now. You are using an IPv4-mapped IPv6 address. Since you have an AF_INET6 socket, you cannot use AF_INET options via setsockopt().

Peter Jeremy wrote:

On Sun, May 16, 2021 at 09:18:55PM +1000, Peter Jeremy via freebsd-net wrote: This is getting outside my expertise but my understanding is that the idea behind using IPv4-mapped addressed is to simplify building dual-stack applications, particularly during the early introduction of IPv6. The main benefit is that it made it possible to support both IPv4 and IPv6 without needing 2 sockets - which means you can stick to doing an accept() on a blocking socket, rather than needing to use poll() or select() etc with a pair of non-blocking sockets.

To which Lutz Donnerhacke added:

Correct. IPv4-mapped addresses exists only for this purpose, they do not have any meaning outside of this API. Unfortunatly the API is incomplete, but still heavily used. For this purpose the API might be stretched over its limits.

Without unnecessarily complicating the code - could you add a config file setting that forces AdGuard Home to use IPv4 only? That would be great in environments with CARP or alias addresses for other reasons.

ameshkov commented 3 years ago

@pmhausen interesting, thank you for this info. Let's do one more test with dnsproxy, I've added a check here: https://github.com/AdguardTeam/dnsproxy/commit/b946d008143caa50317d1a7c35fa09d8fa357681#diff-46c1ddbacc8c5b4f527bfc32976291cde674248df5f400d76a90c3864737c682R43

pmhausen commented 3 years ago
$ dig @172.16.0.2 www.punkt.de
[...]
;; ANSWER SECTION:
www.punkt.de.       299 IN  A   217.29.40.141

$ dig @172.16.0.3 www.punkt.de
;; reply from unexpected source: 172.16.0.2#53, expected 172.16.0.3#53
;; reply from unexpected source: 172.16.0.2#53, expected 172.16.0.3#53
$ sudo ./dnsproxy -l 0.0.0.0 -p 53 -u 8.8.8.8
2021/05/16 20:16:12 [info] Starting the DNS proxy
2021/05/16 20:16:12 [info] Starting the DNS proxy server
2021/05/16 20:16:12 [info] Creating the UDP server socket
2021/05/16 20:16:12 [info] Listening to udp://0.0.0.0:53
2021/05/16 20:16:12 [info] Creating a TCP server socket
2021/05/16 20:16:12 [info] Listening to tcp://[::]:53
2021/05/16 20:16:12 [info] Entering the UDP listener loop on 0.0.0.0:53
2021/05/16 20:16:12 [info] Entering the tcp listener loop on [::]:53
2021/05/16 20:16:23 [info] read from 172.16.0.1:53093
2021/05/16 20:16:23 [info] write to 172.16.0.1:53093
2021/05/16 20:16:30 [info] read from 172.16.0.1:51314
2021/05/16 20:16:30 [info] write to 172.16.0.1:51314
2021/05/16 20:16:35 [info] read from 172.16.0.1:51314
2021/05/16 20:16:35 [info] write to 172.16.0.1:51314
^C2021/05/16 20:17:26 [info] Stopping the DNS proxy server
ameshkov commented 3 years ago

Hm, so it seems the problem is not in IPv4-mapping after all :(

pmhausen commented 3 years ago

I attached a system call trace. Possibly helpful?

ktrace.txt

ameshkov commented 3 years ago

Well, I don't see IP_SENDSRCADDR in the trace so I guess that's the root cause of the issue.

We're successfully extracting the dst address and we pass it further, but it's simply omitted by golang. Looks like a bug, we'll need some time to prepare a minimal reproducer and file it to Golang repo.

pmhausen commented 3 years ago

Thank you so very much. Glad I could help to pin this somewhere in the FreeBSD - golang - AdGuard Home Bermuda triangle.

pmhausen commented 2 years ago

Hi folks,

just routinely checking my open issues. Is there any progress? Is there anything I can do to help?

Kind regards, Patrick

ameshkov commented 2 years ago

Sorry, nothing on our side, we still need to file a bug report to golang.

@ainar-g reassigned to you, I just don't have enough time to do this.

liv3010m commented 2 years ago

+1 Had this same issue yesterday when I installed AdGuardHome on my OPNsense box which uses IP alias instead of CARP address whose behavior is the same as @pmhausen had previously explained.

[root@prometheus ~]# nslookup
>
> servers
;; reply from unexpected source: 192.168.217.1#53, expected 192.168.217.3#53
;; reply from unexpected source: 192.168.217.1#53, expected 192.168.217.3#53

This is happening with: mimugmail repo's AdGuardHome version 0.107.2 (v0.107.0-b.9) and also when upgrading to 108.0-b.9. OPNsense 22.1.8_1-amd64 (FreeBSD 13.0-STABLE)

Hopefully this gets sorted out. :)

emaba commented 1 year ago

Is there any progress in solving this issue? Thank you.

hanjo commented 9 months ago

I did not have this issue until today v0.107.42 was released (and automatically updated). My entire network went down :-) Since Go was mentioned before in this issue and the release notes mention a new Go version, I'm suspecting this is related. Anyway, I'm now stuck with v0.107.41 until this is resolved.

hanjo commented 9 months ago

I did not have this issue until today v0.107.42 was released (and automatically updated). My entire network went down :-) Since Go was mentioned before in this issue and the release notes mention a new Go version, I'm suspecting this is related. Anyway, I'm now stuck with v0.107.41 until this is resolved.

The issue seems to be resolved in v0.107.43 for me 👍

mimugmail commented 9 months ago

Ok, and when you remove the plugin ans install again with the old version everything is fine again?

hanjo commented 9 months ago

The issue seems to be in v0.107.43 for me 👍

My apologies, I forgot an important word above:

The issue seems to be resolved in v0.107.43 for me 👍

I've updated my comment above to avoid further irritation.