getdnsapi / getdns

A modern asynchronous DNS API https://getdnsapi.net/

Packet size limit exceeded when forwarding back to UDP client from TCP/TLS server #430

Open harjoc opened 5 years ago

harjoc commented 5 years ago

When stubby receives a request from a UDP client without an EDNS0 option and forwards it via TLS or plain TCP, a reply larger than 512 bytes is forwarded back to the client unchanged, and the client rejects it.

This happens, for example, when stubby is used on OpenWrt routers: /etc/resolv.conf does not contain "options edns0", so the packet limit is 512 bytes:

https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/include/arpa/nameser.h#n77

Using name compression on the reply would alleviate this issue, since the reply that most DNS-over-TLS servers get from an authoritative server fits in 512 bytes.

I can work on adding name compression in stubby, but does the getdns API provide a way to implement this?

foxcpp commented 4 years ago

I don't think name compression is going to solve this completely. Some responses are just bigger than 512 bytes (see example below).

stubby/getdns should truncate packets to the buffer size advertised by the client, or to 512 bytes if the client does not support EDNS. Ref.: https://tools.ietf.org/html/rfc5625#section-4.4
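The rule in the comment above (use the size the client advertised, or 512 bytes when the query carries no EDNS) could be sketched as follows. This is a minimal Python illustration, not getdns or stubby code: `client_udp_limit` and `skip_name` are hypothetical helper names, and the parser assumes a well-formed query.

```python
import struct

DEFAULT_UDP_LIMIT = 512  # classic DNS limit when the query carries no OPT record
OPT_TYPE = 41            # type code of the EDNS0 OPT pseudo-record


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:              # root label ends the name
            return pos + 1
        if length & 0xC0 == 0xC0:    # a compression pointer is two bytes and ends the name
            return pos + 2
        pos += 1 + length


def client_udp_limit(query: bytes) -> int:
    """Largest UDP response the client says it accepts (hypothetical helper)."""
    qdcount, ancount, nscount, arcount = struct.unpack_from("!4H", query, 4)
    pos = 12                                     # skip the fixed 12-byte header
    for _ in range(qdcount):
        pos = skip_name(query, pos) + 4          # QTYPE + QCLASS
    for _ in range(ancount + nscount + arcount):
        pos = skip_name(query, pos)
        rtype, rclass, _ttl, rdlength = struct.unpack_from("!HHIH", query, pos)
        pos += 10 + rdlength
        if rtype == OPT_TYPE:
            # In an OPT record the CLASS field holds the requestor's UDP
            # payload size; values below 512 are treated as 512.
            return max(rclass, DEFAULT_UDP_LIMIT)
    return DEFAULT_UDP_LIMIT
```

A proxy would call this once per incoming UDP query and remember the result until the upstream answer arrives.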

Here is the example that caused problems in my case (non-EDNS-capable client was confused by large response):

⟩ dig +do +noedns TXT mail._domainkey.disroot.org

; <<>> DiG 9.14.6 <<>> @10.8.0.1 +do +noedns TXT mail._domainkey.disroot.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48386
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1452
;; QUESTION SECTION:
;mail._domainkey.disroot.org.   IN  TXT

;; ANSWER SECTION:
mail._domainkey.disroot.org. 852 IN TXT "v=DKIM1; g=*; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAmmx1FS5zycLSUYBmcGT3EWD84Uq+5MfwtxkYDAWcKDKhVhfV1vh0MA7oRZKvWEXPKj9qm1vSDNR/oB0ZSvKNqptCIpKwLzzrglqqZRCenP/xof07N6LX31Gl5pWM5uT8PkeqxFpP1SoRX3crz7VHjokg8qzIA6aNkzus+XE1v4/SDA41/Odsd" "6zLqr1XJR3AGIpr1Ky+d78QSw3iYkK83BpFC9Zau/Wza/BRIwMxm7VvRikJwosGrPrx2at4igwHOqRoRRtJhwLyhQJTx6haLjz/E6Ss8/YR4CzhO+GRllgi2puS3a3qz7u51wWRcK5QB+VrY+5DtZvNtKdk4yefbwIDAQAB"
mail._domainkey.disroot.org. 852 IN RRSIG   TXT 13 4 3600 20191031145113 20191017132113 36238 disroot.org. tID6JWOgBB4nFiR6d8xAgMP2tzQ2FMtFRkM6vW55P02phPkC706TNS50 V+h/kcyofxyI4QRoPnYaVvNk75+CLQ==
mail._domainkey.disroot.org. 852 IN RRSIG   TXT 8 4 3600 20191031145113 20191017132113 51556 disroot.org. wHfhhHL+BhL3zDBv6puqr+XVUt97htX1SV0HZSvGw/VOGaPT/PEXqNUY Li9RQwUJ4imUF84FBp58/ih0rLCo4RrL5Jmkx+fvg5PGKDIamNzRDifo Jkd/zbiihn/cDItydNEg5hqLzK+c0QSTOac1/lvClXx/cel2WNSNfcrr Ric=

;; Query time: 114 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Oct 17 19:57:39 MSK 2019
;; MSG SIZE  rcvd: 844

wtoorop commented 4 years ago

Completely agree... We'll have to record the client's EDNS buffer size and send the packet if it fits, or otherwise reply truncated.
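The "send it if it fits, otherwise reply truncated" step might look roughly like this. It is a sketch under stated assumptions, not the code that went into getdns: `fit_or_truncate` and `end_of_question` are hypothetical names, the question names are assumed to be uncompressed, and an oversized reply is reduced to header plus question with the TC bit set (a real implementation could instead keep whatever complete RRsets still fit).

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word


def end_of_question(msg: bytes) -> int:
    """Offset just past the question section (assumes uncompressed QNAMEs)."""
    (qdcount,) = struct.unpack_from("!H", msg, 4)
    pos = 12
    for _ in range(qdcount):
        while msg[pos] != 0:         # walk the labels up to the root label
            pos += 1 + msg[pos]
        pos += 1 + 4                 # root label + QTYPE + QCLASS
    return pos


def fit_or_truncate(response: bytes, limit: int) -> bytes:
    """Return the response unchanged if it fits, else a header+question stub with TC set."""
    if len(response) <= limit:
        return response
    msg_id, flags, qdcount = struct.unpack_from("!3H", response, 0)
    # Keep ID, flags and QDCOUNT; zero the answer, authority and additional counts.
    header = struct.pack("!6H", msg_id, flags | TC_FLAG, qdcount, 0, 0, 0)
    return header + response[12:end_of_question(response)]
```

For a query for frontdoor.nest.com, such a stub is 36 bytes long (12-byte header plus 24-byte question).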

wtoorop commented 4 years ago

Closing because it's in getdns-1.6.0

epilsits commented 4 years ago

I've observed that when truncation happens the response does not contain any answer records. Is this to be expected?

Several other users and I have experienced problems with IoT devices (Nest Protect and SmartThings) after this patch. In my case, my Nest Protects were unable to connect to the Nest network and I could not add new devices to my account. I ran tcpdump on my router (RT-AX88U running Merlin's asuswrt with DoT enabled) and saw that the response from stubby had the TC bit set but the answer section was empty. Turning off DoT allowed the devices to connect again.

The request was for frontdoor.nest.com and the response is 792 bytes, using cloudflare's DoT servers. The nest does not seem to be EDNS capable, as the request did not have the EDNS OPT section, nor does it retry the query in TCP mode. The truncated response was something like ~32 bytes.

I'm having a hard time reconciling this with the RFC recommendation, which seems to recommend not truncating the response:

> Since UDP packets larger than 512 octets are now expected in normal operation, proxies SHOULD NOT truncate UDP packets that exceed that size. See Section 4.4.3 for recommendations for packet sizes exceeding the WAN MTU.

Is there any middle ground here? Perhaps a set of configurations? Or is there something wrong with the patch, and there actually should be something in the truncated answer section?

vttale commented 4 years ago

> I've observed that when truncation happens the response does not contain any answer records. Is this to be expected?

It can happen, yes, even beyond the by far most common case of a delegation being provided (which only has records in the authority section and maybe the additional section). If the full rrset that would go in the answer section, plus the header, exceeds 512 bytes, then a standards-compliant server would remove the whole rrset and set the truncate bit (tc). Nameservers can also optionally remove everything (but the header) and set tc anyway; this is how response rate limiting works, for example.
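From the client's side, a reply with tc set is the cue to repeat the query over TCP. A minimal sketch of the two pieces involved, using hypothetical helper names (`is_truncated`, `frame_for_tcp`) and leaving the socket handling out:

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word


def is_truncated(response: bytes) -> bool:
    """True when the TC bit is set in the DNS header flags."""
    (flags,) = struct.unpack_from("!H", response, 2)
    return bool(flags & TC_FLAG)


def frame_for_tcp(query: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(query)) + query
```

A client that ignores the TC bit, or never retries over TCP, is left with the empty answer section.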

> The request was for frontdoor.nest.com and the response is 792 bytes, using cloudflare's DoT servers.

Hmm, I'm curious about what the answer is expected to be in the absence of the problem at hand. From my location, when I ask the nest.com servers for frontdoor.nest.com I get back a perfectly reasonable response of one CNAME; even after following the chain and including the 8 A records, it all still fits in 512 bytes just fine.

Taking getdns out of the picture, what are the Cloudflare servers saying?

> I'm having a hard time reconciling the RFC recommendation here

RFC 5625 was concerned with proxies (and firewalls) that would drop DNS/UDP packets over 512 bytes while being completely oblivious to EDNS and larger DNS/UDP packets. If the Nest is not asking with EDNS, then a server needs to truncate at 512.

epilsits commented 4 years ago

Here's what I get locally. This is on an asus RT-AX88U running Merlin's asuswrt with DoT enabled (dnsmasq > stubby) using cloudflare's DoT servers.

dig frontdoor.nest.com.

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Raspbian <<>> frontdoor.nest.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2476
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;frontdoor.nest.com.            IN      A

;; ANSWER SECTION:
frontdoor.nest.com.     116     IN      CNAME   frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com.
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 54.209.71.245
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 3.94.154.216
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 3.225.66.208
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 54.208.76.50
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 54.80.161.160
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 54.80.15.227
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 3.94.235.134
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 56 IN A 3.224.164.129

;; Query time: 72 msec
;; SERVER: 192.xxx.xxx.1#53(192.xxx.xxx.1)
;; WHEN: Wed Jul 15 18:07:54 CDT 2020
;; MSG SIZE  rcvd: 792

And dnsmasq cached response.

dig frontdoor.nest.com.

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Raspbian <<>> frontdoor.nest.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36680
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;frontdoor.nest.com.            IN      A

;; ANSWER SECTION:
frontdoor.nest.com.     97      IN      CNAME   frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com.
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 34.206.126.32
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 3.94.235.134
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 3.225.66.208
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 34.205.12.224
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 3.92.110.168
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 54.208.76.50
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 54.80.15.227
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 37 IN A 54.80.161.160

;; Query time: 0 msec
;; SERVER: 192.xxx.xxx.1#53(192.xxx.xxx.1)
;; WHEN: Wed Jul 15 18:10:55 CDT 2020
;; MSG SIZE  rcvd: 254

And cloudflare direct.

dig frontdoor.nest.com. @1.1.1.1

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Raspbian <<>> frontdoor.nest.com. @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53627
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;frontdoor.nest.com.            IN      A

;; ANSWER SECTION:
frontdoor.nest.com.     78      IN      CNAME   frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com.
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 54.80.161.160
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 54.80.15.227
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 3.225.66.208
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 3.94.235.134
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 54.209.71.245
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 54.208.76.50
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 3.94.154.216
frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com. 18 IN A 3.224.164.129

;; Query time: 15 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Jul 15 18:08:42 CDT 2020
;; MSG SIZE  rcvd: 269

epilsits commented 4 years ago

Here's a capture I had while debugging. This is the response from stubby -> dnsmasq.

[image: capture of the response from stubby to dnsmasq]

vttale commented 4 years ago

Interesting, that is indeed the same kind of problem as mentioned at the opening of this issue: the lack of name compression causes the response to grow much larger.
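The effect can be checked with a little arithmetic on the frontdoor.nest.com answer shown in the dig output above (one CNAME, eight A records, an empty OPT record). The sketch below only counts bytes and does not build a packet. The "owners compressed" variant assumes each record's owner name is replaced by a 2-byte pointer while the CNAME target is written out in full, which is one possible compression choice among several.

```python
def wire_len(name: str) -> int:
    """Uncompressed wire length: one length byte per label plus the root byte."""
    return sum(1 + len(label) for label in name.rstrip(".").split(".")) + 1


HEADER = 12
RR_FIXED = 10      # TYPE + CLASS + TTL + RDLENGTH
POINTER = 2        # a compression pointer replaces a repeated name
OPT_RR = 11        # empty EDNS OPT record: root name + fixed fields

qname = "frontdoor.nest.com."
target = "frontdoor-srt01-production-1909587911.us-east-1.elb.amazonaws.com."
a_records = 8

question = wire_len(qname) + 4   # QTYPE + QCLASS

uncompressed = (
    HEADER + question
    + wire_len(qname) + RR_FIXED + wire_len(target)      # CNAME
    + a_records * (wire_len(target) + RR_FIXED + 4)      # A records
    + OPT_RR
)

owners_compressed = (
    HEADER + question
    + POINTER + RR_FIXED + wire_len(target)              # CNAME, owner is a pointer
    + a_records * (POINTER + RR_FIXED + 4)               # A records, owner is a pointer
    + OPT_RR
)

print(uncompressed, owners_compressed)  # 792 254
```

The first figure matches the 792-byte MSG SIZE reported through stubby, and the second matches the 254 bytes of the dnsmasq cached response, so compression alone brings this particular answer well under 512 bytes.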

vttale commented 4 years ago

Willem, what's the rationale for choosing to not use compression? I understand well that name compression would not address every response; of course some need tc. But why is it not being used in general?

epilsits commented 3 years ago

Any additional thoughts here? It leaves me and other users in a tough spot, where continuing to use DoT via stubby knocks a bunch of IoT devices offline. The original issue is also valid, but there is clearly a whole class of devices that don't handle the "correct" behavior either.

wtoorop commented 3 years ago

Sorry, I was on vacation. I'll have a look soonish.

wtoorop commented 3 years ago

Okay, reading everything above I think we should add:

epilsits commented 3 years ago

I think that would be very helpful.

mavack commented 3 years ago

I have tested this update on OpenWrt. I personally found that for many Office 365 domains with CNAME records, our F5 BIG-IP client did not understand the replies when they were larger than 512 bytes, even though they were perfectly valid. Running the current getdns-dev seems to have resolved that.