NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

edns-buffer-size problem prevents unbound-host from getting response #106

Open FGasper opened 4 years ago

FGasper commented 4 years ago

On a server I have access to, I’ve noticed that unbound-host -t NS . fails. When I strace the command, I see that it sends a number of UDP queries to the root nameservers but receives no replies.

When I create myunbound.conf:

server:
    edns-buffer-size: 512

… and run unbound-host -d -C myunbound.conf -t NS ., then I get the expected results.

While it’s reasonable that the EDNS buffer size would need to be adjusted for a UDP response, it seems like I shouldn’t have to do that in order to get any response, should I? If a certain time passes and no UDP queries have succeeded, shouldn’t libunbound fall back to TCP?

And, moreover, would it also make sense to send multiple UDP queries concurrently: one with EDNS=512, another with EDNS=1472, etc.? That way if, e.g., 4096 gets no response but 1472 and 512 do, and 512 is truncated but 1472 isn’t, we can still have the UDP response without having had to do TCP?
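For readers unfamiliar with what "EDNS=512" means on the wire: the advertised UDP payload size travels in the CLASS field of the OPT pseudo-record appended to the query. Below is a minimal Python sketch of such a query, built by hand with the standard library only. The function name `build_query` and its parameters are illustrative assumptions, not part of Unbound or libunbound.

```python
import struct

OPT_TYPE = 41   # EDNS0 OPT pseudo-record type
CLASS_IN = 1
TYPE_NS = 2


def build_query(name, qtype, udp_payload_size, dnssec_ok=True, query_id=0x1234):
    """Build a DNS query carrying an EDNS0 OPT record (hypothetical helper)."""
    # Header: ID, flags (all zero: plain query, RD=0), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)

    # QNAME: length-prefixed labels ending with the zero-length root label.
    labels = [label for label in name.split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, CLASS_IN)

    # OPT record: root owner name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL=extended RCODE/version/flags (DO bit is 0x8000), RDLENGTH=0.
    ttl_flags = 0x8000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, ttl_flags, 0)

    return header + question + opt


if __name__ == "__main__":
    for size in (512, 1472, 4096):
        packet = build_query(".", TYPE_NS, size)
        print(size, len(packet), packet.hex())
```

The three packets differ only in the two bytes holding the advertised size, so probing several sizes mostly costs extra round trips rather than extra complexity in the message itself.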

ralphdolmans commented 4 years ago

> If a certain time passes and no UDP queries have succeeded, shouldn’t libunbound fall back to TCP?

Unbound only queries over TCP when instructed to do so, i.e. when it receives a response with the TC bit set, unless you configure Unbound to always use TCP or TLS.
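To illustrate the trigger described here: the TC (truncated) bit is a single flag in the second 16-bit word of the DNS header. The following is a small Python sketch of that check, not Unbound's actual code; `is_truncated` is a hypothetical helper name.

```python
import struct

TC_MASK = 0x0200  # TC flag within the 16-bit header flags word


def is_truncated(message):
    """Return True if the DNS message header has the TC bit set."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _query_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_MASK)


if __name__ == "__main__":
    # Hand-made 12-byte headers: a response (QR=1) with and without TC.
    truncated = struct.pack("!HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)
    complete = struct.pack("!HHHHHH", 0x1234, 0x8000, 1, 0, 0, 0)
    print(is_truncated(truncated), is_truncated(complete))
```

Note that this check can only fire if a reply arrives at all. When a middlebox silently drops the larger UDP responses, there is no TC bit to react to, which is the situation described in the original report.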

> And, moreover, would it also make sense to send multiple UDP queries concurrently: one with EDNS=512, another with EDNS=1472, etc.? That way if, e.g., 4096 gets no response but 1472 and 512 do, and 512 is truncated but 1472 isn’t, we can still have the UDP response without having had to do TCP?

Unbound already falls back to 1472 for IPv4 and 1232 for IPv6 when resolving using a bigger buffer size fails. These fall-back values are chosen to prevent fragmentation, which in some cases can lead to resolution failures.
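The two fall-back values follow from simple header arithmetic: a 1500-byte Ethernet MTU minus the IPv4 and UDP headers gives 1472, and the 1280-byte IPv6 minimum MTU minus the IPv6 and UDP headers gives 1232. The Python sketch below works through that arithmetic. The `sizes_to_try` function is a simplified, assumed model of the behaviour described above, not a copy of Unbound's logic.

```python
IPV4_HEADER = 20     # bytes, without options
IPV6_HEADER = 40     # bytes, fixed header
UDP_HEADER = 8       # bytes
ETHERNET_MTU = 1500  # common link MTU
IPV6_MIN_MTU = 1280  # minimum MTU every IPv6 link must support


def largest_unfragmented_payload(mtu, ip_header):
    """Largest UDP payload that fits in one packet of the given MTU."""
    return mtu - ip_header - UDP_HEADER


def sizes_to_try(configured, ipv6):
    """Simplified model: the configured size, then the fragmentation-safe one."""
    if ipv6:
        fallback = largest_unfragmented_payload(IPV6_MIN_MTU, IPV6_HEADER)
    else:
        fallback = largest_unfragmented_payload(ETHERNET_MTU, IPV4_HEADER)
    if configured > fallback:
        return [configured, fallback]
    return [configured]


if __name__ == "__main__":
    print(sizes_to_try(4096, ipv6=False))
    print(sizes_to_try(4096, ipv6=True))
    print(sizes_to_try(512, ipv6=False))
```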

FGasper commented 4 years ago

@ralphdolmans Thank you for your response.

Would it be reasonable, then, to implement a fallback to 512 if 1472/1232 fails? We actually are seeing cases where 1472 is too big. Unbound will fall back to TCP if the UDP/512 query is fragmented, right?

ralphdolmans commented 4 years ago

> Would it be reasonable, then, to implement a fallback to 512 if 1472/1232 fails?

I would prefer not to add an extra fallback for this. All these fallbacks make the code more complicated and generate more traffic towards unresponsive servers. The existing fallback is in place to work around fragmentation issues, which usually do not occur at 1472/1232. I recommend debugging and fixing this network issue instead of working around it in Unbound. Maybe you have an overly aggressive firewall on your path?

> Unbound will fall back to TCP if the UDP/512 query is fragmented, right?

The root NS set + signature won't fit in a 512 answer, so the server will reply with the truncation bit set. This will indeed trigger Unbound to retry over TCP.
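For context on what the TCP retry looks like: over TCP the same DNS message is sent with a two-byte, big-endian length prefix (RFC 1035, section 4.2.2), so the 512-byte UDP limit no longer applies. Here is a minimal Python sketch of that framing. The helper names `frame_for_tcp` and `unframe_from_tcp` are illustrative assumptions.

```python
import struct


def frame_for_tcp(message):
    """Prefix a DNS message with its length for sending over TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for a 16-bit length prefix")
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(data):
    """Extract one length-prefixed DNS message from received TCP bytes."""
    if len(data) < 2:
        raise ValueError("missing length prefix")
    (length,) = struct.unpack("!H", data[:2])
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("incomplete message: keep reading from the socket")
    return body


if __name__ == "__main__":
    # A bare 12-byte header stands in for a real query here.
    message = struct.pack("!HHHHHH", 0x1234, 0x0000, 1, 0, 0, 0)
    framed = frame_for_tcp(message)
    print(len(message), len(framed), unframe_from_tcp(framed) == message)
```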

FGasper commented 4 years ago

Alas, we don’t always control firewall settings, and a fair number of servers where we run libunbound apparently drop UDP responses whose sizes exceed 512 bytes.

It would be great to have an option to fall back to 512 and/or TCP. Failing that, we’ll see about some sort of application-level detection of this issue and force the EDNS max to 512 accordingly.
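One possible shape for that application-level detection, as a rough Python sketch: try candidate sizes from largest to smallest and keep the first one for which any reply arrives. Everything here is an assumption for illustration. The `probe` callable is hypothetical and would need to be implemented by the application (for example, by sending a test query that advertises the given size and waiting for a reply with a timeout). The candidate list is likewise just an example.

```python
from typing import Callable, Optional, Sequence


def pick_edns_size(
    probe: Callable[[int], bool],
    candidates: Sequence[int] = (4096, 1472, 512),
) -> Optional[int]:
    """Return the first candidate size for which probe() reports a reply.

    probe(size) is a hypothetical, application-supplied function that returns
    True if a query advertising `size` received any response in time.
    Returns None when no candidate works, which suggests a problem other
    than response size.
    """
    for size in candidates:
        if probe(size):
            return size
    return None


if __name__ == "__main__":
    # Simulate a path that drops UDP responses larger than 512 bytes.
    def fake_probe(size):
        return size <= 512

    print(pick_edns_size(fake_probe))
```

The chosen value could then be written to a configuration file such as the myunbound.conf shown at the top of this thread. The trade-off is the one raised above: each failed probe adds a timeout and more traffic towards servers that may simply be unresponsive.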