CZ-NIC / knot-resolver

Knot Resolver - resolve DNS names like it's 2024
https://www.knot-resolver.cz/

cache prefers parent-side TTL to authoritative #34

Closed vcunat closed 7 years ago

vcunat commented 8 years ago

Reported on ML.

Knot Resolver seems to cache the TTL of the parent zone (= delegation) instead of the TTL set in the zone itself.

dig +noall +auth @n.ns.at univie.ac.at ns 
univie.ac.at.       10800   IN  NS  ns3.univie.ac.at.
univie.ac.at.       10800   IN  NS  ns4.univie.ac.at.
univie.ac.at.       10800   IN  NS  ns5.univie.ac.at.
univie.ac.at.       10800   IN  NS  ns7.univie.ac.at.
univie.ac.at.       10800   IN  NS  ns8.univie.ac.at.
univie.ac.at.       10800   IN  NS  ns10.univie.ac.at.
dig +noall +ans @ns10.univie.ac.at univie.ac.at ns
univie.ac.at.       600 IN  NS  ns7.univie.ac.at.
univie.ac.at.       600 IN  NS  ns4.univie.ac.at.
univie.ac.at.       600 IN  NS  ns8.univie.ac.at.
univie.ac.at.       600 IN  NS  ns3.univie.ac.at.
univie.ac.at.       600 IN  NS  ns5.univie.ac.at.
univie.ac.at.       600 IN  NS  ns10.univie.ac.at.
dig +noall +ans @ns10.univie.ac.at ns10.univie.ac.at a
ns10.univie.ac.at.  600 IN  A   192.76.243.2

Knot Resolver is caching 10800 instead of 600:

dig +noall +answer @127.0.0.1 ns10.univie.ac.at a
ns10.univie.ac.at.  10634   IN  A   192.76.243.2

BIND, Unbound and pdns-recursor cache the authoritative TTL (600).
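The reasoning behind "caching 10800 instead of 600" is that a cached TTL only counts down, so the remaining TTL in an answer can never exceed the TTL that was originally stored. A minimal sketch of that check follows; the function name `plausible_sources` and the dictionary labels are hypothetical, and the TTL values are the ones from the dig output above.

```python
def plausible_sources(observed_ttl, source_ttls):
    """Return the sources whose original TTL could have produced observed_ttl.

    A cached TTL only decreases, so a source is plausible only if its
    original TTL is at least as large as the observed remaining TTL.
    """
    return [name for name, ttl in source_ttls.items() if ttl >= observed_ttl]


# TTLs from the dig output: parent-side delegation vs. child-side zone data.
sources = {"parent (at.)": 10800, "child (univie.ac.at.)": 600}

# 10634 is the remaining TTL Knot Resolver returned for ns10.univie.ac.at.
print(plausible_sources(10634, sources))  # ['parent (at.)']
```

A remaining TTL of 600 or less would be consistent with both sources, so this check only proves parent-side caching when the observed value is above the child-side TTL.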

oerdnj commented 8 years ago

There has been discussion at DNS-OARC brought by Fujiwara-san about parent-centric behaviour, and there's a draft now: https://tools.ietf.org/html/draft-fujiwara-dnsop-resolver-update-00

I was initially opposed to the idea, because I think it will just create more craziness in the DNS, but I am still investigating how much things will break.

Nominum has also been parent-centric for several years now without any significant breakage, and this would make the resolver's job much more deterministic.

vcunat commented 8 years ago

Is there some reference to use cases for the parent-side and child-side records being different? The child-side TTLs tend to be lower but what is it good for? Otherwise I see only https://tools.ietf.org/html/draft-fujiwara-dnsop-resolver-update-00#ref-DUAN2012GHOST

oerdnj commented 8 years ago

The usual (old) approach would be to use the parent NS records only for bootstrapping and replace those with child-side NS records.

That way you can quickly test newly deployed NS before pushing the change to the parent. Or modify the TTL in case you plan to move your domain between DNS providers.

Also see Stephane's comment here: https://mailarchive.ietf.org/arch/msg/dnsop/1ESN4Uo8BRGrkadS-YXFR3z_cx0 and mine here: https://mailarchive.ietf.org/arch/msg/dnsop/43WBzmkVoFWkWdSCCJ42nyQuY1k

oerdnj commented 8 years ago

This basically calls for having two separate caches: a delegation cache and a record cache. And we could have an option controlling whether records from the record cache overwrite matching records in the delegation cache?

Would it be feasible?
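To make the two-cache idea concrete, here is a toy sketch, not Knot Resolver's actual cache layout. The class `SplitCache`, its method names and the `child_overrides_delegation` option are all hypothetical; the injectable `clock` exists only so the behaviour can be checked deterministically.

```python
import time


class SplitCache:
    """Toy cache with separate delegation and record namespaces (hypothetical)."""

    def __init__(self, child_overrides_delegation=False, clock=time.monotonic):
        self.child_overrides_delegation = child_overrides_delegation
        self._clock = clock
        self._delegation = {}  # (name, rtype) -> (rdata, expires_at)
        self._records = {}     # (name, rtype) -> (rdata, expires_at)

    def put_delegation(self, name, rtype, rdata, ttl):
        """Store parent-side data (referral NS set or glue)."""
        self._delegation[(name, rtype)] = (rdata, self._clock() + ttl)

    def put_record(self, name, rtype, rdata, ttl):
        """Store child-side (authoritative) data."""
        entry = (rdata, self._clock() + ttl)
        self._records[(name, rtype)] = entry
        # Optionally let child-side data replace the matching delegation entry.
        if self.child_overrides_delegation and (name, rtype) in self._delegation:
            self._delegation[(name, rtype)] = entry

    def _remaining(self, table, name, rtype):
        """Return (rdata, remaining_ttl), or None if missing or expired."""
        entry = table.get((name, rtype))
        if entry is None:
            return None
        rdata, expires_at = entry
        ttl = int(expires_at - self._clock())
        return (rdata, ttl) if ttl > 0 else None

    def get_delegation(self, name, rtype):
        """Data to use when following the zone cut."""
        return self._remaining(self._delegation, name, rtype)

    def get_record(self, name, rtype):
        """Data to use when answering a client's direct query."""
        return self._remaining(self._records, name, rtype)
```

With the option off, the delegation entry keeps the parent-side TTL (10800 in the example from this issue) while direct queries are answered with the child-side TTL (600); with the option on, a child-side answer shortens the lifetime of the delegation entry as well.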

vcunat commented 8 years ago

Hmm, currently the resolver does prefer the child-side record if in strict mode:

mode('strict')

EDIT: I moved the SERVFAIL problems to a separate issue.

vcunat commented 8 years ago

Separate caches: we already use a few cache namespaces, so it's certainly feasible, and I believe the caching part is relatively cheap here (performance-wise). The client-subnet changes for cache were much more complicated than this.

Adding another round-trip for "verifying" every cut isn't very nice for latency on cold cache, especially as it's common to set the child-side TTL relatively low (a few minutes).

oerdnj commented 8 years ago

I don't think the extra round-trip is needed - you can just use the AUTHORITY section info when you receive it "by accident", i.e. make it an opportunistic update.

vavrusa commented 8 years ago

We already support cache entry ranking, so data from an authoritative answer will override data from a non-authoritative answer. It's just that in the default mode the resolver isn't trying hard enough to ask the child side about the NS set.
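As an illustration of rank-based overriding, here is a simplified sketch. It is not the resolver's real ranking code: the `Rank` levels and the `store` helper are hypothetical, and expiry handling is left out to keep the rule itself visible.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Hypothetical trust levels; higher values are more trustworthy."""
    NON_AUTH = 1  # e.g. NS or glue seen in a referral from the parent
    AUTH = 2      # answer with the AA flag set, from the zone itself


def store(cache, key, rdata, ttl, rank):
    """Insert into cache unless a higher-ranked entry already exists.

    Returns True if the entry was stored, False if it was rejected.
    """
    existing = cache.get(key)
    if existing is not None and existing["rank"] > rank:
        return False
    cache[key] = {"rdata": rdata, "ttl": ttl, "rank": rank}
    return True


cache = {}
key = ("ns10.univie.ac.at.", "A")
store(cache, key, "192.76.243.2", 10800, Rank.NON_AUTH)  # glue from the parent
store(cache, key, "192.76.243.2", 600, Rank.AUTH)        # child-side answer wins
store(cache, key, "192.76.243.2", 10800, Rank.NON_AUTH)  # later referral is ignored
print(cache[key]["ttl"])  # 600
```

The rule only helps once an authoritative answer has actually been seen; if the resolver never asks the child side, the non-authoritative entry with the parent-side TTL stays in place.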

vcunat commented 8 years ago

Yes, but Fujiwara proposes AFAIK to do lookups based on parent-side NS/glue records, i.e. "as if" with empty cache, and use the child-side only to answer direct NS queries. That is, we would want to cache both at once.

vavrusa commented 8 years ago

Yeah, sure it makes sense. I think it shouldn't use data with non-auth rank unless it at least tries once to fetch data with auth rank.

stasic commented 8 years ago

Just for the record: besides Fujiwara's draft, and besides BIND, Unbound and pdns-recursor, even closed-source resolver implementations use the authoritative (600) data: Google, OpenDNS (Cisco), Dyn, UltraDNS:

dig @google-public-dns-a.google.com +noall +answer ns10.univie.ac.at aaaa
ns10.univie.ac.at.  238 IN  AAAA    2001:67c:133c::2
dig @resolver1.opendns.com +noall +answer ns10.univie.ac.at aaaa
ns10.univie.ac.at.  371 IN  AAAA    2001:67c:133c::2
dig @rdns2.ultradns.net +noall +answer ns10.univie.ac.at aaaa
ns10.univie.ac.at.  417 IN  AAAA    2001:67c:133c::2
dig @resolver1.dyndnsinternetguide.com +noall +answer ns10.univie.ac.at aaaa
ns10.univie.ac.at.  285 IN  AAAA    2001:67c:133c::2

And even the newcomer systemd-resolved caches the 600 TTL:

dig @127.0.0.53 +noall +answer ns10.univie.ac.at aaaa
ns10.univie.ac.at.  524 IN  AAAA    2001:67c:133c::2

vcunat commented 7 years ago

This should be OK now after https://gitlab.labs.nic.cz/knot/resolver/merge_requests/269 (expected in future 1.3.0)