AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0
24.77k stars 1.79k forks source link

More fine-grained cache controls #4373

Open pedrolamas opened 2 years ago

pedrolamas commented 2 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue. YOU MAY DELETE THE PREREQUISITES SECTION.

Issue Details

`./AdGuardHome -v --version` output: ```txt AdGuard Home Version: v0.107.5 Channel: release Go version: go1.16.15 Build time: 2022-03-04T13:00:44Z+0000 GOOS: linux GOARCH: arm64 Race: false Dependencies: github.com/AdguardTeam/dnsproxy@v0.40.7-0.20220207171519-b3947de6a902 (sum: h1:6pxvSWL9tVelFo0R3t6Pn8u6YU5dCqTVehvNnP6lOqI=) github.com/AdguardTeam/golibs@v0.10.4 (sum: h1:TMBkablZC0IZOpRgg9fzAKlxxNhSN2YJq7qbgtuZ7PQ=) github.com/AdguardTeam/urlfilter@v0.15.2 (sum: h1:LZGgrm4l4Ys9eAqB+UUmZfiC6vHlDlYFhx0WXqo6LtQ=) github.com/NYTimes/gziphandler@v1.1.1 (sum: h1:ZUDjpQae29j0ryrS0u/B8HZfJBtBQHjqw2rQ2cqUQ3I=) github.com/aead/chacha20@v0.0.0-20180709150244-8b13a72661da (sum: h1:KjTM2ks9d14ZYCvmHS9iAKVt9AyzRSqNU1qabPih5BY=) github.com/aead/poly1305@v0.0.0-20180717145839-3fee0db0b635 (sum: h1:52m0LGchQBBVqJRyYYufQuIbVqRawmubW3OFGqK1ekw=) github.com/ameshkov/dnscrypt/v2@v2.2.3 (sum: h1:X9UP5AHtwp46Ji+sGFfF/1Is6OPI/SjxLqhKpx0P5UI=) github.com/ameshkov/dnsstamps@v1.0.3 (sum: h1:Srzik+J9mivH1alRACTbys2xOxs0lRH9qnTA7Y1OYVo=) github.com/beefsack/go-rate@v0.0.0-20200827232406-6cde80facd47 (sum: h1:M57m0xQqZIhx7CEJgeLSvRFKEK1RjzRuIXiA3HfYU7g=) github.com/cheekybits/genny@v1.0.0 (sum: h1:uGGa4nei+j20rOSeDeP5Of12XVm7TGUd4dJA9RDitfE=) github.com/digineo/go-ipset/v2@v2.2.1 (sum: h1:k6skY+0fMqeUjjeWO/m5OuWPSZUAn7AucHMnQ1MX77g=) github.com/fsnotify/fsnotify@v1.5.1 (sum: h1:mZcQUHVQUQWoPXXtuf9yuEXKudkV2sx1E06UadKWpgI=) github.com/go-ping/ping@v0.0.0-20210506233800-ff8be3320020 (sum: h1:mdi6AbCEoKCA1xKCmp7UtRB5fvGFlP92PvlhxgdvXEw=) github.com/google/go-cmp@v0.5.5 (sum: h1:Khx7svrCpmxxtHBq5j2mp/xVjsi8hQMfNLvJFAlrGgU=) github.com/google/gopacket@v1.1.19 (sum: h1:ves8RnFZPGiFnTS0uPQStjwru6uO6h+nlr9j6fL7kF8=) github.com/google/renameio@v1.0.1 (sum: h1:Lh/jXZmvZxb0BBeSY5VKEfidcbcbenKjZFzM/q0fSeU=) github.com/insomniacslk/dhcp@v0.0.0-20210310193751-cfd4d47082c2 (sum: h1:NpTIlXznCStsY88jU+Gh1Dy5dt/jYV4z4uU8h2TUOt4=) github.com/josharian/native@v0.0.0-20200817173448-b6b71def0850 (sum: h1:uhL5Gw7BINiiPAo24A2sxkcDI0Jt/sqp1v5xQCniEFA=) github.com/kardianos/service@v1.2.0 (sum: h1:bGuZ/epo3vrt8IPC7mnKQolqFeYJb7Cs8Rk4PSOBB/g=) github.com/lucas-clemente/quic-go@v0.24.0 (sum: h1:ToR7SIIEdrgOhgVTHvPgdVRJfgVy+N0wQAagH7L4d5g=) github.com/marten-seemann/qtls-go1-16@v0.1.4 (sum: h1:xbHbOGGhrenVtII6Co8akhLEdrawwB2iHl5yhJRpnco=) github.com/mdlayher/ethernet@v0.0.0-20190606142754-0394541c37b7 (sum: h1:lez6TS6aAau+8wXUP3G9I3TGlmPFEq2CTxBaRqY6AGE=) github.com/mdlayher/netlink@v1.4.0 (sum: h1:n3ARR+Fm0dDv37dj5wSWZXDKcy+U0zwcXS3zKMnSiT0=) github.com/mdlayher/raw@v0.0.0-20210412142147-51b895745faf (sum: h1:InctQoB89TIkmgIFQeIL4KXNvWc1iebQXdZggqPSwL8=) github.com/miekg/dns@v1.1.45 (sum: h1:g5fRIhm9nx7g8osrAvgb16QJfmyMsyOCb+J7LSv+Qzk=) github.com/patrickmn/go-cache@v2.1.0+incompatible (sum: h1:HRMgzkcYKYpi3C8ajMPV8OFXaaRUnok+kx1WdO15EQc=) github.com/pkg/errors@v0.9.1 (sum: h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=) github.com/satori/go.uuid@v1.2.0 (sum: h1:0uYX9dsZ2yD7q2RtLRtPSdGDWzjeM3TbMJP9utgA0ww=) github.com/ti-mo/netfilter@v0.4.0 (sum: h1:rTN1nBYULDmMfDeBHZpKuNKX/bWEXQUhe02a/10orzg=) github.com/u-root/u-root@v7.0.0+incompatible (sum: h1:u+KSS04pSxJGI5E7WE4Bs9+Zd75QjFv+REkjy/aoAc8=) go.etcd.io/bbolt@v1.3.6 (sum: h1:/ecaJf0sk1l4l6V4awd65v2C3ILy7MSj+s/x1ADCIMU=) golang.org/x/crypto@v0.0.0-20211215153901-e495a2d5b3d3 (sum: h1:0es+/5331RGQPcXlMfP+WrnIIS6dNnNRe0WB02W0F4M=) golang.org/x/net@v0.0.0-20211216030914-fe4d6282115f (sum: h1:hEYJvxw1lSnWIl8X9ofsYMklzaDs90JI2az5YMd4fPM=) golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c (sum: h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ=) golang.org/x/sys@v0.0.0-20211216021012-1d35b9e2eb4e (sum: h1:fLOSk5Q00efkSvAm+4xcoXD+RRmLmmulPn5I3Y9F2EM=) golang.org/x/text@v0.3.7 (sum: h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=) gopkg.in/natefinch/lumberjack.v2@v2.0.0 (sum: h1:1Lc07Kr7qY4U2YPouBjpCLxpiyxIVoxqXgkXLknAOE8=) gopkg.in/yaml.v2@v2.4.0 (sum: h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=) howett.net/plist@v0.0.0-20201203080718-1454fab16a06 (sum: h1:QDxUo/w2COstK1wIBYpzQlHX/NqaQTcf9jyz347nI58=) ```

Expected Behavior

When I ask AdGuard Home (ADH) for a local host, it will correctly forward that to the Upstream DNS Servers (my UDM Pro in this case)

If the host does not exist, it returns that information and caches that the host does not exist with a low TTL.

Next request should use the cached value, until entry expires.

Actual Behavior

TTL for a failed resolution seems to be 3600s!

Screenshots

Screenshot: ![image](https://user-images.githubusercontent.com/85504/157087689-91222d82-e0c8-41e5-9c96-51a01f5b2a5f.png) - on 1 you see the initial request for the non-existing local host - on 2 and above, all logged entries show "served from cache" - on 3 (more than 1h later!) I see a new entry with no indication of cache - on 4 and after, all logged entries show "served from cache"

Discussed in https://github.com/AdguardTeam/AdGuardHome/discussions/4369

fernvenue commented 2 years ago

Agreed, NXDOMAIN needs a low TTL. And I just tested and found that even when optimistic caching enabled, AdGuardHome will not query NXDOMAIN anymore, I think it would be great to apply optimistic caching for NXDOMAIN, return from the cache, send a new request to the upstream at the same time, and then update the result in the cache.

fernvenue commented 2 years ago

Unfortunately, it took longer than we thought. Perhaps it follows the TTL value returned by the upstream? Google returned "TTL":86398 for c.lan and now AdGuardHome has been returning the cached value for over an hour :(

Screenshot 2022-03-08 at 18-05-52 AdGuard Home

pedrolamas commented 2 years ago

Ah, that probably explains why I've experienced over 24h cache for non-existing domains... I thought I was dreaming but now I see I was not!

agneevX commented 2 years ago

@pedrolamas are you receiving NXDOMAIN responses the second time around?

NXDOMAIN needs a low TTL. And I just tested and found that even when optimistic caching enabled, AdGuardHome will not query NXDOMAIN anymore, I think it would be great to apply optimistic caching for NXDOMAIN

NXDOMAIN responses do not have an explicit TTL but it does usually specify a value of 86400 in the data field.

NXDOMAIN is not SERVFAIL, because the server indicates that there's nothing below here, hence it's intentional.

Negative answers should neither be cached for too long or come under optimistic caching IMO.

fernvenue commented 2 years ago

@pedrolamas are you receiving NXDOMAIN responses the second time around?

NXDOMAIN needs a low TTL. And I just tested and found that even when optimistic caching enabled, AdGuardHome will not query NXDOMAIN anymore, I think it would be great to apply optimistic caching for NXDOMAIN

NXDOMAIN responses do not have an explicit TTL but it does usually specify a value of 86400 in the data field.

NXDOMAIN is not SERVFAIL, because the server indicates that there's nothing below here, hence it's intentional.

Negative answers should neither be cached for too long or come under optimistic caching IMO.

So maybe AdGuardHome should set a low TTL for NXDOMAIN?

agneevX commented 2 years ago

Ideally, not caching negative answers is the best way forward. Negative being NXDOMAIN only, not 0.0.0.0.

EDIT: SERVFAIL responses are negative too, but AdGuard does not cache them, so we're good there.

pedrolamas commented 2 years ago

This is what my UDM Pro returns for a non-existing host:

image

This is what I get it I use the FQDN of a non-existing host:

image

For the record, I bought a domain a couple of years ago (let's assume my-domain.com) and the local domain set in DHCP is local.my-domain.com, hence why it's going to Cloudflare (where my-domain.com is hosted)

If I am reading this correctly, both cases seem to return NOERROR instead of NXDOMAIN!

Now let's do 2 queries for different non-existing hosts to ADH:

image

From the ADH logs I can see that those were NOT cached, so that is good!

Now let's do the same with an FQDN of a different non-existing host:

image

ADH logs here show that 2nd request WAS served from cache, and that is what I believe to be incorrect!!

image

agneevX commented 2 years ago

Your UDMP should be returning NXDOMAIN for those responses.

Screenshot (198)

pedrolamas commented 2 years ago

Your UDMP should be returning NXDOMAIN for those responses.

Sure, to be honest, that is what I would expect! Not sure how one can configure it to do that though...

agneevX commented 2 years ago

What upstreams are you using there? Your AGH instance also uses the UDMP as the upstream, per the screenshots?

pedrolamas commented 2 years ago

What upstreams are you using there? Your AGH instance also uses the UDMP as the upstream, per the screenshots?

I do use the UDM Pro for upstream of the local domain, as the UDM Pro is the DHCP and has the knowledge of the hostnames.

Having said that and after thinking this over, NXDOMAIN does feel wrong as the domain DOES EXIST, it's the host that does not exist!

If I ask Cloudflare for non-existing-host.my-domain.com and my-domain.com does exist, it will not return NXDOMAIN which I think it is correct (and thus the same behavior I am seeing here)

agneevX commented 2 years ago

Your example was notrealhost8. That obviously does not exist.

pedrolamas commented 2 years ago

I gave both examples on the screenshots, for a naked host and one for a FQDN.

Naked hosts like notrealhost8 will work correctly (no cache), but a host in a FQDN format seems to be cached even when it does not exist.

EugeneOne1 commented 2 years ago

@pedrolamas, hello and sorry for a late response. As per RFC 2308 Section 5:

As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.

A negative answer that resulted from a name error (NXDOMAIN) should be cached \<...>

A negative answer that resulted from a no data error (NODATA) should be cached \<...>

It seems the responses for FQDN requests are containing SOA records making those valid cacheable negative answers. As the same RFC's section 8 says:

Negative caching in resolvers is no-longer optional, if a resolver caches anything it must also cache negative answers.

pedrolamas commented 2 years ago

Thank you for following up on this, @EugeneOne1.

Indeed, this behavior makes complete sense in terms of the RFC specification.

And yes, as I am using a subdomain of a real domain (registered and hosted in Cloudflare), the upstream SOA is being returned when the UDM Pro doesn't have a local host for the query.

I guess not using the FQDN for local hosts mitigates the issue for me, however I don't think that is ideal.

I'm reading the RFC and thinking this through, but not sure if there is any viable alternative here...

pedrolamas commented 2 years ago

Looking back at this again, I had some ideas of how one could fix this:

IMHO, any of these solution should in theory mitigate this problem... would this be something acceptable for AdGuard Home?

gshpychka commented 1 year ago

@pedrolamas, hello and sorry for a late response. As per RFC 2308 Section 5:

As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.

A negative answer that resulted from a name error (NXDOMAIN) should be cached <...>

A negative answer that resulted from a no data error (NODATA) should be cached <...>

It seems the responses for FQDN requests are containing SOA records making those valid cacheable negative answers. As the same RFC's section 8 says:

Negative caching in resolvers is no-longer optional, if a resolver caches anything it must also cache negative answers.

SERVFAIL is cached for me:

image