Open pedrolamas opened 2 years ago
Agreed, NXDOMAIN needs a low TTL. And I just tested and found that even when optimistic caching enabled, AdGuardHome will not query NXDOMAIN anymore, I think it would be great to apply optimistic caching for NXDOMAIN, return from the cache, send a new request to the upstream at the same time, and then update the result in the cache.
Unfortunately, it took longer than we thought. Perhaps it follows the TTL value returned by the upstream? Google returned "TTL":86398
for c.lan
and now AdGuardHome has been returning the cached value for over an hour :(
Ah, that probably explains why I've experienced over 24h cache for non-existing domains... I thought I was dreaming but now I see I was not!
@pedrolamas are you receiving NXDOMAIN responses the second time around?
NXDOMAIN needs a low TTL. And I just tested and found that even when optimistic caching enabled, AdGuardHome will not query NXDOMAIN anymore, I think it would be great to apply optimistic caching for NXDOMAIN
NXDOMAIN responses do not have an explicit TTL but it does usually specify a value of 86400 in the data field.
NXDOMAIN is not SERVFAIL, because the server indicates that there's nothing below here, hence it's intentional.
Negative answers should neither be cached for too long or come under optimistic caching IMO.
@pedrolamas are you receiving NXDOMAIN responses the second time around?
NXDOMAIN needs a low TTL. And I just tested and found that even when optimistic caching enabled, AdGuardHome will not query NXDOMAIN anymore, I think it would be great to apply optimistic caching for NXDOMAIN
NXDOMAIN responses do not have an explicit TTL but it does usually specify a value of 86400 in the data field.
NXDOMAIN is not SERVFAIL, because the server indicates that there's nothing below here, hence it's intentional.
Negative answers should neither be cached for too long or come under optimistic caching IMO.
So maybe AdGuardHome should set a low TTL for NXDOMAIN?
Ideally, not caching negative answers is the best way forward. Negative being NXDOMAIN only, not 0.0.0.0.
EDIT: SERVFAIL responses are negative too, but AdGuard does not cache them, so we're good there.
This is what my UDM Pro returns for a non-existing host:
This is what I get it I use the FQDN of a non-existing host:
For the record, I bought a domain a couple of years ago (let's assume my-domain.com
) and the local domain set in DHCP is local.my-domain.com
, hence why it's going to Cloudflare (where my-domain.com
is hosted)
If I am reading this correctly, both cases seem to return NOERROR instead of NXDOMAIN!
Now let's do 2 queries for different non-existing hosts to ADH:
From the ADH logs I can see that those were NOT cached, so that is good!
Now let's do the same with an FQDN of a different non-existing host:
ADH logs here show that 2nd request WAS served from cache, and that is what I believe to be incorrect!!
Your UDMP should be returning NXDOMAIN for those responses.
Your UDMP should be returning NXDOMAIN for those responses.
Sure, to be honest, that is what I would expect! Not sure how one can configure it to do that though...
What upstreams are you using there? Your AGH instance also uses the UDMP as the upstream, per the screenshots?
What upstreams are you using there? Your AGH instance also uses the UDMP as the upstream, per the screenshots?
I do use the UDM Pro for upstream of the local domain, as the UDM Pro is the DHCP and has the knowledge of the hostnames.
Having said that and after thinking this over, NXDOMAIN does feel wrong as the domain DOES EXIST, it's the host that does not exist!
If I ask Cloudflare for non-existing-host.my-domain.com
and my-domain.com
does exist, it will not return NXDOMAIN which I think it is correct (and thus the same behavior I am seeing here)
Your example was notrealhost8
. That obviously does not exist.
I gave both examples on the screenshots, for a naked host and one for a FQDN.
Naked hosts like notrealhost8
will work correctly (no cache), but a host in a FQDN format seems to be cached even when it does not exist.
@pedrolamas, hello and sorry for a late response. As per RFC 2308 Section 5:
As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.
A negative answer that resulted from a name error (NXDOMAIN) should be cached \<...>
A negative answer that resulted from a no data error (NODATA) should be cached \<...>
It seems the responses for FQDN requests are containing SOA records making those valid cacheable negative answers. As the same RFC's section 8 says:
Negative caching in resolvers is no-longer optional, if a resolver caches anything it must also cache negative answers.
Thank you for following up on this, @EugeneOne1.
Indeed, this behavior makes complete sense in terms of the RFC specification.
And yes, as I am using a subdomain of a real domain (registered and hosted in Cloudflare), the upstream SOA is being returned when the UDM Pro doesn't have a local host for the query.
I guess not using the FQDN for local hosts mitigates the issue for me, however I don't think that is ideal.
I'm reading the RFC and thinking this through, but not sure if there is any viable alternative here...
Looking back at this again, I had some ideas of how one could fix this:
IMHO, any of these solution should in theory mitigate this problem... would this be something acceptable for AdGuard Home?
@pedrolamas, hello and sorry for a late response. As per RFC 2308 Section 5:
As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.
A negative answer that resulted from a name error (NXDOMAIN) should be cached <...>
A negative answer that resulted from a no data error (NODATA) should be cached <...>
It seems the responses for FQDN requests are containing SOA records making those valid cacheable negative answers. As the same RFC's section 8 says:
Negative caching in resolvers is no-longer optional, if a resolver caches anything it must also cache negative answers.
SERVFAIL is cached for me:
Prerequisites
Please answer the following questions for yourself before submitting an issue. YOU MAY DELETE THE PREREQUISITES SECTION.
Issue Details
`./AdGuardHome -v --version` output:
```txt AdGuard Home Version: v0.107.5 Channel: release Go version: go1.16.15 Build time: 2022-03-04T13:00:44Z+0000 GOOS: linux GOARCH: arm64 Race: false Dependencies: github.com/AdguardTeam/dnsproxy@v0.40.7-0.20220207171519-b3947de6a902 (sum: h1:6pxvSWL9tVelFo0R3t6Pn8u6YU5dCqTVehvNnP6lOqI=) github.com/AdguardTeam/golibs@v0.10.4 (sum: h1:TMBkablZC0IZOpRgg9fzAKlxxNhSN2YJq7qbgtuZ7PQ=) github.com/AdguardTeam/urlfilter@v0.15.2 (sum: h1:LZGgrm4l4Ys9eAqB+UUmZfiC6vHlDlYFhx0WXqo6LtQ=) github.com/NYTimes/gziphandler@v1.1.1 (sum: h1:ZUDjpQae29j0ryrS0u/B8HZfJBtBQHjqw2rQ2cqUQ3I=) github.com/aead/chacha20@v0.0.0-20180709150244-8b13a72661da (sum: h1:KjTM2ks9d14ZYCvmHS9iAKVt9AyzRSqNU1qabPih5BY=) github.com/aead/poly1305@v0.0.0-20180717145839-3fee0db0b635 (sum: h1:52m0LGchQBBVqJRyYYufQuIbVqRawmubW3OFGqK1ekw=) github.com/ameshkov/dnscrypt/v2@v2.2.3 (sum: h1:X9UP5AHtwp46Ji+sGFfF/1Is6OPI/SjxLqhKpx0P5UI=) github.com/ameshkov/dnsstamps@v1.0.3 (sum: h1:Srzik+J9mivH1alRACTbys2xOxs0lRH9qnTA7Y1OYVo=) github.com/beefsack/go-rate@v0.0.0-20200827232406-6cde80facd47 (sum: h1:M57m0xQqZIhx7CEJgeLSvRFKEK1RjzRuIXiA3HfYU7g=) github.com/cheekybits/genny@v1.0.0 (sum: h1:uGGa4nei+j20rOSeDeP5Of12XVm7TGUd4dJA9RDitfE=) github.com/digineo/go-ipset/v2@v2.2.1 (sum: h1:k6skY+0fMqeUjjeWO/m5OuWPSZUAn7AucHMnQ1MX77g=) github.com/fsnotify/fsnotify@v1.5.1 (sum: h1:mZcQUHVQUQWoPXXtuf9yuEXKudkV2sx1E06UadKWpgI=) github.com/go-ping/ping@v0.0.0-20210506233800-ff8be3320020 (sum: h1:mdi6AbCEoKCA1xKCmp7UtRB5fvGFlP92PvlhxgdvXEw=) github.com/google/go-cmp@v0.5.5 (sum: h1:Khx7svrCpmxxtHBq5j2mp/xVjsi8hQMfNLvJFAlrGgU=) github.com/google/gopacket@v1.1.19 (sum: h1:ves8RnFZPGiFnTS0uPQStjwru6uO6h+nlr9j6fL7kF8=) github.com/google/renameio@v1.0.1 (sum: h1:Lh/jXZmvZxb0BBeSY5VKEfidcbcbenKjZFzM/q0fSeU=) github.com/insomniacslk/dhcp@v0.0.0-20210310193751-cfd4d47082c2 (sum: h1:NpTIlXznCStsY88jU+Gh1Dy5dt/jYV4z4uU8h2TUOt4=) github.com/josharian/native@v0.0.0-20200817173448-b6b71def0850 (sum: h1:uhL5Gw7BINiiPAo24A2sxkcDI0Jt/sqp1v5xQCniEFA=) github.com/kardianos/service@v1.2.0 (sum: h1:bGuZ/epo3vrt8IPC7mnKQolqFeYJb7Cs8Rk4PSOBB/g=) github.com/lucas-clemente/quic-go@v0.24.0 (sum: h1:ToR7SIIEdrgOhgVTHvPgdVRJfgVy+N0wQAagH7L4d5g=) github.com/marten-seemann/qtls-go1-16@v0.1.4 (sum: h1:xbHbOGGhrenVtII6Co8akhLEdrawwB2iHl5yhJRpnco=) github.com/mdlayher/ethernet@v0.0.0-20190606142754-0394541c37b7 (sum: h1:lez6TS6aAau+8wXUP3G9I3TGlmPFEq2CTxBaRqY6AGE=) github.com/mdlayher/netlink@v1.4.0 (sum: h1:n3ARR+Fm0dDv37dj5wSWZXDKcy+U0zwcXS3zKMnSiT0=) github.com/mdlayher/raw@v0.0.0-20210412142147-51b895745faf (sum: h1:InctQoB89TIkmgIFQeIL4KXNvWc1iebQXdZggqPSwL8=) github.com/miekg/dns@v1.1.45 (sum: h1:g5fRIhm9nx7g8osrAvgb16QJfmyMsyOCb+J7LSv+Qzk=) github.com/patrickmn/go-cache@v2.1.0+incompatible (sum: h1:HRMgzkcYKYpi3C8ajMPV8OFXaaRUnok+kx1WdO15EQc=) github.com/pkg/errors@v0.9.1 (sum: h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=) github.com/satori/go.uuid@v1.2.0 (sum: h1:0uYX9dsZ2yD7q2RtLRtPSdGDWzjeM3TbMJP9utgA0ww=) github.com/ti-mo/netfilter@v0.4.0 (sum: h1:rTN1nBYULDmMfDeBHZpKuNKX/bWEXQUhe02a/10orzg=) github.com/u-root/u-root@v7.0.0+incompatible (sum: h1:u+KSS04pSxJGI5E7WE4Bs9+Zd75QjFv+REkjy/aoAc8=) go.etcd.io/bbolt@v1.3.6 (sum: h1:/ecaJf0sk1l4l6V4awd65v2C3ILy7MSj+s/x1ADCIMU=) golang.org/x/crypto@v0.0.0-20211215153901-e495a2d5b3d3 (sum: h1:0es+/5331RGQPcXlMfP+WrnIIS6dNnNRe0WB02W0F4M=) golang.org/x/net@v0.0.0-20211216030914-fe4d6282115f (sum: h1:hEYJvxw1lSnWIl8X9ofsYMklzaDs90JI2az5YMd4fPM=) golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c (sum: h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ=) golang.org/x/sys@v0.0.0-20211216021012-1d35b9e2eb4e (sum: h1:fLOSk5Q00efkSvAm+4xcoXD+RRmLmmulPn5I3Y9F2EM=) golang.org/x/text@v0.3.7 (sum: h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=) gopkg.in/natefinch/lumberjack.v2@v2.0.0 (sum: h1:1Lc07Kr7qY4U2YPouBjpCLxpiyxIVoxqXgkXLknAOE8=) gopkg.in/yaml.v2@v2.4.0 (sum: h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=) howett.net/plist@v0.0.0-20201203080718-1454fab16a06 (sum: h1:QDxUo/w2COstK1wIBYpzQlHX/NqaQTcf9jyz347nI58=) ```Expected Behavior
When I ask AdGuard Home (ADH) for a local host, it will correctly forward that to the Upstream DNS Servers (my UDM Pro in this case)
If the host does not exist, it returns that information and caches that the host does not exist with a low TTL.
Next request should use the cached value, until entry expires.
Actual Behavior
TTL for a failed resolution seems to be 3600s!
Screenshots
Screenshot:
![image](https://user-images.githubusercontent.com/85504/157087689-91222d82-e0c8-41e5-9c96-51a01f5b2a5f.png) - on 1 you see the initial request for the non-existing local host - on 2 and above, all logged entries show "served from cache" - on 3 (more than 1h later!) I see a new entry with no indication of cache - on 4 and after, all logged entries show "served from cache"Discussed in https://github.com/AdguardTeam/AdGuardHome/discussions/4369