Open carlos-n opened 3 years ago
Would you mind providing your full configuration? Otherwise we need to guess which settings you are running with and it wastes everyone's time if we guess wrong.
A full trace (logs when running the recursor with --trace) of the initial and further requests would be very helpful as well, if possible, or a targeted trace using rec_control trace-regex 'recursion-test\.lab\.test\.net\.$' before sending the queries otherwise.
I attach recursor configuration and forward-zones file for server with version 4.3.7 and server with 4.4.2. I also attach traces for both servers. Thanks so much.
recursor-config_v4-4-2.txt traces_v4-3-7.txt traces_v4-4-2.txt forward-zones-v4-3-7.txt forward-zones-v4-4-2.txt recursor-config_v4-3-7.txt
Thanks! I think what is happening is caused by a change introduced in https://github.com/PowerDNS/pdns/pull/9351. We used to disable qname minimization for all forwards and after this PR it seems that we only disable for recursive forwards, and in your case the first name is for a non-recursive forward so since 4.4.0 QM stays enabled. We should still look at the cache even with QM enabled, so I believe this is a bug.
I'm guessing you do not care about qname minimization since you forward everything to 8.8.8.8 anyway, so perhaps you could disable QM as a work-around by setting qname-minimization=no. If you do that, please report back so we can narrow the issue down :)
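For readers unfamiliar with QM, here is a minimal Python sketch of the idea: instead of sending the full name at every step, the resolver asks about one more label at a time. The function name minimized_qnames and its known_zone parameter are made up for this illustration. Real resolvers, including the recursor, apply extra limits and fallbacks that are not modelled here.

```python
from __future__ import annotations


def minimized_qnames(qname: str, known_zone: str = ".") -> list[str]:
    """Return the names a minimizing resolver would ask about, in order.

    Toy model only: it assumes known_zone is an ancestor of qname and
    ignores step limits and fallbacks used by real implementations.
    """
    labels = [label for label in qname.rstrip(".").split(".") if label]
    zone_labels = [label for label in known_zone.rstrip(".").split(".") if label]
    steps = []
    # Start one label below the zone we already know and grow to the full name.
    for count in range(len(zone_labels) + 1, len(labels) + 1):
        steps.append(".".join(labels[-count:]) + ".")
    return steps


if __name__ == "__main__":
    for name in minimized_qnames("recursion-test.lab.test.net."):
        print(name)
```

With QM disabled, only the last of these names (the full query name) would be sent.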
I've done some tests with qname-minimization=no and it seems to work as it should.
I will test some more and let you know if everything is OK.
Thanks again !!!!
Hi @carlos-n, how did your tests come out?
I'm curious as I use a similar setup, so without actually testing this I suppose it concerns me as well.
Hi everyone. Sorry for the silence; I've been disconnected from this subject for a while. The behaviour of the recursor improved after disabling QM and we are using this configuration in production, but the behaviour of 4.4.x is still different from branch 4.3.x. It is not as noticeable as before disabling QM, but I still find some weird scenarios. Before branch 4.4.x, the rules in "forward-zones" had top priority when routing requests, regardless of anything the recursor had cached. But in versions 4.4.x, I'm finding cases in which a cached NS record prevails over the "forward-zones" rules. This is a game changer, with unpredictable and usually bad consequences for everyone who (like me) relied on "forward-zones" as the master routing decision table for their architecture. Should "forward-zones" still prevail over everything else in branches 4.4.x or later? Or is it expected behaviour that sometimes it doesn't? Thanks in advance!
Hi,
Let me try to explain.
Yes, the behaviour with respect to forward-zones (but not forward-zones-recurse) has changed. Since 4.4.x, NS records learned from hosts forwarded to are used to resolve names in subdomains. See https://docs.powerdns.com/recursor/settings.html#forward-zones for an explanation why (this explanation was added quite recently). I'd say the old behaviour was buggy, or at least not very useful.
This does have consequences: you can only forward to servers that are authoritative for the domain and NS records coming from these servers should point to proper authoritative servers for subdomains of the forwarded domain. If that is not the case, things might break.
forward-zones-recurse is different: in this case the target only needs to be able to resolve (all) names in the forwarded domain and no NS complications occur.
There still might be bugs of course. So if you still think you hit a bug after this explanation, please show us traces so we can investigate.
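To make the difference concrete, here is a rough Python model of the routing decision as described in this comment. It is a sketch based only on the explanation above, not on the recursor's actual code: the Forward class, next_hop function, addresses and the sub.lab.test.net delegation are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    zone: str      # forwarded zone, e.g. "lab.test.net."
    target: str    # address of the server forwarded to (hypothetical)
    recurse: bool  # True models forward-zones-recurse, False forward-zones


def is_subdomain(name: str, zone: str) -> bool:
    """True if name equals zone or lies below it."""
    return zone == "." or name == zone or name.endswith("." + zone)


def label_count(name: str) -> int:
    return 0 if name == "." else len(name.rstrip(".").split("."))


def next_hop(qname: str, forwards: list[Forward], cached_ns: dict[str, str]):
    """Return (server, rd_flag) for qname, or None if no forward matches.

    cached_ns maps a delegated zone to one server address learned from
    NS records (a simplification of a real NS cache).
    """
    matching = [f for f in forwards if is_subdomain(qname, f.zone)]
    if not matching:
        return None
    best = max(matching, key=lambda f: label_count(f.zone))
    if best.recurse:
        # Recursive forward: always ask the target, with rd=1.
        return best.target, True
    # Plain forward (4.4.x model): a cached delegation below the
    # forwarded zone takes precedence over the forward rule.
    deeper = [
        zone for zone in cached_ns
        if is_subdomain(qname, zone)
        and is_subdomain(zone, best.zone)
        and label_count(zone) > label_count(best.zone)
    ]
    if deeper:
        return cached_ns[max(deeper, key=label_count)], False
    return best.target, False


if __name__ == "__main__":
    ns_cache = {"sub.lab.test.net.": "192.0.2.99"}
    plain = [Forward("lab.test.net.", "192.0.2.53", recurse=False)]
    recursive = [Forward("lab.test.net.", "192.0.2.53", recurse=True)]
    print(next_hop("www.sub.lab.test.net.", plain, ns_cache))
    print(next_hop("www.sub.lab.test.net.", recursive, ns_cache))
```

In this model the plain forward ends up at the server from the cached NS record, while the recursive forward keeps going to the configured target.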
Thanks for the info. I'm going to confirm whether the cases I've been detecting match this policy. I'm afraid they do.
Hello,
If I may ask a few questions.
2. We have used PostgreSQL a lot, but as soon as we set up a PostgreSQL replica on another server, it does not crash but it breaks PowerDNS every time. Which database is the best and most stable to use with a replica or a master-slave setup? I have read a lot of positive things from those who use SQLite. Some have even made the new Redis 6 work even better, and now it is Redis 7. It even has its own Redis Stack now, with an incredible GUI. It can do multiple things at the same time, being 3x faster for PowerDNS than SQLite, MySQL and MariaDB, and it can do caching for the web. I have even seen some say CockroachDB is incredible and works with pdns. Just wondering, since there is a huge difference in what they support and, most of all, in traffic.
Since we have been a DigiCert partner for over 20 years, I have a free DigiCert Business Plus Wildcard, with PKI, and monitoring and scanning for viruses, intrusions and so on, but they don't have a PowerDNS plugin to control it. So if anyone knows if Netstat, Dynatrace,
forward-zones-recurse is different: in this case the target only needs to be able to resolve (all) names in the forwarded domain and no NS complications occur.
The thing to note here is you can (probably always?) still use this to get the same behaviour as before even when talking to auths. Unless you have something that actually cares if it got rd=1 or not.
Short description
Some valid cache entries are not being used causing unnecessary requests to external resolvers.
Environment
Steps to reproduce
For confidentiality reasons, the following explanation does not describe our exact use case, but an example that illustrates the issue perfectly.
We have forward-zones-recurse=.=8.8.8.8 configured in our recursor. We use this configuration because our recursor has limited access to the internet, so we would not be able to do the recursion ourselves.
We have an instance of PDNS Auth server with BIND backend where we have a zone called "lab.test.net" in which we have defined a CNAME record like this
recursion-test 10800 CNAME api-global.netflix.com.
In the forward-zones file we include a rule that instructs the recursor to go to our PDNS Auth instance in order to resolve any name belonging to "lab.test.net".
We try to resolve "recursion-test.lab.test.net" against our recursor instance. We obtain the following result (I have masked the actual IPs in the answer with XX.XX.XX.XX)
The first resolution from "recursion-test.lab.test.net" to "api-global.netflix.com" is locally provided by our PDNS Auth instance. The rest of the answer is provided by "8.8.8.8".
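As an illustration of the setup above, the following Python sketch parses a forward-zones style file into a lookup table. The file content is hypothetical: 192.0.2.53 stands in for the PDNS Auth instance, and the "+." line is only an assumed file equivalent of the forward-zones-recurse=.=8.8.8.8 setting (a leading "+" marks a recursive forward in this file format). The parse_forward_zones helper is invented for this example and is not part of PowerDNS.

```python
from __future__ import annotations

# Hypothetical file content; addresses other than 8.8.8.8 are placeholders.
EXAMPLE_FILE = """\
# names under lab.test.net go to the internal authoritative server
lab.test.net=192.0.2.53
# everything else is forwarded with recursion desired
+.=8.8.8.8
"""


def parse_forward_zones(text: str) -> dict[str, tuple[bool, list[str]]]:
    """Map each zone to (recursive?, [server addresses])."""
    rules = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        zone, _, targets = line.partition("=")
        zone = zone.strip()
        recurse = zone.startswith("+")       # "+" prefix: recursive forward
        zone = zone.lstrip("+").strip()
        if not zone.endswith("."):
            zone += "."                      # normalise to a trailing dot
        servers = [t.strip() for t in targets.split(";") if t.strip()]
        rules[zone] = (recurse, servers)
    return rules


if __name__ == "__main__":
    for zone, (recurse, servers) in parse_forward_zones(EXAMPLE_FILE).items():
        print(zone, "recursive" if recurse else "non-recursive", servers)
```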
Expected behaviour
In further requests to resolve "recursion-test.lab.test.net" we expect our recursor to use the cached responses without going to "8.8.8.8" for as long as the lowest TTL lasts. This is how it works up to and including recursor version 4.3.7.
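The expectation can be sketched in a few lines of Python: a cached CNAME chain is answerable from cache as long as every link is still valid, and the answer's remaining lifetime is the lowest remaining TTL in the chain. The cache layout, the cached_chain helper and the 192.0.2.10 address are assumptions made for this illustration; they do not reflect how the recursor stores its cache.

```python
from __future__ import annotations


def cached_chain(cache: dict, qname: str, now: int):
    """Follow CNAMEs through the cache starting at qname.

    Returns (records, remaining_ttl) if every link is cached and unexpired,
    otherwise None, meaning an upstream query would be needed.
    """
    records = []
    remaining = None
    seen = set()
    name = qname
    while name not in seen:
        seen.add(name)
        entry = cache.get(name)
        if entry is None or entry["expires"] <= now:
            return None
        records.append((name, entry["type"], entry["value"]))
        left = entry["expires"] - now
        remaining = left if remaining is None else min(remaining, left)
        if entry["type"] != "CNAME":
            return records, remaining
        name = entry["value"]
    return None  # CNAME loop


if __name__ == "__main__":
    cache = {
        "recursion-test.lab.test.net.": {
            "type": "CNAME", "value": "api-global.netflix.com.", "expires": 10800},
        # Placeholder A record; the real name resolves through further records.
        "api-global.netflix.com.": {
            "type": "A", "value": "192.0.2.10", "expires": 60},
    }
    print(cached_chain(cache, "recursion-test.lab.test.net.", now=0))
    print(cached_chain(cache, "recursion-test.lab.test.net.", now=61))
```

In this model, no query to "8.8.8.8" is needed until the shortest-lived record in the chain expires.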
Actual behaviour
In further requests to resolve "recursion-test.lab.test.net" the recursor does not send any request to our PDNS Auth instance, as it has a valid cache entry with a positive TTL for "recursion-test.lab.test.net". However, it always sends a request to "8.8.8.8" for "api-global.netflix.com", even though it has valid cache entries with positive TTLs for this name.
Somehow our recursor does not consider the cache entries of the names it has resolved against "8.8.8.8" usable in this particular scenario of nested resolutions. A curious fact about these unnecessary requests to "8.8.8.8" is that they are made with the recursion desired flag set to "0", even though "forward-zones-recurse" is activated.
If we try to resolve "api-global.netflix.com" directly against our recursor, the behaviour is as expected and the recursor is able to use the cached entries in further requests until TTL expiration.
Other information
I have compared the cache entries in the correct case (version 4.3.7) and the wrong one (versions 4.4.2 and 4.5.2) and they are identical. I'm afraid a change of behaviour in this kind of scenario was introduced in branch 4.4.x and inherited by branch 4.5.x.