PTR lookup returns NXDomain if client retries while upstream hasn't answered yet

TheB1gG commented 9 months ago

Prerequisites

[X] I have checked the Wiki and Discussions and found no answer
[X] I have searched other issues and found no duplicates
[X] I want to report a bug and not ask a question or ask for help
[X] I have set up AdGuard Home correctly and configured clients to use it. (Use the Discussions for help with installing and configuring clients.)

Platform (OS and CPU architecture)

Linux, AMD64 (aka x86_64)

Installation

Snapcraft

Setup

Local AdGuardHome -> Remote AdGuardHome -> Ubiquiti USG-3

AdGuard Home version

v0.107.43

Action

dig -x 192.168.2.86

Expected result

dig -x 192.168.2.86

; <<>> DiG 9.19.19-1-Debian <<>> -x 192.168.2.86
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60835
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;86.2.168.192.in-addr.arpa.     IN      PTR

;; ANSWER SECTION:
86.2.168.192.in-addr.arpa. 0    IN      PTR     Family-Room.main.internal.

Actual result

dig -x 192.168.2.86

; <<>> DiG 9.19.19-1-Debian <<>> -x 192.168.2.86
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 62851
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;86.2.168.192.in-addr.arpa.     IN      PTR

;; AUTHORITY SECTION:
86.2.168.192.in-addr.arpa. 10   IN      SOA     fake-for-negative-caching.adguard.com. hostmaster.86.2.168.192.in-addr.arpa. 100500 1800 900 604800 86400

Additional information and/or screenshots

See https://github.com/AdguardTeam/AdGuardHome/issues/6691#issuecomment-1959709780

Logfile: https://gist.github.com/TheB1gG/a1df1e733ab3cacfb37ac61140fbe1b3

TheB1gG commented 9 months ago

Can't you reproduce it, or why hasn't this issue received any labels?

ainar-g commented 9 months ago

Sorry for the late response.

2024/01/29 22:13:21.444936 3180371#48 [debug] dnsforward: recursion detected resolving "86.2.168.192.in-addr.arpa."

This line shows you what could be wrong. It seems like your configuration of AdGuard Home is causing it to query PTRs from itself. To prevent this, inspect your configuration and set the upstreams for PTRs, including those from locally-served networks, explicitly.

TheB1gG commented 9 months ago

~~Thank you for the response, @ainar-g. You pointed me in the right direction. I tested with Debian on WSL2, which fires every DNS query three times for some reason unknown to me. So maybe you could relax your recursion detection a little bit, or convince Debian or Microsoft to query only once when running on WSL2?~~

TheB1gG commented 9 months ago

Hi @ainar-g, I simply couldn't find a way to get it working. PTR lookups in Windows tracert do not work either because of this. The behaviour of tracert is like this or with IPv6. Did you ever test your recursion implementation with reverse DNS servers that have a latency of over 50 ms and real-world applications like tracert or traceroute? Your recursion detection triggers there before the remote DNS server responsible for that IP range can answer. Is there an option to disable the recursion detection in the meantime?

Since I don't know whether replies to closed issues are read, I will open a new issue in a week with the updated information if I don't get a reply here.

Thank you

ainar-g commented 9 months ago

@TheB1gG, I'm sorry, but I am not sure what you're asking about here. If the network configuration on the machine allows sending PTR queries to itself, we do not consider it a valid configuration, since it'd just end up in infinite loops of queries. There isn't an option to disable this check, and you should inspect and fix the configuration that allows this to occur in the first place. And, unless I'm mistaken, latency shouldn't have anything to do with this, as the recursion detection logic is only based on the message ID, type, and target.
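To make the check described above concrete, here is a minimal Python sketch of a detector keyed on message ID, resource type, and target. It is an illustration only and not AdGuard Home's actual code: the names `QueryKey` and `RecursionDetector` are hypothetical, and the only behaviour taken from this thread is the three-part key.

```python
from typing import NamedTuple, Set


class QueryKey(NamedTuple):
    """Hypothetical key: the three fields named in the comment above."""
    msg_id: int
    qtype: str
    target: str


class RecursionDetector:
    """Illustrative sketch only, NOT AdGuard Home's implementation."""

    def __init__(self) -> None:
        # Keys of queries that were forwarded upstream and not answered yet.
        self._in_flight: Set[QueryKey] = set()

    def check_and_add(self, key: QueryKey) -> bool:
        """Return True if the key is already in flight (treated as recursion)."""
        if key in self._in_flight:
            return True
        self._in_flight.add(key)
        return False

    def done(self, key: QueryKey) -> None:
        """Forget the key once the upstream answer has been delivered."""
        self._in_flight.discard(key)


detector = RecursionDetector()
key = QueryKey(62851, "PTR", "86.2.168.192.in-addr.arpa.")
first = detector.check_and_add(key)   # first query: not flagged
second = detector.check_and_add(key)  # same ID, type, and target again: flagged
```

With a key like this, a second query is flagged whenever all three fields match one that is still in flight, regardless of where it came from.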

If you need help figuring out how to prevent this, you can ask around in the Discussions.

TheB1gG commented 9 months ago

@ainar-g, the configuration does not allow AdGuard Home to send queries to itself, as you can see in the screenshot in https://github.com/AdguardTeam/AdGuardHome/issues/6691#issuecomment-1950492874. The client sends query 1 and then retries before query 1 has been answered. Query 2 is then answered immediately with NXDOMAIN because the recursion detection triggers. Shortly afterwards the answer from upstream arrives and query 1 is answered, but the client ignores it because of the earlier NXDOMAIN. If the latency is less than the retry timeout (which seems to be 40 ms for Microsoft), everything works fine. It fails only if the retry happens while AdGuard Home is still waiting for the upstream's response. Just to make sure you understand me correctly: at no time does AdGuard Home query itself. If that can't be read from the log I attached earlier, please tell me which part of the logs or config you need so you can see that there is no recursion.
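The timing described above can be sketched as a tiny Python model. It rests on assumptions drawn from this comment only: the client retries with an identical query after a fixed timeout, the server answers a duplicate of an in-flight query with NXDOMAIN immediately, and the client acts on the first response it receives. The function name `first_response` and the 40 ms default are illustrative, not measured values.

```python
def first_response(upstream_latency_ms: float, retry_timeout_ms: float = 40.0) -> str:
    """Return the status the client sees first under the assumptions above."""
    # Query 1 is sent at t = 0; its upstream answer is ready at this time.
    answer_ready_at = upstream_latency_ms
    # The client repeats the identical query if nothing arrived by this time.
    retry_at = retry_timeout_ms

    if answer_ready_at < retry_at:
        # The real answer arrives before the client retries.
        return "NOERROR"
    # The retry arrives while query 1 is still in flight, so it is flagged
    # and answered at once, ahead of the real answer.
    return "NXDOMAIN"


fast = first_response(10.0)  # upstream faster than the retry timeout
slow = first_response(80.0)  # upstream slower than the retry timeout
```

In this model the outcome depends only on whether the upstream latency exceeds the client's retry timeout, which matches the behaviour reported here.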

ainar-g commented 9 months ago

The only other thing that could cause this is if the software in question is reusing message IDs, because, as mentioned, the logic is based on message ID, resource type, and the target. And that can potentially cause all sorts of issues with all sorts of DNS servers, so if you have access to the software, I'd recommend using a randomized ID.
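For client software you do control, a randomized ID per query can look like the following Python sketch. It uses only the standard library and builds the wire format of a PTR query by hand; the function name `build_ptr_query` is hypothetical, and nothing is sent over the network here.

```python
import ipaddress
import secrets
import struct


def build_ptr_query(ip: str) -> bytes:
    """Build a DNS PTR query for ip with a freshly randomized message ID."""
    msg_id = secrets.randbelow(1 << 16)  # new 16-bit ID on every call
    flags = 0x0100                       # standard query, recursion desired
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)

    # For example, "192.168.2.86" becomes "86.2.168.192.in-addr.arpa".
    name = ipaddress.ip_address(ip).reverse_pointer
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"

    # QTYPE 12 is PTR, QCLASS 1 is IN.
    question = qname + struct.pack("!HH", 12, 1)
    return header + question


query = build_ptr_query("192.168.2.86")
```

Calling the function again for each attempt yields a new ID, so two attempts for the same name no longer share the same ID, type, and target.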

TheB1gG commented 8 months ago

It looks like tracert does reuse the message ID. Wireshark even warns about it (Wireshark was running on the querying client). Software-wise, we are talking about the standard tracert program that ships with every Windows installation. I don't think the solution can be to replace standard Windows tools. If AdGuard Home didn't answer at all to a query it considers a recursion, it would work, but the NXDOMAIN confuses tools like traceroute and mtr.

TheB1gG commented 8 months ago

@ainar-g, could you please reopen this and give it the bug label? After reading most of the DNS RFCs, I can say that it is completely RFC-compliant for programs to retry with the same message ID. The problem is that AdGuard Home's handling of it is not RFC-compliant. Thank you.
