dns-violations / dnsflagday

DNS flag day
https://dnsflagday.net/

All tests fail for a domain when one of its nameservers is just offline #28

Closed: stephanbosch closed this issue 5 years ago

stephanbosch commented 5 years ago

Tests fail with a timeout or a connection refusal (TCP). I'd expect this to show an orange message indicating that the offline server could not be tested, rather than reporting that the domain violates the standards.

pspacek commented 5 years ago

Hmm, at best it could be reported as "slow" (rating 3/4, where 4 is "dead"), but I do not think it is worth the effort.

Habbie commented 5 years ago

I agree with @pspacek - but we might entertain pull requests enhancing the measurement.

vcunat commented 5 years ago

When different IPs have different properties, I could imagine showing something like 25% "STOP" + 75% "GO", but in any case there's the link to the technical details... EDIT: well, the problem is that these in-between states aren't easy to explain (to laymen at least), so perhaps it would really be better to just report "SLOW" if <=50% are bad.
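The aggregation rule suggested above can be sketched as follows. This is a minimal, hypothetical illustration, not how the dnsflagday tool computes its verdict. It assumes each nameserver address has already been reduced to a simple pass/fail result. The labels "GO", "SLOW" and "STOP" are borrowed from the comment, and the function name `domain_verdict` is made up.

```python
from typing import Sequence


def domain_verdict(ip_ok: Sequence[bool]) -> str:
    """Collapse per-address pass/fail results into one domain-level label.

    Hypothetical rule following the suggestion in the thread:
    - every address passes             -> "GO"
    - at most half of the addresses fail -> "SLOW"
    - more than half fail              -> "STOP"
    """
    if not ip_ok:
        raise ValueError("no nameserver addresses were tested")
    bad = sum(1 for ok in ip_ok if not ok)
    if bad == 0:
        return "GO"
    # Integer comparison (bad / total <= 1/2) avoids floating-point rounding.
    if bad * 2 <= len(ip_ok):
        return "SLOW"
    return "STOP"


# One offline address out of four (25% bad) would be reported as "SLOW".
print(domain_verdict([True, True, True, False]))
```

Under this rule a single dead address among several healthy ones no longer turns the whole domain red. That is exactly the trade-off the maintainers push back on later in the thread.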

fleish commented 5 years ago

FWIW, I'm also trying to sort out a number of seemingly false positives being reported by the tool.

One appears to be a result of using LVS to host a DNS VIP on more than one public IP using the same backend hosts. I suspect the tool is sending queries in parallel, which causes some requests to get lost in translation. As a result, the tool reports one of the two public IP addresses as timing out. The public IP it reports as timing out varies between tests, so I know each public IP itself is OK. Once I updated the zone's NS records to include only one public IP address per LVS VIP, the tool began reliably returning "OK" 100% of the time.
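One way to test this hypothesis by hand is to query each public IP separately and strictly sequentially, then count failures per address. The sketch below is a minimal, hypothetical helper, not how the dnsflagday tool or genreport sends its queries. It assumes plain UDP on port 53, an NS query for the zone apex, and made-up function names (`build_query`, `probe`). The addresses in the usage line are documentation placeholders.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 2) -> bytes:
    """Build a minimal RFC 1035 query for `name` (qtype 2 = NS, class 1 = IN)."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags 0 (standard query, RD not set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
        if part
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def probe(ip: str, name: str, attempts: int = 10, timeout: float = 2.0) -> int:
    """Send `attempts` sequential UDP queries to one address; return failure count."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    failures = 0
    for _ in range(attempts):
        query = build_query(name)
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (ip, 53))
            try:
                reply, _ = sock.recvfrom(4096)
            except socket.timeout:
                failures += 1
                continue
            # Treat a reply whose ID does not match the query as a failure too.
            if reply[:2] != query[:2]:
                failures += 1
    return failures


# Usage (placeholder addresses):
# for ip in ["192.0.2.1", "192.0.2.2"]:
#     print(ip, probe(ip, "example.com"))
```

If every address answers reliably when probed one at a time but not when queried concurrently, that points at the load-balancing setup rather than at any single address.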

Let me know if you'd prefer that I open a new issue for this.

oerdnj commented 5 years ago

Why don’t you rather fix your server connectivity?

If the queries/responses are being lost, the server is susceptible to spoofing attacks.

fleish commented 5 years ago

I may have spoken too soon. I'm still looking into the reason(s) why I am getting intermittent failures with the tool, which is difficult to determine with authority since I don't have access to the source network the tests are run from. I also tried running genreport myself, but it just hangs.

Habbie commented 5 years ago

genreport needs a domain name on stdin.
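The difference between passing the name on stdin and passing it as a command-line argument can be shown with a small wrapper. This is a hypothetical sketch. It assumes `genreport` is on the `PATH`, reads the domain name from standard input, and needs no other arguments. Check the tool's own documentation for the exact input format it accepts.

```python
import subprocess


def run_with_domain_on_stdin(domain: str, cmd: list[str], timeout: float = 300.0) -> str:
    """Run `cmd` with `domain` plus a newline written to its standard input."""
    result = subprocess.run(
        cmd,
        input=domain + "\n",  # the name goes to stdin, not into argv
        capture_output=True,
        text=True,
        timeout=timeout,      # fail loudly instead of hanging forever
        check=True,           # raise if the command exits non-zero
    )
    return result.stdout


# Hypothetical usage, assuming genreport is installed:
# print(run_with_domain_on_stdin("example.com", ["genreport"]))
```

Running `genreport example.com` from a terminal instead leaves stdin attached to the terminal, so the program waits for input that never comes. That behaviour is consistent with the `select()` on fd 0 in the strace output that follows.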

fleish commented 5 years ago

Ah, that makes sense, but it wasn't listed in the command help, so I missed it. But even after providing it, it still hangs. strace shows it stopping in the same spot:

futex(0x7f187de3cba4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=720, ...}) = 0
read(4, "# This file is managed by man:sy"..., 4096) = 720
read(4, "", 4096)                       = 0
close(4)                                = 0
getpid()                                = 53660
select(4, [0 3], [], NULL, NULL

Habbie commented 5 years ago

Well, what's on fd 4?

fleish commented 5 years ago

Scratch that: I misread. You said it wants the domain name on stdin, but I had added it as an additional argument to the command. (facepalm)

pspacek commented 5 years ago

The whole point is to make servers responsive and to eliminate timeouts, which are hard to deal with, so it is pointless to ignore some of the timeouts.