stephanbosch closed 5 years ago
Hmm, at very best it could be reported as "slow" (rating 3/4 where 4 is "dead"), but I do not think it is worth the effort.
I agree with @pspacek - but we might entertain pull requests enhancing the measurement.
When different IPs have different properties, I could imagine showing like 25% "STOP" + 75% "GO", but in any case there's the link to technical details... EDIT: well, it's a problem that these in-between states aren't easily explainable (to laymen at least), so perhaps it would really be better to just report "SLOW" if <=50% are bad.
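To make the "SLOW if <=50% are bad" idea concrete, here is a minimal sketch of how per-IP results could be collapsed into a single label. The function name `summarize`, the input shape, and the choice of "OK" for zero failures and "DEAD" for more than half failing are my assumptions for illustration, not how the tool currently works.

```python
def summarize(per_ip_ok):
    """Collapse per-IP results (True = server answered correctly) into one label.

    Hypothetical helper: the labels for the 0%-bad and >50%-bad cases are
    assumptions; only the "SLOW if <=50% are bad" rule comes from this thread.
    """
    if not per_ip_ok:
        raise ValueError("no server addresses were tested")
    total = len(per_ip_ok)
    bad = sum(1 for ok in per_ip_ok.values() if not ok)
    if bad == 0:
        return "OK"
    if 2 * bad <= total:  # at most half of the addresses misbehave
        return "SLOW"
    return "DEAD"


if __name__ == "__main__":
    # Documentation-range addresses, one of four failing (25% bad).
    results = {
        "192.0.2.1": True,
        "192.0.2.2": True,
        "192.0.2.3": True,
        "192.0.2.4": False,
    }
    print(summarize(results))
```

A single label keeps the summary explainable to laymen, while the per-IP breakdown can stay behind the technical details link.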
FWIW, I'm also trying to sort out a number of seemingly false positives being reported by the tool.
One appears to be a result of using LVS to host a DNS VIP on more than 1 public IP using the same backend hosts. I suspect the tool is sending queries in parallel, resulting in some requests getting lost in translation. The result is that the tool reports 1 of the 2 public IP addresses as timing out. The public IP it reports as timing out varies between tests, so I know each public IP itself is OK. Once I updated the zone's NS records to include only 1 such public IP address per LVS VIP, the tool began returning "OK" reliably 100% of the time.
Let me know if you'd prefer I open a new issue for this.
Why don’t you rather fix your server connectivity?
If the queries/responses are being lost the server is susceptible to spoofing attacks.
I may have spoken too soon. I'm still looking into the reason(s) why I am getting intermittent failures with the tool. That is difficult to determine with authority, since I don't have access to the source network the tests are run from. I also tried running genreport myself, but it just hangs when I run it.
genreport needs a domain name on stdin.
Ah, that makes sense but wasn't listed in the command help so I missed it. But even after providing that it still times out. Strace shows it stops in the same spot:
futex(0x7f187de3cba4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=720, ...}) = 0
read(4, "# This file is managed by man:sy"..., 4096) = 720
read(4, "", 4096) = 0
close(4) = 0
getpid() = 53660
select(4, [0 3], [], NULL, NULL
Well, what's on fd 4?
Scratch that, I misread: you said it wants the domain name on stdin ... but I had added it as an additional argument to the command. (facepalm)
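For anyone else who hits the same hang: the select() in the strace output is waiting on fd 0, i.e. stdin, so the domain has to be written there instead of being passed as an argument. A minimal sketch of doing that from a script follows. The helper name `run_with_stdin` and the timeout value are made up for illustration, and the demo uses a small Python child as a stand-in because I am not assuming anything about genreport's options.

```python
import subprocess
import sys


def run_with_stdin(cmd, domain, timeout=60):
    """Run cmd, write the domain name to its stdin, and return its stdout.

    Hypothetical helper: the domain goes to stdin, not onto the argument list.
    """
    result = subprocess.run(
        cmd,
        input=domain + "\n",  # the program reads the name from stdin
        capture_output=True,
        text=True,
        timeout=timeout,      # avoid waiting forever if the program blocks
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child that echoes the line it reads from stdin.
    stand_in = [sys.executable, "-c",
                "import sys; print(sys.stdin.readline().strip())"]
    print(run_with_stdin(stand_in, "example.com"))
```

Swapping `stand_in` for `["genreport"]` should behave the same way, assuming the binary is on your PATH and needs no further arguments.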
The whole point is to make servers responsive and eliminate timeouts, which are hard to deal with, so it is pointless to ignore some of the timeouts.
Tests fail with a timeout or connection refusal (tcp). I'd expect this to show some orange message indicating that the offline server could not be tested, rather than reporting that the domain is violating the standards.