Closed abligh closed 4 years ago
Rebased onto fix for #74
Hi Alex,
This (and the other patches) is great, thank you! I'm a bit backlogged and need to spend my hobby time on some other stuff in the immediate future, but I'm looking forward to reviewing and integrating this!
Ask
Ask,
Great. Happy to tweak them if need be.
Alex
Rebased onto #77, #71, #72
Rebased
Alright, I’ve been thinking about this for a while. I’m really happy about this work, clearly it’s something GeoDNS needs to be more generally useful.
However — I don’t think the health checking itself should be inside the DNS daemon. We can’t make health checks for every protocol, and in some deployments it doesn’t make sense to have every DNS server do health checks (or the reverse).
It’d be better to just have a file format for “health check results” that GeoDNS knows how to load and then a separate daemon to do the checks; then others could do something to get health check results from nagios/pingdom/whatever they use as appropriate.
Ask,
Having thought about it, I disagree with your last paragraph:
tcp
covers loads. Specifically http
, https
, smtp
(plus secure variants), as well as ntp
. If we recognise for a minute that covering dns
would be silly (you need anycast for that), we've covered a very large percentage of traffic on the internet, and much more would be covered by the tcp
tests. Can you think of any major use-cases which aren't covered?WDYT?
I definitely see your argument, but I think your message also implies some agreement with mine (you could quickly think of a couple more tests). I’m more concerned about the slippery slope of GeoDNS suddenly having more code and complexity for doing health checks than for … DNS stuff. Even your SSL example could quickly get requests for accepting invalid certificates, specifying a client cert, etc etc.
(As an aside I actually do use GeoDNS for DNS servers (“g.ntpns.org”) — I know that’s completely crazy and I don’t think GeoDNS should support this natively.)
I like the idea of having the external mechanism/API; and maybe I can be convinced that keeping a couple of simple checks in, too, is reasonable for ease of use for simple use cases, however I really think the external mechanism will be cleaner and easier to maintain for configurations past the simplest ones. Maybe just the TCP one? Really I think it’d be much better to extract the code you wrote into a separate daemon though.
Anyway, for the external health data:
As a starting point maybe have the label have something like
“test”: { “label”: “foo” }
or even just
“healthlabel”: “foo”
and then the health check file(s?) could be just something like
foo 1.2.3.4
to specify that 1.2.3.4 is an invalid answer.
I have reworked this to support
tcp
/ ntp
)exec
health test - run a command with an appropriate parameterfile
health test - read a JSON file (as described by you)nodeping
health test - use Nodeping APIpingdom
health test - use Pingdom APIRebased
I am new to GeoDNS, so I couldn't follow up the discussion above. But, I'd like to bring up a use case for your consideration:
I want to use Measurement API from Ripe Atlas Project to dynamically change geodns records/weights. Here is one example, https://atlas.ripe.net/measurements/2400706/#!seismograph it pings Linode Tokyo datacenter every 5 minutes from 5 cities in China. I'd like GeoDNS to dynamically change multiple A records' weights for a given subdomain, based on packet loss rates.
@skyred if you want to add/remove A records according to packet loss, you can use the test framework above. If you want to change their weights (rather than just add/remove them), I can't think of a better way right now than rewriting the zone files. Write them to a .tmp file, then rename them over the original to get an atomic change and GeoDNS will pick them up. Allowing a healthcheck to specify a weight is an interesting idea though.
@abh quick ping - any movement on this one?
rebased
Rebased onto a branch including cc09d9d to avoid travis errors.
This is an experimental feature to add/remove DNS entries depending on whether servers/services are up/down. Initially NTP testing and TCP port testing are supported.
This is an experimental lightly tested patch. It's based on top of the other 3 merge requests I have submitted recently, but is pretty easily separated.
From the commit message: