lollipopman closed this 7 years ago.
@ssgelm @mvallaly @dkuntz2 would love any feedback on this request
You might want to try a number of regional DC domains for each cloud provider. Especially since aws.amazon.com routes to AWS's us-east-1 DC, which has their worst availability, you might generate more noise than desired.
I do see that it reports a health score proportionate to the number of hosts it can reach, which should provide a better signal than an all-or-nothing check.
@zdzolton I agree that a more geographically dispersed sample would be better.
@dpirotte made the observation that the `timeout` library in Ruby 1.9 is known to have broken corner cases. There are a variety of workarounds (http://stackoverflow.com/a/21014439/1236063); however, the existing code has proven successful in Litmus Paper's use case, as shown by our many years of using it in the TCP dependency. So rather than blindly incorporating a Stack Overflow patch, I would leave it as is and switch to the TCP socket timeout available in Ruby 2.0 and later once we deprecate support for Ruby 1.9.
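For reference, here is a minimal sketch of what the Ruby 2.0+ approach could look like. It assumes the "tcp socket timeout" meant here is the `:connect_timeout` option of `Socket.tcp`; `tcp_reachable?` is a hypothetical helper name, not part of Litmus Paper's API.

```ruby
require "socket"

# Hypothetical helper: true if a TCP connection to host:port succeeds
# within timeout_seconds, false otherwise. Uses Socket.tcp's
# :connect_timeout option instead of wrapping the call in Timeout.timeout.
def tcp_reachable?(host, port, timeout_seconds)
  Socket.tcp(host, port, connect_timeout: timeout_seconds) do |_sock|
    # Connected. We only care that connect() succeeded; the block form
    # closes the socket when the block returns.
  end
  true
rescue SystemCallError, SocketError, IOError
  # Refused or unreachable (Errno::*), DNS failure (SocketError), or a
  # connect timeout (Errno::ETIMEDOUT or IO::TimeoutError depending on
  # the Ruby version).
  false
end
```

Because the timeout is enforced by the socket layer itself, no extra thread is interrupted asynchronously, which is the class of corner case the `Timeout` workarounds try to paper over.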
This metric provides a simple heuristic of how healthy an ISP connection appears to be. You provide it with a list of hosts and ports:
LitmusPaper::Metric::InternetHealth.new([ "cloud.google.com:443", "azure.microsoft.com:443", "aws.amazon.com:443", ])
The check then attempts a TCP connection to each host and port, and the metric returns a number between 0 and 100 indicating the percentage of hosts that are reachable.
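To make the scoring concrete, here is a minimal sketch of the percentage calculation. It is not the code in this PR: `internet_health_percent` and `TCP_PROBE` are hypothetical names, the probe is injectable so the math can be checked without network access, and returning 0 for an empty list is an assumption.

```ruby
require "socket"

# Hypothetical default probe: true if a TCP connect to host:port succeeds
# within timeout_seconds.
TCP_PROBE = lambda do |host, port, timeout_seconds|
  begin
    Socket.tcp(host, port, connect_timeout: timeout_seconds) { |_sock| }
    true
  rescue SystemCallError, SocketError, IOError
    false
  end
end

# Hypothetical helper: given "host:port" strings, returns an Integer in
# 0..100, the percentage of targets that accepted a TCP connection.
# Integer division truncates, so 2 of 3 reachable yields 66.
def internet_health_percent(targets, timeout_seconds: 2, probe: TCP_PROBE)
  return 0 if targets.empty? # assumption: no targets means no evidence of health

  reachable = targets.count do |target|
    host, _sep, port = target.rpartition(":") # split on the last colon
    probe.call(host, Integer(port), timeout_seconds)
  end
  (reachable * 100) / targets.size
end
```

Usage mirroring the list above: `internet_health_percent(["cloud.google.com:443", "azure.microsoft.com:443", "aws.amazon.com:443"])`. The probes run sequentially, so the worst case is roughly one timeout per unreachable host; a partial outage lowers the score instead of flipping it straight to 0.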