Closed mrdavidlaing closed 11 years ago
Sounds reasonable, nicely spotted! Would it be possible to include the maximum too in this chart, which is the most obvious indicator as per Gil Tene’s explanation?
Same graph, but including max. (note that the line colors have changed)
Observe how as median latency goes up, the number of measurements goes down. Evidence of co-ordinated omission?
Not necessarily. The number of measurements can also go down because of an increased count of exceptions. (I'm afraid the count of exceptions is currently not reported, due to a bug.) It's a common pattern that the exception count rises along with latency when the server has problems.
Currently data is polled once a minute, so to affect our data significantly, the max latency of a single request would have to be of a similar length. Very few requests take longer than 10 seconds. I've found only 9 values higher than 30 seconds in our live latency data to date. Thus, I don't think this effect currently affects our data significantly.
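To make the argument about request length versus polling interval concrete, here is a rough sketch of an expected-interval backfill, similar in spirit to the correction Gil Tene describes. It assumes a blocking poller (the next poll cannot start until the previous one returns); the function name `backfill` and the sample values are made up for illustration and are not taken from our monitoring code.

```python
def backfill(latencies, expected_interval):
    """Add the samples a blocked poller would have missed.

    For each recorded latency, keep subtracting the expected polling
    interval and record the remainder while it is still at least one
    full interval long.
    """
    corrected = []
    for value in latencies:
        corrected.append(value)
        missing = value - expected_interval
        while missing >= expected_interval:
            corrected.append(missing)
            missing -= expected_interval
    return corrected


POLL_INTERVAL = 60.0  # seconds, matching the once-a-minute polling

# Hypothetical latencies in seconds; 10 s and 35 s are well under the interval.
typical = [0.2, 0.3, 10.0, 35.0]
print(backfill(typical, POLL_INTERVAL) == typical)  # nothing is backfilled

# A hypothetical 150 s stall is long enough to swallow a scheduled poll.
print(backfill([150.0], POLL_INTERVAL))
```

In this sketch only a latency of at least two polling intervals produces any backfilled sample, which is consistent with the point above: requests of 10 or 30 seconds are too short to distort data polled once a minute.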
However, I think we may have another problem. With a polling period of 1 minute, many short periods of latency degradation can simply slip through our net.
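A small simulation illustrates how this can happen. It is only a sketch: the latency model, the spike windows, and the names `latency_at` and `poll` are invented for the example and do not reflect our real data.

```python
def latency_at(t, spikes, base=0.05, degraded=5.0):
    """Hypothetical latency (seconds) of a request sent at second t."""
    for start, end in spikes:
        if start <= t < end:
            return degraded
    return base


def poll(period, duration, spikes):
    """Send one probe every `period` seconds and record its latency."""
    return [latency_at(t, spikes) for t in range(0, duration, period)]


# Three short, made-up degradation windows within a 5-minute run.
spikes = [(65, 75), (130, 140), (200, 215)]

once_a_minute = poll(60, 300, spikes)
once_a_second = poll(1, 300, spikes)

print(max(once_a_minute))  # highest latency seen when polling every 60 s
print(max(once_a_second))  # highest latency seen when polling every 1 s
print(sum(1 for x in once_a_second if x > 1.0))  # degraded seconds observed
```

With these assumed windows, the once-a-minute probes at 0, 60, 120, 180 and 240 seconds all land outside the degraded periods, so the slow stretches never appear in the polled data, while per-second polling sees every one of them.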
Full description of the issue at: http://labs.cityindex.com/labs-team/2013/03/13/the-coordinated-omission-problem/
This encompasses: