bmizerany / perks

Effective Computation of Things
BSD 2-Clause "Simplified" License

quantile streams seem to degrade after too many resets #8

Open · leifwalsh opened this issue 10 years ago

leifwalsh commented 10 years ago

I have a long-running application with two streams (set to report the 50%, 90%, and 99% quantiles) that both receive the same input data: latencies, as float64 milliseconds, from database operations. Each second, both streams report a few quantiles; one stream is then reset, while the other is never reset. I've noticed a few times that, after running long enough, the stream that gets reset periodically starts reporting all 0s almost all the time, except when it looks like there were only one or two inputs in the window between resets, in which case all quantiles report the same number, presumably because all the input in that time was above 50%.
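A minimal Go sketch of the two-stream setup described above. To keep it self-contained, it uses a hypothetical `exactStream` type that stores every sample and answers exactly, in place of a perks stream; only the `Insert`, `Query`, and `Reset` method names are meant to correspond. The once-per-second reporting is simulated with a loop, and the latencies are placeholders.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactStream is a hypothetical stand-in for a quantile stream. It keeps every
// sample, so its answers are exact; it only mirrors the Insert/Query/Reset
// method names discussed in this thread and is not the perks implementation.
type exactStream struct {
	samples []float64
}

func (s *exactStream) Insert(v float64) { s.samples = append(s.samples, v) }

// Reset discards all samples, starting a new window.
func (s *exactStream) Reset() { s.samples = s.samples[:0] }

// Query returns the q-quantile by the nearest-rank method (the sample at
// rank ceil(q*n) in sorted order), or 0 when the stream is empty.
func (s *exactStream) Query(q float64) float64 {
	n := len(s.samples)
	if n == 0 {
		return 0
	}
	sorted := append([]float64(nil), s.samples...)
	sort.Float64s(sorted)
	i := int(math.Ceil(q*float64(n))) - 1
	if i < 0 {
		i = 0
	}
	if i >= n {
		i = n - 1
	}
	return sorted[i]
}

// targets are the quantiles both streams report (50%, 90%, 99%).
var targets = []float64{0.50, 0.90, 0.99}

// report queries both streams at the same moment, then resets only the
// windowed stream so the next window starts empty.
func report(windowed, cumulative *exactStream) (win, all []float64) {
	for _, q := range targets {
		win = append(win, windowed.Query(q))
		all = append(all, cumulative.Query(q))
	}
	windowed.Reset()
	return win, all
}

func main() {
	windowed, cumulative := &exactStream{}, &exactStream{}
	for tick := 0; tick < 3; tick++ { // each tick stands in for one second
		for i := 1; i <= 20; i++ {
			v := float64(10*i + tick) // placeholder latencies in ms
			windowed.Insert(v)
			cumulative.Insert(v)
		}
		win, all := report(windowed, cumulative)
		fmt.Printf("tick %d: window=%v cumulative=%v\n", tick, win, all)
	}
}
```

Running a real approximate stream as the windowed one while keeping an exact copy of the same window gives a direct per-window comparison.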

I'm not sure if this is a bug but I would like to start by getting advice on how to gather useful diagnostics from the data structure.
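One way to gather diagnostics, sketched here under the assumption that the application can afford to keep the raw samples of each one-second window alongside the stream: database latencies are positive, so a reported quantile of 0 cannot be correct for a non-empty window. The hypothetical `checkWindow` helper below flags any reported value outside the window's observed [min, max] range and records the window size, which could also help estimate how many Inserts pass before the problem starts.

```go
package main

import (
	"fmt"
	"sort"
)

// checkWindow is a hypothetical diagnostic helper. Given the raw samples
// inserted since the last Reset and the quantile values a stream reported for
// that window, it returns a message for every reported value outside the
// window's [min, max] range. Any quantile of the window must lie in that
// range, so each message marks an answer that is certainly wrong.
// An empty window has no range to check, so it yields no messages.
func checkWindow(samples []float64, reported map[float64]float64) []string {
	if len(samples) == 0 {
		return nil
	}
	lo, hi := samples[0], samples[0]
	for _, v := range samples[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	var problems []string
	for q, v := range reported {
		if v < lo || v > hi {
			problems = append(problems, fmt.Sprintf(
				"q=%.2f: reported %g outside [%g, %g] (n=%d)", q, v, lo, hi, len(samples)))
		}
	}
	sort.Strings(problems) // map iteration order is random; sort for stable output
	return problems
}

func main() {
	window := []float64{12, 48, 51, 3000} // raw samples kept alongside the stream
	reported := map[float64]float64{0.50: 0, 0.90: 0, 0.99: 0}
	for _, p := range checkWindow(window, reported) {
		fmt.Println(p)
	}
}
```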

bmizerany commented 10 years ago

That sounds like a bug. Can you tell me a little more about your data stream? What are you measuring and what is its distribution? Also, can you estimate how many Inserts until you see the situation you're describing?

bmizerany commented 10 years ago

If you could write a test that reproduces this problem, I would be very grateful. If not, I can do it.

leifwalsh commented 10 years ago

I am measuring database operation latencies in milliseconds. Values start at about 10ms and are mostly normally distributed around 50ms, but with a very long tail reaching about 30000ms in some cases. The problem takes hours or days to appear, at probably (I'm remembering from a while back here) 10-50 samples per second. I'm using a stream initialized with the 50%, 90%, and 99% quantiles.

I can try to build a reproducer eventually but it'll be a while before I can find the time. Thanks!

Cheers, Leif

(Sent by email in reply to Blake Mizerany's comment of Mon, Apr 28, 2014 at 11:44 PM.)

bmizerany commented 10 years ago

@leifwalsh I just re-read your issue. Will you please explain what you mean by "presumably because all the input in that time was above 50%."? It sounds like what you're saying is that 100% is above 50%, which makes no sense. I don't think this is what you mean, I just need some clarification.

leifwalsh commented 10 years ago

You're right, that doesn't make any sense. I'm not sure what I meant by that.

The problem is that almost all reports are zeroes, and then every once in a while I get some large positive numbers and all three quantiles are the same. I was trying to guess at the root cause of that behavior and made a nonsensical guess. :)

leifwalsh commented 10 years ago

Also, #10 looks like a plausible reproducer, thanks to @aybabtme for that!