bmizerany / perks


Abnormally high samples size #6

Closed daneharrigan closed 10 years ago

daneharrigan commented 10 years ago

I don't have code to reproduce the issue yet. It's proving to be difficult. I'm offering as much as I know up front while I work on reproducing the issue.

I have a web service computing percentiles from the data posted to it. I'm calculating p50, p95, and p99. Every second I hand off the samples from quantile.Stream#Samples(), and once a minute I reset the stream with quantile.Stream#Reset().

On average the slice from Samples() contains 50-500 items, but in some scenarios it contains upwards of 20K items. This gist is an example of the slice containing 15K items: https://gist.github.com/daneharrigan/7016164.

I'm wondering if there could be a race condition between inserting and resetting, or between resetting and compressing. I'll post code soon.
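For what it's worth, if quantile.Stream does no internal locking, concurrent Insert and Reset calls would need to be synchronized from the outside. Below is a minimal sketch of that, assuming only the Insert/Samples/Reset methods referenced in this thread; the guardedStream wrapper and the loop structure are mine, not part of perks:

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"

    "github.com/bmizerany/perks/quantile"
)

// guardedStream serializes access to a quantile.Stream so that Insert,
// Samples, and Reset calls from different goroutines cannot interleave.
type guardedStream struct {
    mu sync.Mutex
    s  *quantile.Stream
}

func (g *guardedStream) Insert(v float64) {
    g.mu.Lock()
    defer g.mu.Unlock()
    g.s.Insert(v)
}

func (g *guardedStream) Samples() quantile.Samples {
    g.mu.Lock()
    defer g.mu.Unlock()
    return g.s.Samples()
}

func (g *guardedStream) Reset() {
    g.mu.Lock()
    defer g.mu.Unlock()
    g.s.Reset()
}

func main() {
    g := &guardedStream{s: quantile.NewTargeted(0.50, 0.95, 0.99)}

    // Insert random values continuously, as the web service would.
    go func() {
        for range time.Tick(time.Millisecond) {
            g.Insert(float64(rand.Intn(200)))
        }
    }()

    // Report the sample count every second and reset once a minute.
    reset := time.Tick(time.Minute)
    for range time.Tick(time.Second) {
        fmt.Printf("Samples Size: %d\n", len(g.Samples()))
        select {
        case <-reset:
            g.Reset()
        default:
        }
    }
}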

daneharrigan commented 10 years ago

The following code grows the sample set without bound. It only inserts values between 0 and 199, yet the sample size ticks up every second: 999, 1999, 2999, and so on.

package main

import (
    "github.com/bmizerany/perks/quantile"
    "fmt"
    "math/rand"
    "time"
)

var s *quantile.Stream

func main() {
    s = quantile.NewTargeted(0.50, 0.95, 0,99)
    go insert()
    for _ = range time.Tick(time.Second) {
        fmt.Printf("Samples Size: %d\n", len(s.Samples()))
    }
}

func insert() {
    for _ = range time.Tick(time.Millisecond) {
        n := rand.Intn(200)
        s.Insert(float64(n))
    }
}
cespare commented 10 years ago

@daneharrigan hehe, you have a typo.

0.50, 0.95, 0,99 // See the comma?

Took me a while to figure out what was going on here.

I think that NewTargeted should panic if you pass in numbers not in [0, 1).
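For illustration, here is a hedged sketch of the validation being suggested, written as a wrapper around the existing variadic constructor rather than as a change to the library; newTargetedChecked is a made-up name, not a perks API:

package main

import (
    "fmt"

    "github.com/bmizerany/perks/quantile"
)

// newTargetedChecked rejects quantile targets outside [0, 1) before
// handing them to quantile.NewTargeted.
func newTargetedChecked(quantiles ...float64) *quantile.Stream {
    for _, q := range quantiles {
        if q < 0 || q >= 1 {
            panic(fmt.Sprintf("quantile target out of [0, 1): %v", q))
        }
    }
    return quantile.NewTargeted(quantiles...)
}

func main() {
    // The intended call is unaffected.
    s := newTargetedChecked(0.50, 0.95, 0.99)
    s.Insert(1)

    // The mistyped call from the repro above (0.50, 0.95, 0, 99)
    // now fails loudly instead of silently growing the sample set.
    newTargetedChecked(0.50, 0.95, 0, 99)
}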

daneharrigan commented 10 years ago

@cespare hm, I didn't have the typo in production, but I saw the same kind of behavior. I'll reopen or open a new issue when I have more data.