bmizerany / perks

Effective Computation of Things
BSD 2-Clause "Simplified" License
186 stars 53 forks source link

Results are inaccurate until 500th sample inserted. #1

Closed wathiede closed 11 years ago

wathiede commented 11 years ago

Using the slightly modified code from 'Example (Simple)' (see below), I get the following output:

-- snip about 496 * 4 lines--
count: 0
perc50: 0.3079969873662769
perc90: 1.5462974403775809
perc99: 0.6431003006884586
count: 0
perc50: 0.3079969873662769
perc90: 0.04130272743506464
perc99: 0.01068636700896286
count: 0
perc50: 0.7176288428497417
perc90: 2.6582266087185467
perc99: 6.504333667303405
count: 500
perc50: 0.7176288428497417
perc90: 2.6582266087185467
perc99: 6.504333667303405
count: 501
perc50: 0.7176288428497417
perc90: 2.6582266087185467
perc99: 6.504333667303405
count: 502
perc50: 0.7176288428497417
perc90: 2.6582266087185467
perc99: 6.504333667303405
...

It appears that q.Count() et. al. don't return valid data until the first flush and compress. Not sure if this is obvious to someone who better understands the Biased Quantiles paper, and it is a problem that fixes itself with more samples, but it was a bit surprising to me while experimenting with the library.

package main

import (
  "fmt"
  "math/rand"

  "github.com/bmizerany/perks/quantile"
)

func readFloats(ch chan float64) {
  for i := 0; i < 5000; i++ {
    ch <- rand.ExpFloat64()
  }
  close(ch)
}

func main() {
  ch := make(chan float64)
  go readFloats(ch)

  // Compute the 50th, 90th, and 99th percentile for a stream within the
  // set error epsilon of 0.01.
  q := quantile.NewTargeted(0.01, 0.50, 0.90, 0.99)
  for v := range ch {
    q.Insert(v)

    fmt.Println("perc50:", q.Query(0.50))
    fmt.Println("perc90:", q.Query(0.90))
    fmt.Println("perc99:", q.Query(0.99))
    fmt.Println("count:", q.Count())
  }   
}
bmizerany commented 11 years ago

Wow. Great find. I'm sorry I didn't notice it. Will fix.

bmizerany commented 11 years ago

It was an off by one error, of course.

wathiede commented 11 years ago

Hmm, I think there is a new problem now. Running the modified example in my original comment yields this:

$  go run ex.go 
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/bmizerany/perks/quantile.(*Stream).Query(0x4c2000580e0, 0x3fe0000000000000, 0x3fe2cb25a0ca16c6)
        /-snip-/Go/src/github.com/bmizerany/perks/quantile/stream.go:112 +0x83
main.main()
        /-snip-/Go/src/q/ex.go:27 +0x14a

goroutine 3 [chan send]:
main.readFloats(0x4c200069000)
        /-snip-/Go/src/q/ex.go:12 +0x4e
created by main.main
        /-snip/Go/src/q/ex.go:19 +0x58

With line 27:

fmt.Println("perc50:", q.Query(0.50))

Which is the first Query call after the the first insert.

bmizerany commented 11 years ago

Yeah. On it.

On Tue, Apr 16, 2013 at 1:03 PM, wathiede notifications@github.com wrote:

Hmm, I think there is a new problem now. Running the modified example in my original comment yields this:

$ go run ex.go panic: runtime error: index out of range

goroutine 1 [running]:github.com/bmizerany/perks/quantile.(*Stream).Query(0x4c2000580e0, 0x3fe0000000000000, 0x3fe2cb25a0ca16c6) /-snip-/Go/src/github.com/bmizerany/perks/quantile/stream.go:112 +0x83 main.main() /-snip-/Go/src/q/ex.go:27 +0x14a

goroutine 3 [chan send]: main.readFloats(0x4c200069000) /-snip-/Go/src/q/ex.go:12 +0x4e created by main.main /-snip/Go/src/q/ex.go:19 +0x58

With line 27:

fmt.Println("perc50:", q.Query(0.50))

Which is the first Query call after the the first insert.

— Reply to this email directly or view it on GitHubhttps://github.com/bmizerany/perks/issues/1#issuecomment-16468316 .

bmizerany commented 11 years ago

@wathiede This should do it. Thank you for the submission and example.

Also, I fixed the examples when I noticed your example giving 0.01 as the first parameter to NewTargeted, which was an old, pre-OSS API.

wathiede commented 11 years ago

Yay, thanks!

bmizerany commented 11 years ago

Np. Sorry for such an embarrassing bug. :/

On Tue, Apr 16, 2013 at 7:09 PM, wathiede notifications@github.com wrote:

Yay, thanks!

— Reply to this email directly or view it on GitHubhttps://github.com/bmizerany/perks/issues/1#issuecomment-16483157 .