dgryski / go-tsz

Time series compression algorithm from Facebook's Gorilla paper
BSD 2-Clause "Simplified" License
541 stars 66 forks source link

Encoding similar values fails to decode properly #4

Closed jwilder closed 9 years ago

jwilder commented 9 years ago

While investigating https://github.com/influxdb/influxdb/issues/4357, it looks like the root issue is that the float encoding is not decoding the original values correctly.

This test reproduces the issue:

func TestEncodeSimilarFloats(t *testing.T) {
    tunix := uint32(time.Unix(0, 0).Unix())
    s := New(tunix)
    want := []struct {
        t uint32
        v float64
    }{
        {tunix, 6.00065e+06},
        {tunix + 1, 6.000656e+06},
        {tunix + 2, 6.000657e+06},
        {tunix + 3, 6.000659e+06},
        {tunix + 4, 6.000661e+06},
    }

    for _, v := range want {
        s.Push(v.t, v.v)
    }

    s.Finish()

    it := s.Iter()

    for _, w := range want {
        if !it.Next() {
            t.Fatalf("Next()=false, want true")
        }
        tt, vv := it.Values()
        if w.t != tt || w.v != vv {
            t.Errorf("Values()=(%v,%v), want (%v,%v)\n", tt, vv, w.t, w.v)
        }
    }

    if it.Next() {
        t.Fatalf("Next()=true, want false")
    }

    if err := it.Err(); err != nil {
        t.Errorf("it.Err()=%v, want nil", err)
    }
}
influxdb:go-tsz jason$ go test ./...
--- FAIL: TestEncodeSimilarFloats (0.00s)
    tsz_test.go:112: Values()=(2,3.337975699873597e-302), want (2,6.000657e+06)
    tsz_test.go:112: Values()=(3,-3.337975699873597e-302), want (3,6.000659e+06)
    tsz_test.go:112: Values()=(4,-3.337979037484385e-302), want (4,6.000661e+06)
FAIL
FAIL    github.com/dgryski/go-tsz   0.006s
dgryski commented 9 years ago

Confirmed. I'll take a look at this this evening.

jwilder commented 9 years ago

Thanks @dgryski.

dgryski commented 9 years ago

Looks like for this data, the number of leading zeros is 33, but only 5 bits are available to store this length, hence it gets encoded wrong.

2015/10/07 20:48:10 vDelta=0000000000000000000000000000000001000000000000000000000000000000
2015/10/07 20:48:10 leading=33 trailing=30
dgryski commented 9 years ago

So, bumping the number of bits used to encode the leading zeroes from 5 to 6 fixes this bug. This is probably the correct fix.

jwilder commented 9 years ago

What about clamping the leading bits to avoid overflowing? Something like:

diff --git a/tsz.go b/tsz.go
index c22bfc7..cd8eb1e 100644
--- a/tsz.go
+++ b/tsz.go
@@ -113,6 +113,10 @@ func (s *Series) Push(t uint32, v float64) {
                        s.leading, s.trailing = leading, trailing

                        s.bw.WriteBit(bitstream.One)
+                       // Make sure we don't overflow our 5 bits
+                       if leading > 31 {
+                               leading = 31
+                       }
dgryski commented 9 years ago

Hmm.. I like the clamping idea. Need to figure out the right place to put it though.

dgryski commented 9 years ago

I think it needs to be immediately after we calculate leading.

FWIW, increasing the number of bits adds only a single byte to the 2h worth of minutely data in TwoHoursData, and no extra bytes on 2h of second-level production metrics (not included in the repo).

dgryski commented 9 years ago

The paper does say ...due to the extra 13 bits of overhead required to encode the length of leading zero bits and meaningful bits., but 5+6 is only 11 bits and the extra two bits aren't the control bits, because those would be included in the calculations for the other sizes too. (But even if it's 6 bits, then it's only 12 bits of overhead... )

I think I'll go with your clamping solution.

dgryski commented 9 years ago

Force-pushed the clamping fix.

jwilder commented 9 years ago

:+1: