Confirmed. I'll take a look at this this evening.
Thanks @dgryski.
Looks like for this data the number of leading zeros is 33, but only 5 bits are available to store this count (which tops out at 31), so it overflows and gets encoded wrong.
2015/10/07 20:48:10 vDelta=0000000000000000000000000000000001000000000000000000000000000000
2015/10/07 20:48:10 leading=33 trailing=30
So, bumping the number of bits used to encode the leading zeroes from 5 to 6 fixes this bug. This is probably the correct fix.
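To make the overflow concrete, here is a small standalone sketch (the values are illustrative, not taken from the failing series) that produces a delta with 33 leading zeros and shows what happens when that count is squeezed into 5 bits versus 6:

package main

import (
	"fmt"
	"math"
	"math/bits"
)

func main() {
	// Two illustrative float64 values whose bit patterns differ only in
	// bit 30, so their XOR has 33 leading and 30 trailing zero bits
	// (the same shape as the vDelta in the log above).
	a := math.Float64frombits(0x4028000000000000)         // 12.0
	b := math.Float64frombits(0x4028000000000000 ^ 1<<30) // 12.0 with bit 30 flipped
	vDelta := math.Float64bits(a) ^ math.Float64bits(b)

	leading := bits.LeadingZeros64(vDelta)   // 33
	trailing := bits.TrailingZeros64(vDelta) // 30

	// A 5-bit field keeps only the low 5 bits, so 33 silently becomes 1;
	// a 6-bit field (or clamping to 31) keeps the count representable.
	fmt.Println(leading, trailing)          // 33 30
	fmt.Println(leading&0x1f, leading&0x3f) // 1 33
}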
What about clamping the leading bits to avoid overflowing? Something like:
diff --git a/tsz.go b/tsz.go
index c22bfc7..cd8eb1e 100644
--- a/tsz.go
+++ b/tsz.go
@@ -113,6 +113,10 @@ func (s *Series) Push(t uint32, v float64) {
s.leading, s.trailing = leading, trailing
s.bw.WriteBit(bitstream.One)
+ // Make sure we don't overflow our 5 bits
+ if leading > 31 {
+ leading = 31
+ }
Hmm.. I like the clamping idea. Need to figure out the right place to put it though.
I think it needs to be immediately after we calculate leading.
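Roughly, that placement looks like the sketch below (xorWindow is a hypothetical helper used only for illustration, not code from the repo): clamp leading the moment it is computed, so every later comparison and write sees a value that fits in 5 bits.

package sketch

import (
	"math"
	"math/bits"
)

// xorWindow is a hypothetical helper, shown only to illustrate the placement:
// clamp leading as soon as it is computed, before it is compared against the
// previous window or written to the bit stream.
func xorWindow(prev, cur float64) (leading, trailing uint8) {
	vDelta := math.Float64bits(cur) ^ math.Float64bits(prev)
	leading = uint8(bits.LeadingZeros64(vDelta))
	trailing = uint8(bits.TrailingZeros64(vDelta))
	if leading > 31 {
		leading = 31 // keep the count representable in the 5-bit field
	}
	return leading, trailing
}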
FWIW, increasing the number of bits adds only a single byte to the 2h worth of minutely data in TwoHoursData, and no extra bytes on 2h of second-level production metrics (not included in the repo).
The paper does say "...due to the extra 13 bits of overhead required to encode the length of leading zero bits and meaningful bits", but 5+6 is only 11 bits, and the extra two bits aren't the control bits, because those would be included in the calculations for the other sizes too. (But even if the leading-zero count gets 6 bits, that's still only 12 bits of overhead...)
I think I'll go with your clamping solution.
Force-pushed the clamping fix.
:+1:
While investigating https://github.com/influxdb/influxdb/issues/4357, it looks like the root issue is that the float encoding is not decoding the original values correctly.
This test reproduces the issue:
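A round-trip check of roughly this shape (the values here are illustrative, not the ones from the linked issue, and the New/Iter/Values calls are assumed from the package's public API) exercises the same encode/decode path:

package tsz

import (
	"math"
	"testing"
)

// Sketch of a round-trip test for the >31-leading-zeros case. The second
// value differs from the first only in bit 30, so the XOR delta has 33
// leading zero bits, which does not fit in the 5-bit length field.
func TestPushLargeLeadingZeros(t *testing.T) {
	t0 := uint32(1444238178) // arbitrary start time
	vals := []float64{
		math.Float64frombits(0x4028000000000000),         // 12.0
		math.Float64frombits(0x4028000000000000 ^ 1<<30), // bit 30 flipped
	}

	s := New(t0)
	for i, v := range vals {
		s.Push(t0+uint32(i)*60, v)
	}

	it := s.Iter()
	for i, want := range vals {
		if !it.Next() {
			t.Fatalf("iterator ended early at value %d", i)
		}
		if _, got := it.Values(); got != want {
			t.Errorf("value %d: got %v, want %v", i, got, want)
		}
	}
	if err := it.Err(); err != nil {
		t.Fatalf("iterator error: %v", err)
	}
}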