dgryski / go-tsz

Time series compression algorithm from Facebook's Gorilla paper
BSD 2-Clause "Simplified" License
538 stars 67 forks source link

Improvements maybe #10

Closed Dieterbe closed 8 years ago

Dieterbe commented 8 years ago

@dgryski can you give these a critical look and let me know whether they look good or are stupid, and if so, why. thanks :)

dgryski commented 8 years ago

I'll run some benchmarks today and see if they change

dgryski commented 8 years ago

Ran 30 interations of my benchmarks and stuffed them into https://godoc.org/rsc.io/benchstat

Looks like no significant changes. If anything, Encode got a teeny tiny bit slower.

<dgryski@kamek[go-tsz] \ʕ◔ϖ◔ʔ/ > benchstat bench.old bench.new 
name      old time/op  new time/op  delta
Encode-4  19.1µs ± 2%  19.4µs ± 3%  +1.64%  (p=0.000 n=29+30)
Decode-4  7.26µs ± 2%  7.28µs ± 2%    ~     (p=0.534 n=29+30)
dgryski commented 8 years ago

The test data set is two hours of secondly data from a production system.

dgryski commented 8 years ago

Oops, that was the regular two hours of minutely data. Reran again with the two hours of secondly data. Still no change:

<dgryski@kamek[go-tsz] \ʕ◔ϖ◔ʔ/ > benchstat bench.old bench.new 
name       old time/op  new time/op  delta
EncodeB-4  1.31ms ± 2%  1.32ms ± 4%  +1.33%  (p=0.000 n=29+30)
DecodeB-4   595µs ± 2%   599µs ± 4%    ~     (p=0.130 n=30+30)
Encode-4   19.3µs ± 2%  19.4µs ± 3%    ~     (p=0.120 n=28+30)
Decode-4   7.28µs ± 2%  7.31µs ± 2%    ~     (p=0.085 n=27+29)
Dieterbe commented 8 years ago

Ok but benchmarks aside, what do you think of the commits, logically speaking? Especially the first and third make sense to me, I'm not sure about the second. @dgryski

dgryski commented 8 years ago

The first commit makes sense. The second one I agree there are too many variables, and as was shown by the benchmarks it didn't make a difference. The third one I think makes for slightly less clear code. I like the fact that we return EOF when len(b.stream)==0. We've totally consumed b.stream and the slice is now empty. Personal preference, I guess. (Much like I dislike unless in perl and try to always write numerical comparisons with < instead of a mixture of < and >, etc...)

dgryski commented 8 years ago

You can merge https://github.com/dgryski/go-tsz/commit/c52a434821dd04a69468fcb9a49501c8927be001 if you want.

Dieterbe commented 8 years ago

fair enough, cherry-picked the first one.