bkaradzic / go-lz4

Port of LZ4 lossless compression algorithm to Go
BSD 2-Clause "Simplified" License
211 stars 23 forks source link

Use capacity to determine whether dst can be reused for encoding and decoding #20

Open manishrjain opened 7 years ago

manishrjain commented 7 years ago

Also, modify the benchmarks to reuse the allocated slice. This improves the ns/op for both encoding and decoding by 0.4s/op (2.4s -> 2.0s for Decode, and similar for encode).

manishrjain commented 7 years ago

Hi, is this repository being maintained actively? I have this PR and seeing more performance issues which need to be fixed.

dgryski commented 7 years ago

I'm not sure using cap here actually makes sense from an API point of view, although I agree it does from a performance perspective. The space between len() and cap() exists in this weird ownership space. As a caller, I might pass a sub-slice to a function saying "This is what you're allowed to use; if you need more, get your own." If the function checks cap(), we can't do. this.

For reference, note that the Go snappy port checks only len(): https://github.com/golang/snappy/blob/master/decode.go#L60

calmh commented 7 years ago

Besides, this is really only a thing here due to how the benchmarks are written. A caller can get the same behavior by doing dst, err = Encode(dst[:cap(dst)], testfile) anyway, no?

manishrjain commented 7 years ago

@dgryski : I see your point. My perspective it this:

A caller of this API would only need to pass a dst slice, if they want to avoid memory allocations; which in Go can be very expensive (considering GC etc.). The typical usage then would be: dst = Decompress(dst, src). When using len, dst's size would change all the time, causing Decompress to allocate slices way more frequently; which was not the intent of the caller.

I can't think of a scenario where you'd pass something in dst, not want the function to exceed the length of it and yet be OK when it returns you a whole new slice.

In fact, when I first realized what it was actually doing, it was a surprise, because I allocated a big enough slice, and just expected the function to keep reusing it.

That's my opinion. Though, it might not be idiosyncratic Go; and I defer to you to make that decision. Happy to change the way we call the API. But, at the very least, it would be good to update the comments to specify what @calmh has suggested.

manishrjain commented 7 years ago

Let me know if I should retract this PR.