Closed: cpcloud closed this PR 4 years ago.
Here are the benchmarks of `fix-reader-buf-len-bug` vs `master`:
```
group                              base                           fix-reader-buf-len-bug
-----                              ----                           ----------------------
big record                         1.00    895.6±8.57ns ? B/sec   1.11    992.5±4.23ns ? B/sec
big schema, read 1 record          1.06     51.4±0.26µs ? B/sec   1.00     48.6±0.23µs ? B/sec
big schema, read 100 records       1.00    146.3±1.05µs ? B/sec   1.12    163.4±1.05µs ? B/sec
big schema, read 100k records      1.09    108.3±0.48ms ? B/sec   1.00     99.7±0.82ms ? B/sec
big schema, read 10k JSON records  1.00     18.8±0.07ms ? B/sec   1.25     23.5±0.05ms ? B/sec
big schema, read 10k records       1.01     10.0±0.03ms ? B/sec   1.00      9.8±0.03ms ? B/sec
big schema, write 1 record         1.02      4.1±0.02µs ? B/sec   1.00      4.0±0.01µs ? B/sec
big schema, write 100 records      1.00     23.6±0.08µs ? B/sec   1.11     26.2±0.15µs ? B/sec
big schema, write 10k records      1.10      2.4±0.01ms ? B/sec   1.00      2.2±0.03ms ? B/sec
quickstop null file                1.01      2.6±0.00ms ? B/sec   1.00      2.6±0.01ms ? B/sec
small record                       1.00    123.7±0.83ns ? B/sec   1.00    123.8±1.25ns ? B/sec
small schema, read 1 record        1.00      7.9±0.01µs ? B/sec   1.00      7.9±0.03µs ? B/sec
small schema, read 100 records     1.00     21.8±0.15µs ? B/sec   1.11     24.2±0.18µs ? B/sec
small schema, read 10k records     1.00  1401.2±6.82µs ? B/sec    1.09  1526.1±7.70µs ? B/sec
small schema, write 1 record       1.12  1593.9±10.91ns ? B/sec   1.00  1427.9±9.09ns ? B/sec
small schema, write 100 records    1.00      4.8±0.12µs ? B/sec   1.04      4.9±0.03µs ? B/sec
small schema, write 10k records    1.00    275.3±0.59µs ? B/sec   1.05    289.5±0.20µs ? B/sec
```
We only care about the read path for this PR, so here are the benchmarks filtered to reads only:
```
group                              base                           fix-reader-buf-len-bug
-----                              ----                           ----------------------
big schema, read 1 record          1.06     51.4±0.26µs ? B/sec   1.00     48.6±0.23µs ? B/sec
big schema, read 100 records       1.00    146.3±1.05µs ? B/sec   1.12    163.4±1.05µs ? B/sec
big schema, read 100k records      1.09    108.3±0.48ms ? B/sec   1.00     99.7±0.82ms ? B/sec
big schema, read 10k JSON records  1.00     18.8±0.07ms ? B/sec   1.25     23.5±0.05ms ? B/sec
big schema, read 10k records       1.01     10.0±0.03ms ? B/sec   1.00      9.8±0.03ms ? B/sec
small schema, read 1 record        1.00      7.9±0.01µs ? B/sec   1.00      7.9±0.03µs ? B/sec
small schema, read 100 records     1.00     21.8±0.15µs ? B/sec   1.11     24.2±0.18µs ? B/sec
small schema, read 10k records     1.00  1401.2±6.82µs ? B/sec    1.09  1526.1±7.70µs ? B/sec
```
Thanks for fixing this bug; it looks pretty nasty!
This PR fixes a bug where decompression would attempt to read `n` bytes, where `n` is the length of the reader's buffer. If `n` is greater than the number of bytes in a block, garbage is decompressed and the read fails with a cryptic error message.

This wasn't reproducible 100% of the time, likely because memory allocation isn't reproducible across invocations of a program. That is, sometimes the buffer's length equals the number of bytes in a block, and other times it is larger. I therefore did not add a test, as it would have been flaky.
If desired, I can spend some more time trying to come up with a test.
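To make the failure mode concrete, here is a minimal sketch (the function names and signatures are hypothetical, not the crate's actual reader code): the bug pattern sizes the read by the reused buffer's stale length, while the fix resizes the buffer to the block's recorded length first.

```rust
use std::io::Read;

// Hypothetical illustration of the bug: the reader reuses `buf` across
// blocks, so `buf.len()` can exceed the current block's size.
fn read_block_buggy<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> std::io::Result<()> {
    // BUG: reads `buf.len()` bytes, i.e. whatever length a previous,
    // possibly larger block left behind, not the current block's length.
    reader.read_exact(&mut buf[..])
}

// Hypothetical fix: size the read by the block length recorded in the file.
fn read_block_fixed<R: Read>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    block_len: usize,
) -> std::io::Result<()> {
    // Resize the buffer to exactly the block's length before reading, so
    // no bytes past the block boundary are consumed as garbage.
    buf.resize(block_len, 0);
    reader.read_exact(&mut buf[..block_len])
}
```

With a 3-byte block and a 6-byte leftover buffer, the fixed version consumes exactly 3 bytes, while the buggy version reads 6 and leaves the reader misaligned for the next block.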
This PR is meant to address the bug until a better solution can be introduced. Part of the issue is the `Codec` methods taking a `&mut Vec<u8>`. Since those methods are allocating anyway, it might be better to return the new `Vec` instead of mutating the reference.
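A hedged sketch of that alternative signature (the `Codec` trait and `NullCodec` below are illustrative, not the crate's actual API): returning the freshly allocated `Vec` makes ownership explicit, so a caller-supplied buffer's stale length can never leak into the read path.

```rust
// Illustrative only; the real Codec methods differ.
trait Codec {
    // Current style: mutates a caller-provided buffer, whose prior
    // contents and length the callee must be careful about.
    fn decompress_into(&self, input: &[u8], output: &mut Vec<u8>);

    // Suggested style: since decompression allocates anyway, return the
    // new Vec and let the caller own it.
    fn decompress(&self, input: &[u8]) -> Vec<u8>;
}

// A trivial pass-through codec to make the sketch runnable.
struct NullCodec;

impl Codec for NullCodec {
    fn decompress_into(&self, input: &[u8], output: &mut Vec<u8>) {
        // Must remember to clear, or stale bytes from the last block remain.
        output.clear();
        output.extend_from_slice(input);
    }

    fn decompress(&self, input: &[u8]) -> Vec<u8> {
        // Fresh allocation per call; no stale state to manage.
        input.to_vec()
    }
}
```

The trade-off is an allocation per block either way; the returned-`Vec` form simply removes the "forgot to clear/resize" class of bugs this PR fixes.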