flavray / avro-rs

Avro client library implementation in Rust
MIT License
169 stars 95 forks source link

Fix reader buffer length bug #126

Closed cpcloud closed 4 years ago

cpcloud commented 4 years ago

This PR fixes a bug where decompression would attempt to read n bytes, where n is the length of the reader's buffer. If n is greater than the number of bytes in a block then garbage will be decompressed and fail with a cryptic error message.

This wasn't reproducible 100% of the time, likely due to memory allocation not being reproducible across invocations of a program. That is, sometimes the buffer's length will be equal to the number of bytes in a block, other times it will be larger. I therefore did not add a test, as it would've been flakey.

If desired, I can spend some more time trying to come up with a test.

This PR is meant to address the bug until a better solution can be introduced. Part of the issue is the Codec methods taking a &mut Vec<u8>. Since those methods are allocating anyway, it might be better to return the new Vec instead of mutating the reference.

cpcloud commented 4 years ago

Here are the benchmarks of fix-reader-buf-len-bug vs master:

group                                base                                   fix-reader-buf-len-bug
-----                                ----                                   ----------------------
big record                           1.00    895.6±8.57ns        ? B/sec    1.11    992.5±4.23ns        ? B/sec
big schema, read 1 record            1.06     51.4±0.26µs        ? B/sec    1.00     48.6±0.23µs        ? B/sec
big schema, read 100 records         1.00    146.3±1.05µs        ? B/sec    1.12    163.4±1.05µs        ? B/sec
big schema, read 100k records        1.09    108.3±0.48ms        ? B/sec    1.00     99.7±0.82ms        ? B/sec
big schema, read 10k JSON records    1.00     18.8±0.07ms        ? B/sec    1.25     23.5±0.05ms        ? B/sec
big schema, read 10k records         1.01     10.0±0.03ms        ? B/sec    1.00      9.8±0.03ms        ? B/sec
big schema, write 1 record           1.02      4.1±0.02µs        ? B/sec    1.00      4.0±0.01µs        ? B/sec
big schema, write 100 records        1.00     23.6±0.08µs        ? B/sec    1.11     26.2±0.15µs        ? B/sec
big schema, write 10k records        1.10      2.4±0.01ms        ? B/sec    1.00      2.2±0.03ms        ? B/sec
quickstop null file                  1.01      2.6±0.00ms        ? B/sec    1.00      2.6±0.01ms        ? B/sec
small record                         1.00    123.7±0.83ns        ? B/sec    1.00    123.8±1.25ns        ? B/sec
small schema, read 1 record          1.00      7.9±0.01µs        ? B/sec    1.00      7.9±0.03µs        ? B/sec
small schema, read 100 records       1.00     21.8±0.15µs        ? B/sec    1.11     24.2±0.18µs        ? B/sec
small schema, read 10k records       1.00   1401.2±6.82µs        ? B/sec    1.09   1526.1±7.70µs        ? B/sec
small schema, write 1 record         1.12  1593.9±10.91ns        ? B/sec    1.00   1427.9±9.09ns        ? B/sec
small schema, write 100 records      1.00      4.8±0.12µs        ? B/sec    1.04      4.9±0.03µs        ? B/sec
small schema, write 10k records      1.00    275.3±0.59µs        ? B/sec    1.05    289.5±0.20µs        ? B/sec
cpcloud commented 4 years ago

We only care about the read path for this PR, so here are the benchmarks filtered to reads only:

group                                base                                   fix-reader-buf-len-bug
-----                                ----                                   ----------------------
big schema, read 1 record            1.06     51.4±0.26µs        ? B/sec    1.00     48.6±0.23µs        ? B/sec
big schema, read 100 records         1.00    146.3±1.05µs        ? B/sec    1.12    163.4±1.05µs        ? B/sec
big schema, read 100k records        1.09    108.3±0.48ms        ? B/sec    1.00     99.7±0.82ms        ? B/sec
big schema, read 10k JSON records    1.00     18.8±0.07ms        ? B/sec    1.25     23.5±0.05ms        ? B/sec
big schema, read 10k records         1.01     10.0±0.03ms        ? B/sec    1.00      9.8±0.03ms        ? B/sec
small schema, read 1 record          1.00      7.9±0.01µs        ? B/sec    1.00      7.9±0.03µs        ? B/sec
small schema, read 100 records       1.00     21.8±0.15µs        ? B/sec    1.11     24.2±0.18µs        ? B/sec
small schema, read 10k records       1.00   1401.2±6.82µs        ? B/sec    1.09   1526.1±7.70µs        ? B/sec
poros commented 4 years ago

Thanks for fixing this bug; it looks pretty nasty!