Closed — arthurprs closed this 3 years ago
Nice work! What I'm currently working on, and what is probably the cause of the merge conflicts, is supporting slices as parameters instead of vecs (https://github.com/PSeitz/lz4_flex/issues/11). This is done to better support the frame format and also to apply some performance optimizations. There are some implications that are not so easy to handle in the unsafe decompression, which I'm currently looking into. It's probably better to use the new API.
Currently it accepts a Sink, which is a u8 slice with a position. I'm not completely sure whether this slice and position should be wrapped in a struct or accepted as separate parameters in the API.
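For illustration only — a minimal sketch of the shape being discussed, not the crate's actual `Sink` definition (the field and method names here are assumptions):

```rust
// Sketch only: "Sink" in this sense is an output byte slice paired with
// a current write position that advances as bytes are produced.
struct Sink<'a> {
    output: &'a mut [u8],
    pos: usize,
}

impl<'a> Sink<'a> {
    // Append a single byte at the current position.
    fn push(&mut self, byte: u8) {
        self.output[self.pos] = byte;
        self.pos += 1;
    }

    // Append a run of bytes at the current position.
    fn extend_from_slice(&mut self, data: &[u8]) {
        self.output[self.pos..self.pos + data.len()].copy_from_slice(data);
        self.pos += data.len();
    }
}

fn main() {
    let mut buf = [0u8; 8];
    let mut sink = Sink { output: &mut buf, pos: 0 };
    sink.push(1);
    sink.extend_from_slice(&[2, 3]);
    assert_eq!(sink.pos, 3);
    assert_eq!(&sink.output[..3], &[1, 2, 3]);
}
```

Wrapping the slice and position together keeps the two in sync at every call site, which is the main argument for a dedicated type over separate parameters.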
EDIT: never mind this comment, I misunderstood your comment. I should've checked the latest master code first.
~Honestly, after writing this changeset I don't think supporting a Vec for output internally is worth the effort. The API level is unaffected, as we can just create a properly sized Vec internally and use its backing slice. I agree this may have some slight performance implications in the unchecked decompressor, but that one is extremely unsafe anyway.~
~But I really don't agree that a Vec helps in the frame format. The frame format enforces a strict max block size, which can be used to determine the max output size for the compressor and the size of the output slice for the decompressor.~
I fixed the conflicts with master. I think the compressor is in a good state, if you want to start a closer review. I'll move onto the safe decompressor when time allows.
I can reproduce some regressions with LTO enabled, but much smaller ones :thinking: I suspect there are differences in the benchmarks, so I added a lz4_flex_master variant to the benchmarks (ed6e060) so they can be run side by side.
W/O LTO: faster (see LTO below)
```
Compress/lz4_flexx_rust/725
    time:   [1.2502 us 1.2516 us 1.2529 us]
    thrpt:  [551.83 MiB/s 552.44 MiB/s 553.05 MiB/s]
change:
    time:   [-1.7243% -1.5582% -1.3910%] (p = 0.00 < 0.05)
    thrpt:  [+1.4106% +1.5829% +1.7545%]
    Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Compress/lz4_flexx_rust_master/725
    time:   [1.7235 us 1.7285 us 1.7336 us]
    thrpt:  [398.83 MiB/s 400.01 MiB/s 401.18 MiB/s]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Compress/lz4_flexx_rust/34308
    time:   [92.714 us 93.004 us 93.467 us]
    thrpt:  [350.06 MiB/s 351.80 MiB/s 352.90 MiB/s]
change:
    time:   [-2.1264% -1.2183% +0.0655%] (p = 0.01 < 0.05)
    thrpt:  [-0.0654% +1.2333% +2.1726%]
    Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

Compress/lz4_flexx_rust_master/34308
    time:   [101.93 us 102.23 us 102.68 us]
    thrpt:  [318.65 MiB/s 320.04 MiB/s 320.98 MiB/s]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe
```
W/ LTO: 10% regression for 725 bytes, 4% regression for 32KB. The massive improvement from W/O LTO to W/ LTO for master suggests it was missing some inline annotations (I added a few in this PR) and/or it's optimizing better (because of the smaller code size).
```
Compress/lz4_flexx_rust/725
    time:   [1.2312 us 1.2455 us 1.2695 us]
    thrpt:  [544.66 MiB/s 555.15 MiB/s 561.58 MiB/s]
change:
    time:   [-1.7319% -1.1970% -0.5335%] (p = 0.00 < 0.05)
    thrpt:  [+0.5363% +1.2115% +1.7624%]
    Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

Compress/lz4_flexx_rust_master/725
    time:   [1.1240 us 1.1265 us 1.1298 us]
    thrpt:  [611.99 MiB/s 613.75 MiB/s 615.13 MiB/s]
change:
    time:   [-35.620% -35.148% -34.771%] (p = 0.00 < 0.05)
    thrpt:  [+53.307% +54.197% +55.327%]
    Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

Compress/lz4_flexx_rust/34308
    time:   [90.973 us 91.036 us 91.102 us]
    thrpt:  [359.14 MiB/s 359.40 MiB/s 359.65 MiB/s]
change:
    time:   [-3.4910% -2.4520% -1.7169%] (p = 0.00 < 0.05)
    thrpt:  [+1.7469% +2.5136% +3.6173%]
    Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

Compress/lz4_flexx_rust_master/34308
    time:   [87.705 us 87.840 us 87.979 us]
    thrpt:  [371.89 MiB/s 372.48 MiB/s 373.05 MiB/s]
change:
    time:   [-16.416% -15.292% -14.409%] (p = 0.00 < 0.05)
    thrpt:  [+16.834% +18.052% +19.640%]
    Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
```
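For context on the inline annotations mentioned above: without LTO, a function is only a cross-crate inlining candidate if it carries an `#[inline]` hint, which is why missing annotations can show up as a large W/O-LTO vs W/-LTO gap. A generic sketch (a hypothetical helper, not the PR's actual code):

```rust
// Without LTO, cross-crate callers can only inline this function
// because of the #[inline] attribute; with LTO the optimizer can
// inline across crates regardless, which narrows the gap.
#[inline]
fn read_u32_le(data: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]])
}

fn main() {
    assert_eq!(read_u32_le(&[0x01, 0x00, 0x00, 0x00, 0xFF], 0), 1);
    assert_eq!(read_u32_le(&[0x00, 0x04, 0x22, 0x4D, 0x18], 1), 0x184D2204);
}
```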
Ah sorry, I changed Cargo.toml on master to use unsafe by default during the last dev session, so the comparison is invalid.
Looks really good so far, nice job!
Would be good to add some benchmarks with the frame format. It would also be good to enable COMPRESSION66K, which is a JSON data example; JSON often has different performance characteristics.
I was able to reduce all sources of overhead using const generics, and the non-dict version is now somewhat faster. There's overhead when using an ext-dict, but that's sort of unavoidable.
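To sketch the technique being described (this is an invented example, not the PR's actual code; the function name and signature are assumptions): a const-generic flag makes the compiler emit two monomorphized copies, so the common non-dict path compiles with the dictionary branch removed entirely.

```rust
// Sketch of the const-generics technique (not the PR's actual code):
// when USE_EXT_DICT is false, the ext_dict branch is dead code in the
// monomorphized copy, so the non-dict path pays no runtime cost for it.
fn copy_match<const USE_EXT_DICT: bool>(
    out: &mut Vec<u8>,
    ext_dict: &[u8],
    offset: usize,
    len: usize,
) {
    for _ in 0..len {
        let b = if USE_EXT_DICT && offset > out.len() {
            // Match reaches before the current buffer: read from the
            // external dictionary holding the previous window bytes.
            ext_dict[ext_dict.len() - (offset - out.len())]
        } else {
            out[out.len() - offset]
        };
        out.push(b);
    }
}

fn main() {
    // Non-dict variant: an overlapping match copy within the buffer.
    let mut out = vec![1u8, 2, 3];
    copy_match::<false>(&mut out, &[], 3, 3);
    assert_eq!(out, [1, 2, 3, 1, 2, 3]);

    // Dict variant: the first byte comes from the external dictionary.
    let mut out = vec![1u8, 2, 3];
    copy_match::<true>(&mut out, &[9, 8], 4, 2);
    assert_eq!(out, [1, 2, 3, 8, 1]);
}
```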
Nice usage of const generics!
I added some comments directly on the commits, I hope that's fine.
Just out of curiosity, how would I use the frame decompression? Could you please give an example? Thank you 😄
@billyb2 here are the examples, it's likely that similar functions will be added to the crate.
```rust
use std::io::{Read, Write};
use lz4_flex::frame::{FrameDecoder, FrameEncoder};

fn compress_frame(input: &[u8]) -> Vec<u8> {
    let mut enc = FrameEncoder::new(Vec::new());
    enc.write_all(input).unwrap();
    enc.finish().unwrap()
}

fn decompress_frame(input: &[u8]) -> Result<Vec<u8>, lz4_flex::frame::Error> {
    let mut de = FrameDecoder::new(input);
    let mut out = Vec::new();
    de.read_to_end(&mut out)?;
    Ok(out)
}
```
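As a side note (not part of the PR), the output of a frame compressor always starts with the LZ4 frame magic number defined by the frame format spec, which makes for an easy sanity check on the produced bytes:

```rust
// Per the LZ4 frame format specification, every frame begins with the
// 4-byte magic number 0x184D2204, stored little-endian.
const LZ4_FRAME_MAGIC: u32 = 0x184D2204;

fn looks_like_lz4_frame(data: &[u8]) -> bool {
    data.len() >= 4
        && u32::from_le_bytes([data[0], data[1], data[2], data[3]]) == LZ4_FRAME_MAGIC
}

fn main() {
    // Little-endian byte layout of the magic number.
    assert!(looks_like_lz4_frame(&[0x04, 0x22, 0x4D, 0x18]));
    assert!(!looks_like_lz4_frame(b"plain text"));
}
```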
I think the coverage setup is not working as expected. Specifically, I don't think it's merging the results of the multiple runs (w/ different features).
> I think the coverage setup is not working as expected. Specifically, I don't think it's merging the results of the multiple runs (w/ different features).
Yes, it doesn't correctly aggregate runs with different feature flags; you can ignore it.
@PSeitz Thank you for your patience. I think it's ready for review. I added as much testing as I could, so I'm cautiously optimistic it's working.
@arthurprs thank you, really impressive job there. Reviewing all of it may take a little while.
Looks really good so far.
Did I understand correctly, that in the linked mode, the ext_dict feature is used to give access to the previous bytes?
> Did I understand correctly, that in the linked mode, the ext_dict feature is used to give access to the previous bytes?
Exactly. When data has to wrap back to the beginning of the buffer, we can use ext_dict to reference data that is still part of the compression window but is not contiguous memory-wise.
Thank you for your patience, really exceptional quality. One thing that's missing, I think, are tests for the features exposed via FrameInfo: content_size, block_size and the two checksums. Currently only defaults are tested?
Ahh, yes :thinking: Only block mode is tested in depth. I'll try to add a few tests.
Related to this, can you enable the GitHub Actions for this PR? It says:
> 1 workflow awaiting approval
> First-time contributors need a maintainer to approve running workflows.
> Related to this, can you enable the GitHub Actions for this PR? It says: 1 workflow awaiting approval. First-time contributors need a maintainer to approve running workflows.
It's quite annoying; I constantly need to click "Approve and run" until one PR is merged:
> After a contributor has at least one pull request merged into a project's repository, any future pull requests from that contributor's fork will automatically run workflows.
Thank you for nudging me about the tests, there were some horrendous bugs :facepalm: :facepalm: :facepalm: :facepalm: :facepalm:
> Thank you for nudging me about the tests, there were some horrendous bugs :facepalm:
Anything that is not covered by tests doesn't work or will be broken eventually :)
Time to merge finally :D Really great job!
Thank you :tada: Looking forward to using this very soon.
Released with 0.8.0.
Hello there :wave: I managed to get a mostly complete and efficient implementation of the frame format on top of your work. It'd be great if we could eventually reach a production-level LZ4 implementation in Rust :tada:
Please forgive the size of the PR, it's a bit larger than it should be. I may have sneaked a few improvements into this PR. While I was grokking the codebase, I fixed/changed a few bits to be clearer/more idiomatic.
It'd be great if you could take an initial look :smile:
TODO:
- write_integer implementation
- enum CompressionError { OutputTooSmall }
- fuzz_decomp_corrupted_data