PSeitz / lz4_flex

Fastest pure Rust implementation of LZ4 compression/decompression.
MIT License

improve unsafe Decompression 0-8% #113

Closed PSeitz closed 1 year ago

PSeitz commented 1 year ago

Improve unsafe decompression performance by 0-8% by replacing `memcpy` calls with a custom copy function.
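As a rough illustration of the idea (not the code from this PR), the sketch below shows one common way to avoid a generic `memcpy` call for short copies: copy the first and last `N` bytes with fixed-size moves that may overlap in the middle. The names `small_copy` and `double_copy` are hypothetical, and the sketch uses safe slice operations, whereas the PR's `src/fastcpy_unsafe.rs` presumably works with unsafe code.

```rust
/// Hypothetical helper: copies the first `N` and the last `N` bytes of `src`
/// into `dst`. For `N <= len <= 2 * N` the two windows (which may overlap)
/// cover the whole slice, so no length-dependent loop is needed.
#[inline(always)]
fn double_copy<const N: usize>(src: &[u8], dst: &mut [u8]) {
    let len = src.len();
    dst[..N].copy_from_slice(&src[..N]);
    dst[len - N..].copy_from_slice(&src[len - N..]);
}

/// Hypothetical entry point: dispatches on the length so that short copies
/// become a handful of fixed-size moves instead of a call to `memcpy`.
#[inline]
pub fn small_copy(src: &[u8], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len());
    let len = src.len();
    match len {
        0 => {}
        1..=3 => {
            // First, middle, and last byte cover every index for len 1..=3.
            dst[0] = src[0];
            dst[len / 2] = src[len / 2];
            dst[len - 1] = src[len - 1];
        }
        4..=7 => double_copy::<4>(src, dst),
        8..=16 => double_copy::<8>(src, dst),
        17..=32 => double_copy::<16>(src, dst),
        // Assumption: longer copies fall back to the standard slice copy.
        _ => dst.copy_from_slice(src),
    }
}
```

Whether this beats `memcpy` depends on the distribution of copy lengths and on the target, so any such change needs to be validated with benchmarks like the ones below.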

BlockDecompress/lz4_flex_rust/725
                        time:   [213.25 ns 213.87 ns 214.67 ns]
                        thrpt:  [3.1454 GiB/s 3.1571 GiB/s 3.1663 GiB/s]
                 change:
                        time:   [-2.6612% -2.2451% -1.7947%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8275% +2.2967% +2.7340%]
                        Performance has improved.
BlockDecompress/lz4_flex_rust/34308
                        time:   [15.369 µs 15.397 µs 15.422 µs]
                        thrpt:  [2.0719 GiB/s 2.0753 GiB/s 2.0790 GiB/s]
                 change:
                        time:   [-1.2694% -0.9057% -0.5299%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5327% +0.9140% +1.2857%]
                        Change within noise threshold.
BlockDecompress/lz4_flex_rust/64723
                        time:   [27.474 µs 27.525 µs 27.577 µs]
                        thrpt:  [2.1858 GiB/s 2.1900 GiB/s 2.1940 GiB/s]
                 change:
                        time:   [-2.7110% -2.4352% -2.1634%] (p = 0.00 < 0.05)
                        thrpt:  [+2.2113% +2.4960% +2.7865%]
                        Performance has improved.
BlockDecompress/lz4_flex_rust/66675
                        time:   [10.224 µs 10.287 µs 10.367 µs]
                        thrpt:  [5.9897 GiB/s 6.0361 GiB/s 6.0738 GiB/s]
                 change:
                        time:   [-10.535% -10.268% -9.9730%] (p = 0.00 < 0.05)
                        thrpt:  [+11.078% +11.443% +11.775%]
                        Performance has improved.
BlockDecompress/lz4_cpp/66675
                        time:   [11.542 µs 11.563 µs 11.590 µs]
                        thrpt:  [5.3577 GiB/s 5.3702 GiB/s 5.3800 GiB/s]
                 change:
                        time:   [-1.9330% -1.3642% -0.8404%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8475% +1.3830% +1.9711%]
                        Change within noise threshold.
BlockDecompress/lz4_flex_rust/9991663
                        time:   [3.6115 ms 3.6234 ms 3.6362 ms]
                        thrpt:  [2.5591 GiB/s 2.5681 GiB/s 2.5766 GiB/s]
                 change:
                        time:   [-0.5029% -0.1070% +0.3170%] (p = 0.62 > 0.05)
                        thrpt:  [-0.3160% +0.1071% +0.5054%]
                        No change in performance detected.
BlockDecompress/lz4_flex_rust/96274
                        time:   [3.2652 µs 3.2939 µs 3.3293 µs]
                        thrpt:  [26.931 GiB/s 27.221 GiB/s 27.460 GiB/s]
                 change:
                        time:   [+28.650% +30.025% +31.445%] (p = 0.00 < 0.05)
                        thrpt:  [-23.922% -23.092% -22.270%]
                        Performance has regressed.

codecov[bot] commented 1 year ago

Codecov Report

Merging #113 (7362373) into main (4d36f98) will increase coverage by 0.54%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
+ Coverage   89.20%   89.75%   +0.54%     
==========================================
  Files          12       13       +1     
  Lines        2326     2449     +123     
==========================================
+ Hits         2075     2198     +123     
  Misses        251      251              

| Impacted Files | Coverage Δ |
|---|---|
| src/lib.rs | 100.00% <ø> (ø) |
| src/block/decompress.rs | 95.87% <100.00%> (+0.02%) :arrow_up: |
| src/fastcpy_unsafe.rs | 100.00% <100.00%> (ø) |