PSeitz / lz4_flex

Fastest pure Rust implementation of LZ4 compression/decompression.
MIT License

faster duplicate_overlapping #69

Closed PSeitz closed 1 year ago

PSeitz commented 1 year ago

Improve the unsafe version of duplicate_overlapping. The compiler generates unfavourable assembly for the simple one-byte-per-iteration loop, so we now copy 4 bytes per iteration instead of one. Without this, the compiler unrolls/auto-vectorizes the copy itself and emits a lot of branches. That is not what we want, since large overlapping copies are not that common.
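For readers without the diff at hand, here is a minimal sketch of the idea, not the code from this PR. The function name `duplicate_overlapping_sketch` and its index-based signature are assumptions made for illustration; the crate's real function works on raw output pointers. The sketch assumes that "4 bytes per iteration" means four single-byte copies unrolled by hand, which keeps the byte-by-byte semantics an overlapping LZ4 match needs (a source byte may have been written earlier in the same copy).

```rust
/// Hypothetical illustration, not the lz4_flex implementation.
/// Copies `len` bytes inside `buf` from `start` to `dst` (with `start < dst`),
/// strictly front to back, so that overlapping ranges repeat the pattern.
fn duplicate_overlapping_sketch(buf: &mut [u8], start: usize, dst: usize, len: usize) {
    assert!(start < dst, "match source must precede the destination");
    let in_bounds = dst.checked_add(len).map_or(false, |end| end <= buf.len());
    assert!(in_bounds, "destination range out of bounds");

    let base = buf.as_mut_ptr();
    // SAFETY: start < dst and dst + len <= buf.len(), so every read and
    // write below stays inside `buf`.
    unsafe {
        let mut src = base.add(start) as *const u8;
        let mut out = base.add(dst);
        let mut remaining = len;

        // Manually unrolled: four single-byte copies per iteration.
        // Each byte is written before the next one is read, so an offset
        // smaller than 4 still produces the repeated pattern.
        while remaining >= 4 {
            out.write(src.read());
            out.add(1).write(src.add(1).read());
            out.add(2).write(src.add(2).read());
            out.add(3).write(src.add(3).read());
            src = src.add(4);
            out = out.add(4);
            remaining -= 4;
        }

        // Tail: at most three bytes left.
        while remaining > 0 {
            out.write(src.read());
            src = src.add(1);
            out = out.add(1);
            remaining -= 1;
        }
    }
}
```

With a buffer starting `[1, 2, 3, 0, 0, 0, 0, 0, 0, 0]`, calling `duplicate_overlapping_sketch(&mut buf, 0, 3, 7)` fills the rest with the repeating pattern, giving `[1, 2, 3, 1, 2, 3, 1, 2, 3, 1]`. Whether a given compiler version leaves this loop alone or still transforms it has to be checked in the generated assembly; the sketch only shows the shape of the change.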