Closed Frommi closed 2 years ago
I have a Raspberry Pi I can use for testing on ARM. I believe ARM is generally little-endian, so I don't think that will break there (although 64-bit operations on 32-bit architectures, including i686, are often not optimal). We will need some alternative code for big-endian platforms though. Of the ones Rust supports, that seems to be ppc/ppc64 and mips/mips64; not sure how best to test those.
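For the big-endian concern, one portable pattern (a sketch, not the actual miniz_oxide code; `read_u16_le` is a hypothetical helper name) is to go through `from_le_bytes` rather than reinterpreting the buffer, so the same code works on little- and big-endian targets:

```rust
// Endianness-portable read of a 16-bit little-endian value from a byte
// buffer. `from_le_bytes` compiles to a plain load on little-endian
// targets and a load plus byte swap on big-endian ones, so no
// `#[cfg(target_endian = "...")]` branches are needed.
fn read_u16_le(buf: &[u8], pos: usize) -> u16 {
    u16::from_le_bytes([buf[pos], buf[pos + 1]])
}

fn main() {
    let data = [0x34u8, 0x12];
    // Same result on x86, ARM, ppc, and mips.
    assert_eq!(read_u16_le(&data, 0), 0x1234);
}
```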
https://github.com/japaric/trust might be useful as well.
I did some testing on my RPi3 (in 32-bit mode). Tests seem to work fine. The decompression speed is quite slow though. Compression is more in line with what we've seen on i686, e.g. a bit slower than miniz but reasonably close. What's really interesting is the fast compression mode: it's WAY faster than miniz!
EDIT: ~~Looks like there is a bug in compression somewhere. It's failing on some files (at least on x86-64).~~ EDIT2: fixed; it didn't have much impact on the results anyway.
running 15 tests
test compress_default ... bench: 87,607,969 ns/iter (+/- 1,040,324)
test compress_fast ... bench: 14,947,149 ns/iter (+/- 57,430)
test compress_high ... bench: 119,577,463 ns/iter (+/- 285,547)
test compress_mem_to_heap_default_miniz ... bench: 78,975,207 ns/iter (+/- 278,287)
test compress_mem_to_heap_default_oxide ... bench: 88,464,222 ns/iter (+/- 225,252)
test compress_mem_to_heap_fast_miniz ... bench: 22,736,103 ns/iter (+/- 181,652)
test compress_mem_to_heap_fast_oxide ... bench: 15,026,363 ns/iter (+/- 55,414)
test compress_mem_to_heap_high_miniz ... bench: 107,817,340 ns/iter (+/- 269,151)
test compress_mem_to_heap_high_oxide ... bench: 119,555,131 ns/iter (+/- 348,488)
test create_compressor ... bench: 178,687 ns/iter (+/- 2,230)
test decompress ... bench: 7,041,536 ns/iter (+/- 13,197)
test decompress_mem_to_heap_miniz ... bench: 3,254,413 ns/iter (+/- 7,979)
test decompress_mem_to_heap_oxide ... bench: 7,074,666 ns/iter (+/- 32,858)
test zlib_compress_fast ... bench: 15,247,534 ns/iter (+/- 27,623)
test zlib_decompress ... bench: 7,360,597 ns/iter (+/- 6,202)
Got the decompression times down a bit by avoiding copy_from_slice on matches on ARM:
running 4 tests
test decompress ... bench: 6,082,626 ns/iter (+/- 25,375)
test decompress_mem_to_heap_miniz ... bench: 3,285,197 ns/iter (+/- 23,074)
test decompress_mem_to_heap_oxide ... bench: 6,107,855 ns/iter (+/- 13,411)
test zlib_decompress ... bench: 6,396,010 ns/iter (+/- 14,134)
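The copy change above roughly amounts to replacing a slice copy with a plain byte loop. A minimal sketch (hypothetical helper, not the actual miniz_oxide code): LZ77 matches can overlap their own output when the distance is smaller than the length, so `copy_from_slice` is only valid for non-overlapping ranges anyway, while the byte-by-byte loop always works and avoids the slice-copy machinery that seems slow for short matches on ARM:

```rust
// Copy a DEFLATE/LZ77 match of `len` bytes starting `dist` bytes back
// from the end of the output. Pushing one byte at a time handles
// overlapping matches (dist < len) correctly, since each pushed byte
// becomes part of the source for later iterations.
fn copy_match(out: &mut Vec<u8>, dist: usize, len: usize) {
    let start = out.len() - dist;
    for i in 0..len {
        let b = out[start + i];
        out.push(b);
    }
}

fn main() {
    let mut out = b"ab".to_vec();
    // Overlapping match: distance 2, length 6 repeats "ab" three times.
    copy_match(&mut out, 2, 6);
    assert_eq!(out, b"abababab");
}
```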
Maybe it could be sped up further with unaligned loads/stores. It seems the match copies are the main thing that is slower than miniz on ARM: a file with forced raw blocks is slightly faster with oxide, and the difference is much smaller on files that are not very compressible.
There is a place that will probably break on non-x86 architectures, as well as potential performance issues with read/write_unaligned.
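One way to sidestep both problems (a sketch with a hypothetical helper name, not the code in question) is to build the wide value from bytes instead of casting the pointer, which is well-defined at any alignment and lets the compiler pick the best load sequence per target:

```rust
// Safe 4-byte "unaligned load": copy the bytes into a fixed array and
// reassemble with `from_ne_bytes`. Unlike casting `&buf[pos]` to
// `*const u32`, this is not undefined behavior at odd offsets, and on
// targets with fast unaligned access it compiles down to a single load.
fn load_u32_unaligned(buf: &[u8], pos: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[pos..pos + 4]);
    u32::from_ne_bytes(b)
}

fn main() {
    let buf = [1u8, 2, 3, 4, 5, 2, 3, 4, 5];
    // Positions 1 and 5 hold the same four bytes; comparing whole words
    // like this (e.g. for match finding) is endian-independent.
    assert_eq!(load_u32_unaligned(&buf, 1), load_u32_unaligned(&buf, 5));
}
```

Native byte order (`from_ne_bytes`) is fine here because the value is only compared against another load, never interpreted; for values with a defined wire format, `from_le_bytes` would be the portable choice.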