Frommi / miniz_oxide

Rust replacement for miniz
MIT License

ARM testing #9

Closed. Frommi closed this issue 2 years ago.

Frommi commented 7 years ago

There is a place that will probably break on non-x86 architectures, and there may also be performance issues with read/write_unaligned.
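
As a point of reference, here is a minimal sketch (not the crate's actual code; the helper name is made up) of the kind of unaligned read being referred to. On x86 an unaligned load is essentially free, while some other targets may split it into several byte loads:

```rust
use std::ptr;

/// Read a u32 from `buf` at `pos` with no alignment requirement.
fn read_u32_unaligned(buf: &[u8], pos: usize) -> u32 {
    assert!(pos + 4 <= buf.len());
    // SAFETY: the assert guarantees 4 readable bytes at `pos`, and
    // `read_unaligned` places no alignment requirement on the pointer.
    unsafe { ptr::read_unaligned(buf.as_ptr().add(pos) as *const u32) }
}

fn main() {
    let data = [0x78u8, 0x56, 0x34, 0x12, 0xff];
    // On a little-endian target (x86, typical ARM) this prints 0x12345678;
    // the raw-load result is endian-dependent, which is the portability catch.
    println!("{:#x}", read_u32_unaligned(&data, 0));
}
```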

oyvindln commented 7 years ago

I have a Raspberry Pi I can use for testing on ARM. I believe ARM is generally little-endian, so I don't think that will break there (although 64-bit operations on 32-bit architectures, including i686, are often not optimal). We will need some alternative code for big-endian platforms though. Of the architectures Rust supports, that seems to be ppc/ppc64 and mips/mips64; I'm not sure how best to test those.
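
One possible way to stay correct on big-endian targets (a hedged sketch, not the crate's API; `load_le_u64` is a made-up helper) is to route whole-word loads of the input through explicit little-endian conversions, which byte-swap only on big-endian hosts:

```rust
use std::convert::TryInto;

/// Load 8 input bytes as a little-endian u64, regardless of host endianness.
fn load_le_u64(buf: &[u8], pos: usize) -> u64 {
    let bytes: [u8; 8] = buf[pos..pos + 8].try_into().expect("8 bytes available");
    // `from_le_bytes` byte-swaps on big-endian targets, so the value is the
    // same on x86, ARM, ppc and mips alike.
    u64::from_le_bytes(bytes)
}

#[cfg(target_endian = "little")]
const ENDIAN: &str = "little";
#[cfg(target_endian = "big")]
const ENDIAN: &str = "big";

fn main() {
    let bits = [0x01u8, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
    assert_eq!(load_le_u64(&bits, 0), 0x0807_0605_0403_0201);
    println!("running on a {}-endian target", ENDIAN);
}
```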

matklad commented 7 years ago

https://github.com/japaric/trust might be useful as well.

oyvindln commented 7 years ago

I did some testing on my RPi3 (in 32-bit mode). Tests seem to work fine. The decompression speed is quite slow though. Compression is more in line with what we've seen on i686, i.e. a bit slower than miniz but reasonably close. What's really interesting is the fast compression mode: it's WAY faster than miniz!

EDIT: ~~Looks like there is a bug in compression somewhere. It's failing on some files (at least on x86-64).~~ EDIT2: fixed; it didn't have much impact on the results anyway.


running 15 tests
test compress_default                   ... bench:  87,607,969 ns/iter (+/- 1,040,324)
test compress_fast                      ... bench:  14,947,149 ns/iter (+/- 57,430)
test compress_high                      ... bench: 119,577,463 ns/iter (+/- 285,547)
test compress_mem_to_heap_default_miniz ... bench:  78,975,207 ns/iter (+/- 278,287)
test compress_mem_to_heap_default_oxide ... bench:  88,464,222 ns/iter (+/- 225,252)
test compress_mem_to_heap_fast_miniz    ... bench:  22,736,103 ns/iter (+/- 181,652)
test compress_mem_to_heap_fast_oxide    ... bench:  15,026,363 ns/iter (+/- 55,414)
test compress_mem_to_heap_high_miniz    ... bench: 107,817,340 ns/iter (+/- 269,151)
test compress_mem_to_heap_high_oxide    ... bench: 119,555,131 ns/iter (+/- 348,488)
test create_compressor                  ... bench:     178,687 ns/iter (+/- 2,230)
test decompress                         ... bench:   7,041,536 ns/iter (+/- 13,197)
test decompress_mem_to_heap_miniz       ... bench:   3,254,413 ns/iter (+/- 7,979)
test decompress_mem_to_heap_oxide       ... bench:   7,074,666 ns/iter (+/- 32,858)
test zlib_compress_fast                 ... bench:  15,247,534 ns/iter (+/- 27,623)
test zlib_decompress                    ... bench:   7,360,597 ns/iter (+/- 6,202)
oyvindln commented 7 years ago

Got the decompression times down a bit by avoiding copy_from_slice for match copies on ARM:


running 4 tests
test decompress                         ... bench:   6,082,626 ns/iter (+/- 25,375)
test decompress_mem_to_heap_miniz       ... bench:   3,285,197 ns/iter (+/- 23,074)
test decompress_mem_to_heap_oxide       ... bench:   6,107,855 ns/iter (+/- 13,411)
test zlib_decompress                    ... bench:   6,396,010 ns/iter (+/- 14,134)

Maybe it could be sped up further with unaligned loads/stores. It seems the match copies are the main thing that is slower than miniz on ARM: a file with forced raw (stored) blocks is slightly faster with oxide, and the difference is much smaller on files that are not very compressible.
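
For illustration, a hedged sketch of the match-copy trade-off discussed above (made-up helper names, not miniz_oxide's code): an LZ77 match copies `len` bytes starting `dist` bytes back in the already-decoded output, the regions overlap when `dist < len`, and for short matches a plain byte loop avoids the per-call overhead of a memcpy-style copy on some targets:

```rust
/// Byte-by-byte copy: always correct, including overlapping matches.
fn copy_match_bytewise(out: &mut Vec<u8>, dist: usize, len: usize) {
    let start = out.len() - dist;
    for i in 0..len {
        let b = out[start + i];
        out.push(b);
    }
}

/// memcpy-style copy: only valid when source and destination don't overlap.
fn copy_match_slice(out: &mut Vec<u8>, dist: usize, len: usize) {
    assert!(dist >= len, "slice copy requires a non-overlapping match");
    let start = out.len() - dist;
    out.extend_from_within(start..start + len);
}

fn main() {
    let mut a = b"abcabc".to_vec();
    copy_match_bytewise(&mut a, 3, 6); // overlapping: keeps repeating "abc"
    assert_eq!(a, b"abcabcabcabc");

    let mut b = b"abcdef".to_vec();
    copy_match_slice(&mut b, 6, 3); // non-overlapping: appends "abc"
    assert_eq!(b, b"abcdefabc");
    println!("ok");
}
```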