Frommi / miniz_oxide

Rust replacement for miniz
MIT License
168 stars 48 forks source link

Switch from adler32 to adler crate #83

Closed jonas-schievink closed 4 years ago

jonas-schievink commented 4 years ago

The adler crate is my clean-room reimplementation of Adler-32. The adler32 crate uses the Zlib license, while adler uses MIT/Apache-2.0 (or 0BSD), which makes integration into the standard library easier.

adler is also faster in the benchmarks I've made so far, although that isn't reflected in the miniz_oxide benchmarks (which in fact seemed to slightly regress on my system, but that might just be noise).

oyvindln commented 4 years ago

This seems reasonable, will include this with the 0.4 release. Will this change work as is with no-std enabled as well?

jonas-schievink commented 4 years ago

Yes, it should. adler is #![no_std] if default-features = false is used.

oyvindln commented 4 years ago

As for performance differences, adler32 calculation is very fast in general, so maybe any difference is not significant enough to show in most comparisons. Maybe you could try compressing with custom settings and use only stored blocks, RLE or huffman-only to clearer show the difference.

Shnatsel commented 4 years ago

FWIW I did see some cases where adler32 calculated dominated the execution time: https://github.com/Frommi/miniz_oxide/issues/76

Past me made the mistake of not including the testcases in the bug report :disappointed:

Shnatsel commented 4 years ago

Hmm, adler crate is about to add unsafe code: https://github.com/jonas-schievink/adler/pull/2

My PR for #![forbid(unsafe_code)] in adler was rejected because of that: https://github.com/jonas-schievink/adler/pull/3

Since both miniz_oxide and adler32 crates currently declare #![forbid(unsafe_code)], I'd argue that there needs to be a really good reason to introduce unsafe code - at least some demonstrated performance improvements and not just "it uses a different license". I'm not convinced MIT/Apache is better than Zlib - the latter is actually very permissive because it doesn't require attribution.

jonas-schievink commented 4 years ago

I'm not adding unsafe code for fun, the PR you linked does actually improve performance considerably. The only reason it uses unsafe is because [T]::align_to is unsafe. It currently uses it unsoundly, but I haven't reviewed the PR yet (and the unsoundness is only theoretical).

Shnatsel commented 4 years ago

I'm not questioning whether the addition of unsafe code right for adler, I'm questioning whether the switch to adler is right for miniz_oxide at this point in time.

jonas-schievink commented 4 years ago

Ah, I see.

I'm not convinced MIT/Apache is better than Zlib - the latter is actually very permissive because it doesn't require attribution.

The only reason for using MIT/Apache is to ease integration into libstd. adler also allows usage under 0BSD, and that is normally my preferred license, but I was asked to use MIT/Apache to simplify the legal process. I agree this shouldn't be necessary, but oh well.

oyvindln commented 4 years ago

I would be very wary of depending on anything using [T]::align_to, as mentioned by the documentation, it's very easy to end up with undefined behaviour, even though the usage may look fine at first glance. Keep in mind that usage of zlib decoding frequently involves processing data from untrusted sources. Maybe there are some safer approaches to allow auto-vectorization.

Alternatively, maybe you could add it as an optional feature or something.

Shnatsel commented 3 years ago

adler crate has achieved the desired performance gains without resorting to unsafe code and introduced !#[forbid(unsafe_code)] pragma, so I retract my objection.