Implement various optimizations

mayeut commented 7 years ago

Use Nick Galbreath's (@client9) implementation for scalar decoding. Use Wojciech Mula's (@WojciechMula) implementation for AVX2 / SSSE3, with a decoding trick by @aqrit.

SSE4.1 & SSE4.2 are now useless.

Speed-up on i7-4870HQ @ 2.5 GHz (clang-800.0.42.1, x86_64):

Plain decoding: +97%
SSSE3 encoding: +13%
SSSE3 decoding: +79%
AVX encoding: +6%
AVX decoding: +57%
AVX2 encoding: +3%
AVX2 decoding: +69%
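
For readers unfamiliar with table-driven scalar decoding, here is a minimal sketch of the general idea. It is not the code from this pull request, and it assumes the scalar decoder follows the common approach of one pre-shifted lookup table per input position. All names (`init_tables`, `decode_blocks`, `d0`..`d3`, `BAD`) are hypothetical, and padding (`=`) is deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bit that lies outside the 24 payload bits of a decoded group. */
#define BAD 0x01000000u

/* One table per character position; each entry holds the 6-bit value
 * already shifted into place, or BAD for characters outside the alphabet. */
static uint32_t d0[256], d1[256], d2[256], d3[256];

static void init_tables(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    for (int i = 0; i < 256; i++)
        d0[i] = d1[i] = d2[i] = d3[i] = BAD;

    for (uint32_t v = 0; v < 64; v++) {
        unsigned char c = (unsigned char)alphabet[v];
        d0[c] = v << 18;
        d1[c] = v << 12;
        d2[c] = v << 6;
        d3[c] = v;
    }
}

/* Decodes unpadded input whose length is a multiple of 4.
 * Returns the number of bytes written, or -1 on invalid input. */
static long decode_blocks(const char *src, size_t len, uint8_t *dst)
{
    uint8_t *out = dst;

    if (len % 4 != 0)
        return -1;

    for (size_t i = 0; i < len; i += 4) {
        /* Four lookups and three ORs rebuild the 24-bit group;
         * a single test catches an invalid character in any position. */
        uint32_t x = d0[(uint8_t)src[i]]
                   | d1[(uint8_t)src[i + 1]]
                   | d2[(uint8_t)src[i + 2]]
                   | d3[(uint8_t)src[i + 3]];
        if (x & BAD)
            return -1;

        *out++ = (uint8_t)(x >> 16);
        *out++ = (uint8_t)(x >> 8);
        *out++ = (uint8_t)x;
    }
    return (long)(out - dst);
}
```

Call `init_tables()` once, then `decode_blocks("TWFu", 4, buf)` writes the three bytes `Man` into `buf`. Pre-shifting the table entries removes the per-character shifts from the inner loop, and folding the error flag into the same tables reduces validation to one branch per four input characters.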

aklomp commented 7 years ago

Thanks for the contribution. The performance increase from this commit is quite incredible, and I will gladly work with you to merge these new approaches into base64.

That said, I am not really happy with this pull request. There are a few shortcomings with it:

It's a giant monolithic ball of code. This makes it hard to audit. It also makes it very difficult to say "yes" to this pull request because it's all-or-nothing. I like pull requests where each commit is a single self-consistent change. That makes it easy to audit and keeps the history clean.
It uses the work of other people, who are credited in the code but about whom I know nothing, such as whether they want to be named in this project, what their policy is regarding licensing, and so on.
Documentation of the new algorithms is poor. Hyperlinks to other projects are not documentation; this code must stand on its own.
It suffers from poor formatting, whitespace errors and other hygiene issues that I'd like to not import.

So while I applaud your continued interest in this library and would really like to work with you to integrate these improvements, I can't accept this pull request as-is. I'll go through the pull request and leave line comments for specific issues.

mayeut commented 7 years ago

I tried to address every request you made in the comments. Regarding the monolithic status, it's true. I could split this into 3 PRs:

One for Nick Galbreath's decoding code.
One for AVX2/SSSE3 encoding (the same tricks are used, so it makes sense to me to do both in one PR).
One for AVX2/SSSE3/NEON32 decoding (again, the same tricks are used, so it makes sense to me to do those in one PR).

mayeut commented 7 years ago

Superseded by #36. Other PRs will follow once #36 is merged.

aklomp / base64

Implement various optimizations #35