mayeut closed this pull request 7 years ago
Thanks for the contribution. The performance increase from this commit is quite incredible, and I will gladly work with you to merge these new approaches into base64.
That said, I am not really happy with this pull request. There are a few shortcomings with it:
So while I applaud your continued interest in this library and would really like to work with you to integrate these improvements, I can't accept this pull request as-is. I'll go through the pull request and leave line comments for specific issues.
I tried to answer every request you made in the comments. Regarding the monolithic nature of the PR, that's true. I could split it into 3 PRs:
Superseded by #36. Other PRs will follow once #36 is merged.
Use Nick Galbreath's (@client9) implementation for scalar decoding.
Use Wojciech Mula's (@WojciechMula) implementation for AVX2 / SSSE3, with the decoding trick by @aqrit.
SSE4.1 & SSE4.2 are now useless.
Speed-up on i7-4870HQ @ 2.5 GHz (clang-800.0.42.1, x86_64):
Plain decoding: +97%
SSSE3 encoding: +13%
SSSE3 decoding: +79%
AVX encoding: +6%
AVX decoding: +57%
AVX2 encoding: +3%
AVX2 decoding: +69%
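For readers unfamiliar with the scalar decoding approach mentioned above, here is a minimal sketch of table-driven base64 decoding, the general technique that Nick Galbreath's decoder is known for. This is not the code from this pull request. The names `b64_tables_init`, `b64_decode_quads` and `B64_BAD` are hypothetical, the tables are built at run time rather than shipped as precomputed constants, and the byte layout is kept portable instead of being tuned for a particular endianness. Padding (`=`) and trailing partial groups are left out to keep the idea visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Marker for bytes outside the base64 alphabet (hypothetical name).
   It sets bit 24, which no valid table entry ever uses. */
#define B64_BAD UINT32_C(0x01FFFFFF)

/* Four 256-entry tables: entry c of table k holds the 6-bit value of
   character c, already shifted to the position of the k-th character
   within a 24-bit group. */
static uint32_t d0[256], d1[256], d2[256], d3[256];

/* Fill the tables once before decoding. */
static void b64_tables_init(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    for (int c = 0; c < 256; c++)
        d0[c] = d1[c] = d2[c] = d3[c] = B64_BAD;

    for (uint32_t v = 0; v < 64; v++) {
        unsigned char c = (unsigned char)alphabet[v];
        d0[c] = v << 18;
        d1[c] = v << 12;
        d2[c] = v << 6;
        d3[c] = v;
    }
}

/* Decode len characters (a multiple of 4, no '=' padding) from src
   into dst. Returns the number of bytes written, or -1 if len is not
   a multiple of 4 or an invalid character is found. */
static long b64_decode_quads(const char *src, size_t len, unsigned char *dst)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *out = dst;

    if (len % 4 != 0)
        return -1;

    for (size_t i = 0; i < len; i += 4) {
        /* Four lookups and three ORs replace per-character shifting. */
        uint32_t x = d0[s[i]] | d1[s[i + 1]] | d2[s[i + 2]] | d3[s[i + 3]];

        /* One comparison validates all four characters at once. */
        if (x > UINT32_C(0x00FFFFFF))
            return -1;

        *out++ = (unsigned char)(x >> 16);
        *out++ = (unsigned char)(x >> 8);
        *out++ = (unsigned char)x;
    }
    return (long)(out - dst);
}
```

After calling `b64_tables_init()` once, `b64_decode_quads("TWFu", 4, out)` writes the three bytes `Man` and returns 3. The point of the design is that each group of four characters costs four table loads, three ORs and a single validity check, which is plausibly where much of a scalar decoding gain would come from.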