aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License

SSSE3->AVX2 encoding optimization #37

Closed mayeut closed 7 years ago

mayeut commented 7 years ago

Use Wojciech Muła's (@WojciechMula) updated implementation for AVX2 / SSSE3 encoding.

The SSSE3 implementation is reused in the SSE4.1, SSE4.2 and AVX dispatched encoding loops.

The SSE4.1 implementation is now redundant, but it is kept to ease integration of future updates if needed.

Speed-up on i7-4870HQ @ 2.5 GHz (clang-800.0.42.1, x86_64):
SSSE3 encoding: +20%
SSE4.2 encoding: +8%
AVX encoding: +7%
AVX2 encoding: +3%

mayeut commented 7 years ago

As a side note, there's a bug in clang 3.4 that leads to poor results with this implementation: https://bugs.llvm.org//show_bug.cgi?id=18478

This can be seen on the travis-ci clang build, which uses clang 3.4.

mayeut commented 7 years ago

Another side note: the speed-ups were measured with a 10 MB buffer, as reported in README.md. With smaller buffers, which benefit from cache effects, the speed-ups are even higher. AVX2 deserves a special mention, with a throughput increase of more than 30%.

aklomp commented 7 years ago

Merged, thanks! A very clever trick, this bitshifting-by-multiplication, now that I finally understand it :) Sorry for the delay in merging.

Though this and other contributions have added valuable improvements, I think it might be time for a little cleanup round: fixing comments, style, documentation. I might push something along these lines soon, if I have time.
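
For readers who have not seen the bitshifting-by-multiplication trick mentioned above, here is a minimal scalar sketch of the idea. It is not the code from this PR. SSE has no instruction that shifts each 16-bit lane by a different amount, but `_mm_mullo_epi16` and `_mm_mulhi_epu16` multiply each lane by its own constant, and multiplying by 2^n is a left shift by n, while the high half of a multiplication by 2^(16-n) is a right shift by n. The helper names (`mulhi_u16`, `mullo_u16`, `split_by_multiply`, `split_by_shift`, `check_all_inputs`) are hypothetical, and the masks and multipliers are an assumption about how the lanes are laid out, chosen so the example works on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of _mm_mulhi_epu16: high half of a * b. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Scalar model of one 16-bit lane of _mm_mullo_epi16: low half of a * b. */
static uint16_t mullo_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Split 3 input bytes into 4 six-bit indices using only AND, OR and
 * multiplications. The lane layout is an assumption for this sketch:
 * lo = [in0|in1], hi = [in1|in2]. */
static void split_by_multiply(const uint8_t in[3], uint8_t out[4])
{
    uint16_t lo = (uint16_t)((in[0] << 8) | in[1]);
    uint16_t hi = (uint16_t)((in[1] << 8) | in[2]);

    /* mulhi by 2^6 acts as ">> 10"; mullo by 2^4 acts as "<< 4". */
    uint16_t lo_out = (uint16_t)(mulhi_u16(lo & 0xfc00, 0x0040) |
                                 mullo_u16(lo & 0x03f0, 0x0010));

    /* mulhi by 2^10 acts as ">> 6"; mullo by 2^8 acts as "<< 8". */
    uint16_t hi_out = (uint16_t)(mulhi_u16(hi & 0x0fc0, 0x0400) |
                                 mullo_u16(hi & 0x003f, 0x0100));

    out[0] = (uint8_t)(lo_out & 0xff);
    out[1] = (uint8_t)(lo_out >> 8);
    out[2] = (uint8_t)(hi_out & 0xff);
    out[3] = (uint8_t)(hi_out >> 8);
}

/* Reference version with ordinary shifts, for comparison. */
static void split_by_shift(const uint8_t in[3], uint8_t out[4])
{
    out[0] = (uint8_t)(in[0] >> 2);
    out[1] = (uint8_t)(((in[0] & 0x03) << 4) | (in[1] >> 4));
    out[2] = (uint8_t)(((in[1] & 0x0f) << 2) | (in[2] >> 6));
    out[3] = (uint8_t)(in[2] & 0x3f);
}

/* Compare both versions over every possible 3-byte input. */
static void check_all_inputs(void)
{
    for (uint32_t v = 0; v < (1u << 24); v++) {
        uint8_t in[3] = { (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v };
        uint8_t a[4], b[4];
        split_by_multiply(in, a);
        split_by_shift(in, b);
        for (int i = 0; i < 4; i++)
            assert(a[i] == b[i]);
    }
}
```

In a vectorized version, the four different shift amounts would collapse into one `mulhi` and one `mullo` over the whole register, each with a constant vector of per-lane multipliers. That is presumably where the speed-up over separate shift-and-mask steps comes from.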