aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License

Improve SSSE3 and AVX2 decoding speed #17

Closed by mayeut 8 years ago

mayeut commented 8 years ago

The dec_reshuffle function now uses fewer instructions. Speed-up on Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz using Apple LLVM version 8.0.0 (clang-800.0.38):

AVX2: +8.5% compared to previous version
SSSE3: +8% compared to previous version

The speed-up is much smaller than in #16, but I think it is still significant enough.
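The diff itself is not reproduced in this comment, so the following is only a scalar sketch of the kind of work a reshuffle step performs: packing four decoded 6-bit values into three output bytes with two multiply-add stages. The function name `merge_sextets` is hypothetical. A SIMD version could map the two stages onto multiply-add instructions such as `_mm_maddubs_epi16` and `_mm_madd_epi16`, followed by a byte shuffle, but that this PR does exactly that is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of a decoder reshuffle step.
 * in[0..3] hold four decoded 6-bit values, most significant first;
 * out[0..2] receive the three packed bytes.
 */
static void merge_sextets(const uint8_t in[4], uint8_t out[3])
{
    for (int i = 0; i < 4; i++)
        assert(in[i] < 64); /* each input must be a valid 6-bit value */

    /* Stage 1: join neighbouring values into 12 bits each.
     * Multiplying by 0x40 is the same as shifting left by 6. */
    uint16_t ab = (uint16_t)(in[0] * 0x40 + in[1]);
    uint16_t cd = (uint16_t)(in[2] * 0x40 + in[3]);

    /* Stage 2: join the two 12-bit halves into one 24-bit value.
     * Multiplying by 0x1000 is the same as shifting left by 12. */
    uint32_t abcd = (uint32_t)ab * 0x1000u + cd;

    /* Stage 3: emit the 24 bits as three bytes, most significant first. */
    out[0] = (uint8_t)(abcd >> 16);
    out[1] = (uint8_t)(abcd >> 8);
    out[2] = (uint8_t)abcd;
}
```

For example, the Base64 group `TWFu` maps to the 6-bit values 19, 22, 5, 46, which this function packs into the bytes of `Man`.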

Full results before and after the modifications follow.

Before:

Filling buffer with 10.0 MB of random data...
Testing with buffer size 10 MB, fastest of 100 * 1
AVX2    encode  7829.44 MB/sec
AVX2    decode  6388.09 MB/sec
plain   encode  1499.43 MB/sec
plain   decode  1556.07 MB/sec
SSSE3   encode  5600.94 MB/sec
SSSE3   decode  3577.29 MB/sec
Testing with buffer size 1 MB, fastest of 100 * 10
AVX2    encode  8968.92 MB/sec
AVX2    decode  6495.38 MB/sec
plain   encode  1500.98 MB/sec
plain   decode  1562.21 MB/sec
SSSE3   encode  5682.25 MB/sec
SSSE3   decode  3595.74 MB/sec
Testing with buffer size 100 KB, fastest of 100 * 100
AVX2    encode  9014.38 MB/sec
AVX2    decode  6478.45 MB/sec
plain   encode  1500.25 MB/sec
plain   decode  1560.77 MB/sec
SSSE3   encode  5699.02 MB/sec
SSSE3   decode  3591.22 MB/sec
Testing with buffer size 10 KB, fastest of 1000 * 100
AVX2    encode  8912.60 MB/sec
AVX2    decode  6424.46 MB/sec
plain   encode  1503.06 MB/sec
plain   decode  1558.58 MB/sec
SSSE3   encode  5670.67 MB/sec
SSSE3   decode  3584.69 MB/sec
Testing with buffer size 1 KB, fastest of 1000 * 1000
AVX2    encode  6636.74 MB/sec
AVX2    decode  5339.24 MB/sec
plain   encode  1418.70 MB/sec
plain   decode  1519.34 MB/sec
SSSE3   encode  5048.11 MB/sec
SSSE3   decode  3397.44 MB/sec

After:

Filling buffer with 10.0 MB of random data...
Testing with buffer size 10 MB, fastest of 100 * 1
AVX2    encode  7868.34 MB/sec
AVX2    decode  7047.91 MB/sec
plain   encode  1499.65 MB/sec
plain   decode  1557.21 MB/sec
SSSE3   encode  5588.29 MB/sec
SSSE3   decode  3882.09 MB/sec
Testing with buffer size 1 MB, fastest of 100 * 10
AVX2    encode  8970.38 MB/sec
AVX2    decode  7093.07 MB/sec
plain   encode  1501.03 MB/sec
plain   decode  1562.29 MB/sec
SSSE3   encode  5680.62 MB/sec
SSSE3   decode  3900.21 MB/sec
Testing with buffer size 100 KB, fastest of 100 * 100
AVX2    encode  9010.17 MB/sec
AVX2    decode  7029.67 MB/sec
plain   encode  1500.42 MB/sec
plain   decode  1561.01 MB/sec
SSSE3   encode  5691.45 MB/sec
SSSE3   decode  3899.66 MB/sec
Testing with buffer size 10 KB, fastest of 1000 * 100
AVX2    encode  8928.00 MB/sec
AVX2    decode  6894.87 MB/sec
plain   encode  1504.46 MB/sec
plain   decode  1558.74 MB/sec
SSSE3   encode  5672.21 MB/sec
SSSE3   decode  3913.54 MB/sec
Testing with buffer size 1 KB, fastest of 1000 * 1000
AVX2    encode  6726.56 MB/sec
AVX2    decode  5688.64 MB/sec
plain   encode  1415.77 MB/sec
plain   decode  1519.19 MB/sec
SSSE3   encode  5090.29 MB/sec
SSSE3   decode  3581.70 MB/sec
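
The relative decode speed-up can be recomputed from the two runs above. This is a minimal sketch using the 100 KB figures; the names `speedup_percent`, `decode_result`, `results_100k` and `print_speedups` are made up for illustration and are not part of the library or its benchmark tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
static double speedup_percent(double before, double after)
{
    assert(before > 0.0); /* throughput must be positive */
    return (after / before - 1.0) * 100.0;
}

/* One decode measurement pair, in MB/sec. */
struct decode_result {
    const char *codec;
    double before_mb_s;
    double after_mb_s;
};

/* Decode throughput for the 100 KB buffer, copied from the runs above. */
static const struct decode_result results_100k[] = {
    { "AVX2",  6478.45, 7029.67 },
    { "SSSE3", 3591.22, 3899.66 },
};

/* Print one line per codec with its relative decode speed-up. */
static void print_speedups(void)
{
    size_t n = sizeof(results_100k) / sizeof(results_100k[0]);
    for (size_t i = 0; i < n; i++) {
        printf("%-6s decode %+.1f%%\n",
               results_100k[i].codec,
               speedup_percent(results_100k[i].before_mb_s,
                               results_100k[i].after_mb_s));
    }
}
```

Calling `print_speedups()` from a `main` function prints one line per codec. The same helper can be applied to the other buffer sizes, where the gain varies somewhat from size to size.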