cloudflare / sslconfig

Cloudflare's Internet facing SSL configuration
BSD 3-Clause "New" or "Revised" License

CHACHA20 Patch doesn't work with GCC 4.4, and architecture difference.. #19

v998 closed this issue 8 years ago

v998 commented 8 years ago

This just occurred to me.

I was able to use the patch successfully on CentOS 6 x86 (which uses GCC 4.4).

But when I use it on x86_64 (still GCC 4.4), the build fails with an error in poly1305_avx2.s (see https://github.com/cloudflare/sslconfig/issues/10 https://github.com/cloudflare/sslconfig/issues/11 ). So I guessed there was a problem with the AVX2-optimized code, and I found that AVX2 support is something introduced in GCC 4.6, so I tried GCC 4.9.1... SUCCESS!

Since GCC 4.4 is quite widely used (it is the version provided in the official CentOS 6 repo), it would be better to mention this in the README.


So does this mean that x86 does not have code as well optimized as x86_64?

Benchmark on x86:

chacha20-poly1305    12908.94k    42709.33k    69848.06k    82854.23k    87263.91k
256 bit ecdh (nistp256)   0.0012s    806.4

Benchmark on x86_64:

chacha20-poly1305    38188.50k   123035.73k   251849.73k   411736.41k   486099.63k
256 bit ecdh (nistp256)   0.0001s   7304.7

NOTE: same machine; only the architecture differs.

It would be great to mention this in the README too.
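
To put the two benchmark runs above side by side, here is a minimal sketch that divides the x86_64 figures by the x86 figures column by column. The numbers are copied from the output above; the helper name `speedups` is made up for illustration. It only does arithmetic on the posted figures and says nothing about why they differ.

```python
# Throughput figures in thousands of bytes per second, copied from the
# chacha20-poly1305 lines of the two benchmark runs above.
x86 = [12908.94, 42709.33, 69848.06, 82854.23, 87263.91]
x86_64 = [38188.50, 123035.73, 251849.73, 411736.41, 486099.63]


def speedups(baseline, candidate):
    """Return candidate/baseline for each column (hypothetical helper)."""
    if len(baseline) != len(candidate):
        raise ValueError("both runs must have the same number of columns")
    return [c / b for b, c in zip(baseline, candidate)]


chacha_ratios = speedups(x86, x86_64)

# ECDH (nistp256) operations per second, from the same two runs.
ecdh_ratio = 7304.7 / 806.4

for column, ratio in enumerate(chacha_ratios, start=1):
    print("chacha20-poly1305 column %d: %.2fx" % (column, ratio))
print("ecdh nistp256 ops/s: %.2fx" % ecdh_ratio)
```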

v998 commented 8 years ago

I just realized that there was something wrong with my assumption... For the second part, I used GCC 4.4 to build the x86 version, but GCC 4.9.1 for the x86_64 version. That isn't a fair comparison...

I hope someone can reproduce the benchmarks using GCC 4.9.1 for both architectures.

vkrasnov commented 8 years ago

The first version of GCC to support AVX2 is 4.8.

As for the difference between x86 and x86-64: indeed, the patch is for x86-64 only (the only type of server we have). x86 has far fewer registers and requires different code. In addition, there are no 32-bit processors that support SSE4.1 anyway, and running 32-bit code on 64-bit processors is not something that we do. So x86 uses the C code instead, which is of course slower.
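
The two conditions described above (an x86-64 target and GCC 4.8 or newer) can be turned into a quick pre-build check. This is only a sketch: the function names, the `X86_64_NAMES` set, and the idea of checking before building are assumptions for illustration, not part of the patch or its build system.

```python
import re

# Minimum GCC version with AVX2 support, as stated in the comment above.
MIN_GCC_FOR_AVX2 = (4, 8)

# Machine names that commonly denote 64-bit x86 (assumed, not exhaustive).
X86_64_NAMES = {"x86_64", "amd64"}


def parse_gcc_version(text):
    """Turn a version string such as '4.4.7' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def optimized_path_expected(machine, gcc_version_text):
    """Return True if the x86-64 assembly path should be buildable.

    Hypothetical helper: it encodes only the two conditions named in
    the comment above and knows nothing else about the build.
    """
    is_x86_64 = machine.lower() in X86_64_NAMES
    new_enough = parse_gcc_version(gcc_version_text) >= MIN_GCC_FOR_AVX2
    return is_x86_64 and new_enough
```

On a real machine the inputs could come from `platform.machine()` and the output of `gcc -dumpversion`. With the setups from this thread, `optimized_path_expected("x86_64", "4.4.7")` is false because the compiler is too old, and any 32-bit machine name is rejected regardless of compiler version.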