Closed by v998 8 years ago
It was just pointed out to me that there was something wrong with my assumption... For the second part, I built the x86 version with GCC 4.4 but the x86_64 version with GCC 4.9.1. That isn't a fair comparison...
I hope someone can reproduce the benchmarks using GCC 4.9.1 for both builds.
The first version of GCC to support AVX2 is 4.8.
As for the difference between x86 and x86-64: indeed, the patch is for x86-64 only (the only type of server we have). x86 has far fewer registers and requires different code. In addition, there are no 32-bit processors that support SSE4.1 anyway, and running 32-bit code on 64-bit processors is not something that we do. So x86 uses the C code instead, which is of course slower.
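To illustrate the fallback described above, here is a minimal sketch of how a build might choose between an AVX2 assembly path and generic C. The names `poly1305_impl`, `cpu_has_avx2`, `choose_poly1305_impl` and `impl_name` are hypothetical and are not taken from the patch. The sketch assumes GCC 4.8 or later for `__builtin_cpu_supports`; the patch itself may well do its detection differently.

```c
/* Hypothetical names for illustration only; not taken from the patch. */
typedef enum { IMPL_GENERIC_C, IMPL_AVX2_ASM } poly1305_impl;

/* Returns 1 if the running CPU reports AVX2, 0 otherwise.
 * The runtime check is compiled in only on x86-64 with GCC >= 4.8,
 * the first GCC release that provides __builtin_cpu_supports.
 * Clang also defines __GNUC__ but reports version 4.2, so this
 * conservative guard treats it as "no AVX2". */
static int cpu_has_avx2(void)
{
#if defined(__x86_64__) && defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* 32-bit x86 builds never take the assembly path: they always fall
 * back to the generic C code, as described in the comment above. */
static poly1305_impl choose_poly1305_impl(void)
{
#if defined(__x86_64__)
    if (cpu_has_avx2())
        return IMPL_AVX2_ASM;
#endif
    return IMPL_GENERIC_C;
}

/* Human-readable name, e.g. for a log line printed at start-up. */
static const char *impl_name(poly1305_impl impl)
{
    return impl == IMPL_AVX2_ASM ? "avx2-asm" : "generic-c";
}
```

Calling `choose_poly1305_impl()` once at start-up and logging `impl_name(...)` would make it obvious which path a given binary is using when comparing x86 and x86_64 benchmarks.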
An idea just came to mind...
I could use the patch successfully on CentOS 6 x86 (which uses GCC 4.4).
But when I use it on x86_64 (still GCC 4.4), the build fails with an error in poly1305_avx2.s
(see https://github.com/cloudflare/sslconfig/issues/10 and https://github.com/cloudflare/sslconfig/issues/11). So I guessed there was some problem with the AVX2-optimized code... and I found that it depends on something introduced in GCC 4.6, so I tried GCC 4.9.1... SUCCESS!
Since GCC 4.4 is quite widely used (it is what the official CentOS 6 repo provides), it would be better to mention this in the README.
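One way a project could surface this kind of requirement before the build fails is a compiler version check. The sketch below is only an illustration: the helper names are hypothetical, and the 4.8 threshold simply takes the figure quoted elsewhere in this thread at face value rather than anything verified against the patch.

```c
/* Returns 1 if version major.minor is at least req_major.req_minor. */
static int version_at_least(int major, int minor,
                            int req_major, int req_minor)
{
    return major > req_major ||
           (major == req_major && minor >= req_minor);
}

/* Returns 1 if the compiler building this file identifies itself as
 * GCC 4.8 or newer. The threshold is an assumption taken from this
 * thread, not from the patch. Non-GCC compilers return 0. */
static int compiler_ok_for_avx2(void)
{
#if defined(__GNUC__)
    return version_at_least(__GNUC__, __GNUC_MINOR__, 4, 8);
#else
    return 0;
#endif
}
```

With this check, the stock CentOS 6 compiler (GCC 4.4) would be reported as too old, while GCC 4.9.1 would pass.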
So... does this mean that x86 does not have code as well optimized as x86_64?
Benchmark on x86:
Benchmark on x86_64:
NOTE: same machine; only the architecture differs.
It would be great to mention this in the README too.