Closed anirudhvr closed 7 years ago
I think doing a PLT relative jump is the better solution
I see; thanks for the reply. Unfortunately, I know little about calculating PLT offsets so I'd have to wait for a patch from you.
Can you confirm if the above patch would/wouldn't work? It does appear that your poly1305_avx2 function sets 8*7($state) to 1 if poly1305_init_avx2 was run and 0 if not, so wouldn't the check for state[8*7] == 0 in C be sufficient?
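To make the proposed check concrete, here is a minimal C sketch. It assumes the state is an opaque byte buffer and that the AVX2 path keeps a 64-bit flag at byte offset 8*7. The buffer size and the names `demo_state` and `avx2_was_initialized` are hypothetical, made up for illustration, and are not the library's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: an opaque, byte-addressed state buffer.
 * The size is illustrative only and does not come from the library. */
typedef struct {
    uint8_t opaque[512];
} demo_state;

/* Byte offset of the flag, matching 8*7($state) in the assembly. */
#define AVX2_FLAG_OFFSET (8 * 7)

/* Returns 1 if the 64-bit word at byte offset 56 is non-zero, else 0.
 * memcpy sidesteps alignment and strict-aliasing problems. */
static int avx2_was_initialized(const demo_state *st) {
    uint64_t flag;
    memcpy(&flag, st->opaque + AVX2_FLAG_OFFSET, sizeof flag);
    return flag != 0;
}
```

Whether a single-byte read would be enough depends on how the assembly writes the flag; reading the whole word avoids relying on that detail.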
It should work now.
Hi @vkrasnov - I'm getting errors of this form when linking with gcc 4.9. The library builds fine, but when I compile another program against libcrypto, errors like the following occur (everything is built with -fPIC, so it's not quite as simple as the diagnostic makes it seem):
binutils/bin/gold/ld: error: build/openssl/lib/libcrypto_pic.a(poly1305_avx2.o): requires dynamic R_X86_64_PC32 reloc against 'poly1305_update_x64' which may overflow at runtime; recompile with -fPIC
collect2: error: ld returned 1 exit status
It appears that the two jmps to poly1305_update_x64 and poly1305_finish_x64 in poly1305_avx2.pl cannot be relocated, and -fPIC doesn't have an effect on assembly code.
I'm no expert, but I looked around a bit and it seems there are ways to calculate the offset relative to the GOT/PLT when doing the jump.
Alternatively, why not just do this in code, i.e., something like this (in e_chacha20poly1305.c) and remove the jmps from the asm? This seems to work but I'd like your thoughts before I send a PR.
```c
static void poly1305_update_avx2_base(poly1305_state *state, const uint8_t *in, size_t in_len) {
    if (in_len >= 512) {
        poly1305_update_avx2(state, in, in_len);
    } else {
        poly1305_update_x64(state, in, in_len);
    }
}

static void poly1305_finish_avx2_base(poly1305_state *state, uint8_t mac[16]) {
    // In assembly, you check whether 8*7($state) == 0
    if ((uint64_t)state[56] == 0) {
        poly1305_finish_x64(state, mac);
    } else {
        poly1305_finish_avx2(state, mac);
    }
}

static void EVP_chacha20_poly1305_cpuid(EVP_CHACHA20_POLY1305_CTX *ctx) {
    if ((OPENSSL_ia32cap_loc()[1] >> 5) & 1) { /* AVX2 */
        ctx->poly1305_init_ptr = poly1305_init_x64; /* Lazy init */
        ctx->poly1305_update_ptr = poly1305_update_avx2_base;
        ctx->poly1305_finish_ptr = poly1305_finish_avx2_base;
```
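For anyone who wants to try the length-based dispatch in isolation, here is a small sketch with stub routines. Everything prefixed `demo_` is a hypothetical stand-in, not the real poly1305 code; the stubs only record which path was taken. The 512-byte threshold is the one used in the wrapper above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context: only remembers which path ran last. */
typedef struct {
    int last_path;
} demo_ctx;

enum { PATH_NONE = 0, PATH_X64 = 1, PATH_AVX2 = 2 };

/* Same cut-off as the proposed wrapper. */
#define AVX2_MIN_LEN 512

/* Stub standing in for the scalar x64 routine. */
static void demo_update_x64(demo_ctx *c, const uint8_t *in, size_t len) {
    (void)in;
    (void)len;
    c->last_path = PATH_X64;
}

/* Stub standing in for the AVX2 routine. */
static void demo_update_avx2(demo_ctx *c, const uint8_t *in, size_t len) {
    (void)in;
    (void)len;
    c->last_path = PATH_AVX2;
}

/* Dispatch in C rather than jumping between routines in assembly:
 * long inputs go to the AVX2 stub, short ones to the x64 stub. */
static void demo_update(demo_ctx *c, const uint8_t *in, size_t len) {
    if (len >= AVX2_MIN_LEN) {
        demo_update_avx2(c, in, len);
    } else {
        demo_update_x64(c, in, len);
    }
}
```

Because the branch lives in C, the compiler emits ordinary calls and handles position-independent code itself, so the assembly file no longer needs a cross-symbol jump.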