Closed nKolja closed 3 years ago
Dear Novak,
Thank you for submitting this PR. We will review it as soon as possible.
Regarding the title of this PR: I see that the performance difference applies only to the encapsulation phase and not to keygen. I suggest changing the title accordingly.
Keygen speedup with vector instructions on AVX2 and AVX512. To support this, new macros have been added to bike_defs.h, defs.h and x86_64_intrinsic.h. DIVIDE_AND_CEIL has been changed because it was erroneous on inputs that are multiples of the divisor.
Issue #, if available:
DIVIDE_AND_CEIL(x, divisor) = ((x) + (divisor))/(divisor) is erroneous when x is a multiple of the divisor: it returns one more than the true ceiling.
generate_indices_mod_z does not use AVX2/AVX512 vector instructions.
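The off-by-one can be checked with a few small values. The sketch below is only an illustration: the name DIVIDE_AND_CEIL_OLD and the helper check_divide_and_ceil are hypothetical and not part of the repository, and the macros are assumed to be used with non-negative integers small enough that x + divisor does not overflow.

```c
#include <assert.h>

/* Definition quoted in this issue: over-counts by one on exact multiples.
 * The _OLD suffix is hypothetical, used here only to keep both versions. */
#define DIVIDE_AND_CEIL_OLD(x, divisor) (((x) + (divisor)) / (divisor))

/* Corrected definition: the usual integer ceiling idiom. */
#define DIVIDE_AND_CEIL(x, divisor) (((x) + (divisor) - 1) / (divisor))

/* Hypothetical helper that exercises both macros. */
static void check_divide_and_ceil(void)
{
  /* Exact multiple: the true ceiling of 16/8 is 2. */
  assert(DIVIDE_AND_CEIL_OLD(16, 8) == 3); /* wrong */
  assert(DIVIDE_AND_CEIL(16, 8) == 2);     /* right */

  /* Non-multiple: both versions agree. */
  assert(DIVIDE_AND_CEIL_OLD(17, 8) == 3);
  assert(DIVIDE_AND_CEIL(17, 8) == 3);

  /* Edge cases: zero is also a multiple of the divisor. */
  assert(DIVIDE_AND_CEIL_OLD(0, 8) == 1);  /* wrong */
  assert(DIVIDE_AND_CEIL(0, 8) == 0);
  assert(DIVIDE_AND_CEIL(1, 8) == 1);
}
```

Any caller that sizes a buffer with the old macro gets one extra element on exact multiples, which is harmless for allocation but wrong wherever the result is used as an exact count.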
Description of changes:
DIVIDE_AND_CEIL(x, divisor) changed to ((x) + (divisor) - 1)/(divisor). It is now correct when x is a multiple of the divisor.
is_new (a subfunction of generate_indices_mod_z) sped up by using vector instructions. New macros added to bike_defs.h, defs.h and x86_64_intrinsic.h.
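For readers unfamiliar with the idea, here is a rough sketch of how a duplicate check such as is_new might be vectorized with AVX2. It is not the code from this PR: the signature, the uint32_t index type, the helper names and the runtime dispatch are all assumptions made for illustration, and the sketch does not use the macros added to the headers above. It assumes GCC or Clang on x86-64 and falls back to the scalar loop elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#endif

/* Scalar reference: returns 1 if wlist[ctr] differs from every
 * earlier entry wlist[0..ctr-1], and 0 otherwise. */
static int is_new_scalar(const uint32_t *wlist, size_t ctr)
{
  for (size_t i = 0; i < ctr; i++) {
    if (wlist[i] == wlist[ctr]) return 0;
  }
  return 1;
}

#ifdef HAVE_AVX2_PATH
/* AVX2 variant: compares eight 32-bit entries per iteration. */
__attribute__((target("avx2")))
static int is_new_avx2(const uint32_t *wlist, size_t ctr)
{
  /* Broadcast the candidate index to all eight lanes. */
  const __m256i needle = _mm256_set1_epi32((int)wlist[ctr]);
  size_t i = 0;

  for (; i + 8 <= ctr; i += 8) {
    __m256i chunk = _mm256_loadu_si256((const __m256i *)(wlist + i));
    __m256i eq = _mm256_cmpeq_epi32(chunk, needle);
    /* Any matching lane sets bits in the byte mask. */
    if (_mm256_movemask_epi8(eq) != 0) return 0;
  }

  /* Fewer than eight entries remain: finish with the scalar loop. */
  for (; i < ctr; i++) {
    if (wlist[i] == wlist[ctr]) return 0;
  }
  return 1;
}
#endif

/* Picks the AVX2 path at run time when the CPU supports it. */
static int is_new(const uint32_t *wlist, size_t ctr)
{
#ifdef HAVE_AVX2_PATH
  if (__builtin_cpu_supports("avx2")) return is_new_avx2(wlist, ctr);
#endif
  return is_new_scalar(wlist, ctr);
}

/* Hypothetical self-check: both paths must agree for every prefix. */
static void check_is_new(const uint32_t *wlist, size_t len)
{
  for (size_t ctr = 0; ctr < len; ctr++) {
    assert(is_new(wlist, ctr) == is_new_scalar(wlist, ctr));
  }
}
```

An AVX512 variant would follow the same pattern with sixteen lanes per iteration. Note that both loops here exit early on a match, so this sketch makes no claim about constant-time behavior.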
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.