Encaps speedup with vector instructions on AVX2 and AVX512.

Keygen speedup with vector instructions on AVX2 and AVX512. To support this, new macros have been added to bike_defs.h, defs.h and x86_64_intrinsic.h. DIVIDE_AND_CEIL has also been changed, because it returned an incorrect result for inputs that are multiples of the divisor.

Issue #, if available:

DIVIDE_AND_CEIL(x, divisor) = ((x) + (divisor))/(divisor) is erroneous when x is a multiple of the divisor: it returns one more than the true ceiling.
generate_indices_mod_z does not use AVX2/AVX512 vector instructions.
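
A small worked check of the DIVIDE_AND_CEIL issue may help reviewers. The sketch below copies the old formula quoted above under the hypothetical name DIVIDE_AND_CEIL_OLD (that name is not in the repository) and compares it with the usual rounding-up form ((x) + (divisor) - 1)/(divisor). The helper function name is also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Old formula as quoted in this PR; the _OLD suffix is hypothetical. */
#define DIVIDE_AND_CEIL_OLD(x, divisor) (((x) + (divisor)) / (divisor))

/* Usual rounding-up form for non-negative integers. */
#define DIVIDE_AND_CEIL(x, divisor) (((x) + (divisor) - 1) / (divisor))

/* Illustrative helper (not part of the library): worked examples with divisor 8. */
static void divide_and_ceil_examples(void)
{
  /* x = 17 is not a multiple of 8: both forms give ceil(17/8) = 3. */
  assert(DIVIDE_AND_CEIL_OLD((size_t)17, (size_t)8) == 3);
  assert(DIVIDE_AND_CEIL((size_t)17, (size_t)8) == 3);

  /* x = 16 is a multiple of 8: the old form gives 3, the true ceiling is 2. */
  assert(DIVIDE_AND_CEIL_OLD((size_t)16, (size_t)8) == 3);
  assert(DIVIDE_AND_CEIL((size_t)16, (size_t)8) == 2);

  /* x = 0 is also a multiple of 8: the old form gives 1 instead of 0. */
  assert(DIVIDE_AND_CEIL_OLD((size_t)0, (size_t)8) == 1);
  assert(DIVIDE_AND_CEIL((size_t)0, (size_t)8) == 0);
}
```

One caveat with the rounding-up form: for unsigned x close to the maximum of its type, (x) + (divisor) - 1 can wrap around. This should not matter for the buffer-size computations such a macro is typically used for, but it is worth keeping in mind.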

Description of changes:

DIVIDE_AND_CEIL(x, divisor) changed to ((x) + (divisor) - 1)/(divisor). It is now correct when x is a multiple of the divisor.
is_new (a subfunction of generate_indices_mod_z) sped up by using vector instructions. New macros added to bike_defs.h, defs.h and x86_64_intrinsic.h.
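
To illustrate the general idea of vectorizing a duplicate check like is_new, here is a minimal, self-contained sketch. It is not the code from this PR: the idx_t typedef, the function signatures, and the runtime dispatch are assumptions made for the example, and it uses raw AVX2 intrinsics rather than the macros added to x86_64_intrinsic.h. It assumes GCC or Clang on x86-64 and falls back to a scalar loop elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#  include <immintrin.h>
#  define HAVE_AVX2_PATH 1
#else
#  define HAVE_AVX2_PATH 0
#endif

typedef uint32_t idx_t; /* assumed index type, for illustration only */

/* Scalar reference: 1 if wlist[ctr] does not occur in wlist[0..ctr-1]. */
static int is_new_scalar(const idx_t *wlist, size_t ctr)
{
  for(size_t i = 0; i < ctr; i++) {
    if(wlist[i] == wlist[ctr]) {
      return 0;
    }
  }
  return 1;
}

#if HAVE_AVX2_PATH
/* AVX2 variant: compares eight 32-bit indices per iteration. */
__attribute__((target("avx2")))
static int is_new_avx2(const idx_t *wlist, size_t ctr)
{
  /* Broadcast the candidate index into all eight lanes. */
  const __m256i needle = _mm256_set1_epi32((int)wlist[ctr]);
  size_t i = 0;

  for(; i + 8 <= ctr; i += 8) {
    const __m256i chunk = _mm256_loadu_si256((const __m256i *)(const void *)&wlist[i]);
    const __m256i eq = _mm256_cmpeq_epi32(chunk, needle);
    /* Any set bit in the mask means at least one lane matched. */
    if(_mm256_movemask_epi8(eq) != 0) {
      return 0;
    }
  }

  /* Scalar tail for the remaining (fewer than eight) entries. */
  for(; i < ctr; i++) {
    if(wlist[i] == wlist[ctr]) {
      return 0;
    }
  }
  return 1;
}
#endif

/* Dispatcher: uses the AVX2 path when the CPU reports support for it. */
static int is_new(const idx_t *wlist, size_t ctr)
{
  assert(wlist != NULL);
#if HAVE_AVX2_PATH
  if(__builtin_cpu_supports("avx2")) {
    return is_new_avx2(wlist, ctr);
  }
#endif
  return is_new_scalar(wlist, ctr);
}
```

An AVX512 variant would follow the same pattern with sixteen 32-bit lanes per iteration. Note that both loops in this sketch return early on a match, so their running time depends on the data; whether that is acceptable depends on where the check is used.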

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

awslabs / bike-kem

Encaps speedup with vector instructions on AVX2 and AVX512. #7