Open timotheecour opened 3 years ago
The library is still vaporware at the moment.
That said I plan to bring all the lessons learned from https://github.com/mratsim/constantine on writing high performance big int code. In particular the inline assembler I developed for example for multiplication using MULX/ADCX/ADOX: https://github.com/mratsim/constantine/blob/e89429e/constantine/arithmetic/assembly/limbs_asm_mul_x86_adx_bmi2.nim#L72-L110
Constantine is significantly faster than GMP even though its constant-time but it's mostly because of specialization:
proc sub(a: var BigInt, b: Bigint)
is a drag to implement when a < b
In particular, no library that uses pure C can reach GMP or Constantine speed simply due to the compiler being the biggest optimization barrier for big integer code, even with intrinsics: https://gcc.godbolt.org/z/2h768y
#include <stdint.h>
#include <x86intrin.h>
void add256(uint64_t a[4], uint64_t b[4]){
uint8_t carry = 0;
for (int i = 0; i < 4; ++i)
carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
}
GCC:
add256:
movq (%rsi), %rax
addq (%rdi), %rax
setc %dl
movq %rax, (%rdi)
movq 8(%rdi), %rax
addb $-1, %dl
adcq 8(%rsi), %rax
setc %dl
movq %rax, 8(%rdi)
movq 16(%rdi), %rax
addb $-1, %dl
adcq 16(%rsi), %rax
setc %dl
movq %rax, 16(%rdi)
movq 24(%rsi), %rax
addb $-1, %dl
adcq %rax, 24(%rdi)
ret
Clang:
add256:
movq (%rsi), %rax
addq %rax, (%rdi)
movq 8(%rsi), %rax
adcq %rax, 8(%rdi)
movq 16(%rsi), %rax
adcq %rax, 16(%rdi)
movq 24(%rsi), %rax
adcq %rax, 24(%rdi)
retq
Curious how it compares to GMP, and, less importantly, to these:
links