SciNim / theo

Theo is an optimized bigint and number theory library for Nim
Other
26 stars 0 forks source link

performance vs GMP? #1

Open timotheecour opened 3 years ago

timotheecour commented 3 years ago

Curious how it compares to GMP, and, less importantly, to these:

links

mratsim commented 3 years ago

The library is still vaporware at the moment.

That said I plan to bring all the lessons learned from https://github.com/mratsim/constantine on writing high performance big int code. In particular the inline assembler I developed for example for multiplication using MULX/ADCX/ADOX: https://github.com/mratsim/constantine/blob/e89429e/constantine/arithmetic/assembly/limbs_asm_mul_x86_adx_bmi2.nim#L72-L110

Constantine is significantly faster than GMP even though its constant-time but it's mostly because of specialization:

In particular, no library that uses pure C can reach GMP or Constantine speed simply due to the compiler being the biggest optimization barrier for big integer code, even with intrinsics: https://gcc.godbolt.org/z/2h768y

#include <stdint.h>
#include <x86intrin.h>

void add256(uint64_t a[4], uint64_t b[4]){
  uint8_t carry = 0;
  for (int i = 0; i < 4; ++i)
    carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
}

GCC:

add256:
        movq    (%rsi), %rax
        addq    (%rdi), %rax
        setc    %dl
        movq    %rax, (%rdi)
        movq    8(%rdi), %rax
        addb    $-1, %dl
        adcq    8(%rsi), %rax
        setc    %dl
        movq    %rax, 8(%rdi)
        movq    16(%rdi), %rax
        addb    $-1, %dl
        adcq    16(%rsi), %rax
        setc    %dl
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        addb    $-1, %dl
        adcq    %rax, 24(%rdi)
        ret

Clang:

add256:
        movq    (%rsi), %rax
        addq    %rax, (%rdi)
        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        adcq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        adcq    %rax, 24(%rdi)
        retq