AztecProtocol / barretenberg

Apache License 2.0
asm_macros: MUL (Montgomery multiplication of a, b), in both the ADX and non-ADX versions, produces wrong output for large a and/or b limbs #978

Open zkbitcoin opened 4 months ago

zkbitcoin commented 4 months ago

During testing with large limbs, I found that the output data is wrong (see the simple test bed at https://github.com/zkbitcoin/nasm-adx).

Running the example produces the output below, where MUL is the macro from the project's asm_macros.hpp. This is most likely a case of overflow: see limbs_a[0] = 9293073166814171452ULL.

input limbs:

uint64_t limbs_r[4] = {};
uint64_t limbs_a[4] = {9293073166814171452ULL,4158907695144192454,2644031866505052884,3024693275553353487};
uint64_t limbs_b[4] = {2812702673390851119,5479905877917956870,1104182671213310543,818574998703379345};

generates:

asm(MUL

limbs_r[0] is 12178871726809496723
limbs_r[1] is 13840435079915171493
limbs_r[2] is 16771051252808782701
limbs_r[3] is 3578015002697288320

Calculating the Montgomery product by hand gives the following, which is the correct output:

limbs_r[0] is 7846254855529840460
limbs_r[1] is 2923310935437288472
limbs_r[2] is 3489859301534087952
limbs_r[3] is 91016735894317655
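A hand calculation like the one above can be cross-checked with a small reference implementation. The following is a minimal sketch, not the code from asm_macros.hpp: it assumes a 4-limb odd modulus passed in by the caller (the issue does not state which modulus was used), inputs already below that modulus, R = 2^256, and a compiler that provides `unsigned __int128` (GCC or Clang). The names `neg_inv64` and `mont_mul` are hypothetical.

```cpp
#include <array>
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;    // GCC/Clang extension (assumption)
using Limbs = std::array<u64, 4>;  // least significant limb first

// Returns -p0^{-1} mod 2^64 for odd p0, using Newton iteration.
u64 neg_inv64(u64 p0) {
    u64 inv = 1;                   // correct to 1 bit because p0 is odd
    for (int i = 0; i < 6; ++i) {
        inv *= 2 - p0 * inv;       // each step doubles the number of correct bits
    }
    return 0 - inv;
}

// Word-by-word Montgomery product a * b * 2^-256 mod p, fully reduced.
// Assumes p is odd and a, b < p.
Limbs mont_mul(const Limbs& a, const Limbs& b, const Limbs& p) {
    const u64 n0 = neg_inv64(p[0]);
    u64 t[6] = {0};
    for (int i = 0; i < 4; ++i) {
        u128 acc;
        u64 carry = 0;
        for (int j = 0; j < 4; ++j) {          // t += a * b[i]
            acc = (u128)a[j] * b[i] + t[j] + carry;
            t[j] = (u64)acc;
            carry = (u64)(acc >> 64);
        }
        acc = (u128)t[4] + carry;
        t[4] = (u64)acc;
        t[5] = (u64)(acc >> 64);

        const u64 m = t[0] * n0;               // makes the low limb of t + m*p zero
        acc = (u128)m * p[0] + t[0];
        carry = (u64)(acc >> 64);
        for (int j = 1; j < 4; ++j) {          // t = (t + m * p) / 2^64
            acc = (u128)m * p[j] + t[j] + carry;
            t[j - 1] = (u64)acc;
            carry = (u64)(acc >> 64);
        }
        acc = (u128)t[4] + carry;
        t[3] = (u64)acc;
        t[4] = t[5] + (u64)(acc >> 64);
    }
    // Here t < 2p, so at most one subtraction of p is needed.
    Limbs r;
    u64 borrow = 0;
    for (int j = 0; j < 4; ++j) {
        const u128 d = (u128)t[j] - p[j] - borrow;
        r[j] = (u64)d;
        borrow = (u64)(d >> 64) & 1;           // 1 if the subtraction wrapped
    }
    if (t[4] == 0 && borrow != 0) {            // t < p: keep t unchanged
        r = {t[0], t[1], t[2], t[3]};
    }
    return r;
}
```

As a small sanity check of the sketch, with p = 7 we have 2^256 ≡ 2 (mod 7), so the inverse of R is 4 and `mont_mul({3,0,0,0}, {5,0,0,0}, {7,0,0,0})` yields 15 · 4 mod 7 = 4.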

zkbitcoin commented 4 months ago

Adding the following to the assembly, right before the result is written out through rdi and the function returns, fixes it. I will open a pull request later.

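; comparison: r15:r14:r13:r12 against the modulus, most significant limb first
    cmp r15, [modulus + 24]
    jc done                     ; below the modulus: already reduced
    jnz sq                      ; above the modulus: subtract
    cmp r14, [modulus + 16]
    jc done
    jnz sq
    cmp r13, [modulus + 8]
    jc done
    jnz sq
    cmp r12, [modulus + 0]
    jc done
    jnz sq                      ; if all limbs are equal, fall through to sq
sq:
    sub r12, [modulus + 0]      ; subtract the modulus once, with borrow propagation
    sbb r13, [modulus + 8]
    sbb r14, [modulus + 16]
    sbb r15, [modulus + 24]
done: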
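The same compare-then-subtract step can be sketched in C++ to check it against the numbers reported in this issue. The modulus is an assumption: the issue does not name it, but the reported output and the hand-computed output differ, limb by limb, by exactly 0x3C208C16D87CFD47, 0x97816a916871ca8d, 0xb85045b68181585d, 0x30644e72e131a029, which matches the BN254 base field modulus. That is consistent with MUL returning a value in the range [0, 2p) rather than a value reduced below p. The names `kModulus`, `greater_or_equal`, and `reduce_once` are hypothetical.

```cpp
#include <array>
#include <cstdint>

using Limbs = std::array<std::uint64_t, 4>;  // least significant limb first

// Assumed modulus: BN254 base field prime (not stated in the issue).
constexpr Limbs kModulus = {0x3C208C16D87CFD47ULL, 0x97816a916871ca8dULL,
                            0xb85045b68181585dULL, 0x30644e72e131a029ULL};

// Mirrors the cmp/jc/jnz chain: compare from the most significant limb down.
bool greater_or_equal(const Limbs& x, const Limbs& m) {
    for (int i = 3; i >= 0; --i) {
        if (x[i] < m[i]) return false;  // jc done
        if (x[i] > m[i]) return true;   // jnz sq
    }
    return true;                        // all limbs equal: fall through to sq
}

// Mirrors the sub/sbb chain: subtract m once if x >= m.
Limbs reduce_once(Limbs x, const Limbs& m) {
    if (!greater_or_equal(x, m)) return x;
    std::uint64_t borrow = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint64_t d = x[i] - m[i];
        const std::uint64_t b1 = x[i] < m[i];  // borrow from the limb subtraction
        const std::uint64_t b2 = d < borrow;   // borrow from the incoming borrow
        x[i] = d - borrow;
        borrow = b1 | b2;
    }
    return x;
}
```

Applied to the reported limbs_r values (12178871726809496723, 13840435079915171493, 16771051252808782701, 3578015002697288320), `reduce_once` returns the hand-computed values (7846254855529840460, 2923310935437288472, 3489859301534087952, 91016735894317655). A value already below the modulus is returned unchanged.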