Open mratsim opened 2 years ago
sounds all very nice (not sure what the advantage of this approach - ergonomics? performance? correctness? - but I guess it is worth exploring), I found it odd (as a trained mathematician) to hear about OddField
and in fact I think you mean modular arithmetic instead of field arithmetic, and odd field should be ring of integers modulo an odd integer and so on (Binary and Ternary fields are not fields...).
Oops good point, I was on the implementation details and you're right. I'll change field
to Modular
.
Binary fields do exist since 2 is prime though GF(2ᵏ) is technically a "binary extension field".
Ergonomics-wise, we can expose just PrimeField and the Modular type since they are likely to be the most used for starter.
Performance-wise it should be a win since RNS bases are very often used to accelerate cryptography in hardware ASIC/FPGA. In particular, we can use Montgomery representation for odd moduli but we can't for even, this avoids expensive modulo operation. And modulo a power-of-two is also trivial.
Correctness-wise, what do you mean? I'm not out to write an incorrect library :sweat_smile:
Oops good point, I was on the implementation details and you're right. I'll change field to Modular.
👍 asbolutely, defintely sounded like a unintended slip while focusing on other stuff 😄
Binary fields do exist since 2 is prime though GF(2ᵏ) is technically a "binary extension field".
yes, you are right that field of power of a prime do exist but they are not the ring of integers module a prime power
Ergonomics-wise, we can expose just PrimeField and the Modular type since they are likely to be the most used for starter.
seems appropriate
Performance-wise it should be a win since RNS bases are very often used to accelerate cryptography in hardware ASIC/FPGA.
absolutely trust you on that :)
Correctness-wise, what do you mean? I'm not out to write an incorrect library 😅
mmh, probably I meant something more like safe-ness? the concept I was trying to express was helping the user of the library to avoid using the wrong algorithm for a given context (similar but not exactly like we protect the user to sum a float and an int)
anyway sounds this approach could give an improvement on all fronts!
mmh, probably I meant something more like safe-ness? the concept I was trying to express was helping the user of the library to avoid using the wrong algorithm for a given context (similar but not exactly like we protect the user to sum a float and an int)
Oh that would be just internal details, external API would not expose all that, especially initially when we figure stuff out.
Some benchmarks with a 381-bit moduli.
Note: My modular reduction is constant-time for the algorithm (i.e. always does iterations to cover the worst cases, even if a < m
or if a and m have close number of bits), but it does use hardware division at the moment as I didn't have the time to reimplement hardware division. Concretely it simulates the worst case modulus of a generic bigint library
# To be placed in the "./build/" folder
# of unmerged branch
# https://github.com/mratsim/constantine/tree/mont_repr_refactor
import
../benchmarks/[bench_blueprint],
../constantine/config/[curves, type_ff],
../constantine/arithmetic/[bigints, finite_fields],
../constantine/io/io_bigints,
../helpers/prng_unsafe
proc report(op, desc: string, start, stop: MonoTime, startClk, stopClk: int64, iters: int) =
let ns = inNanoseconds((stop-start) div iters)
let throughput = 1e9 / float64(ns)
when SupportsGetTicks:
echo &"{op:<70} {desc:<18} {throughput:>15.3f} ops/s {ns:>9} ns/op {(stopClk - startClk) div iters:>9} CPU cycles (approx)"
else:
echo &"{op:<70} {desc:<18} {throughput:>15.3f} ops/s {ns:>9} ns/op"
template bench(op: string, desc: string, iters: int, body: untyped): untyped =
block:
measure(iters, startTime, stopTime, startClk, stopClk, body)
report(op, desc, startTime, stopTime, startClk, stopClk, iters)
# warmup
proc warmup*() =
# Warmup - make sure cpu is on max perf
let start = cpuTime()
var foo = 123
for i in 0 ..< 300_000_000:
foo += i*i mod 456
foo = foo mod 789
# Compiler shouldn't optimize away the results as cpuTime rely on sideeffects
let stop = cpuTime()
echo &"Warmup: {stop - start:>4.4f} s, result {foo} (displayed to avoid compiler optimizing warmup away)\n"
warmup()
proc mulModBench*(iters: int) =
# BLS12-381 prime
const m = BigInt[381].fromHex"0x1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab"
var a = rng.random_unsafe(BigInt[381])
a.reduce(a, m)
var b = rng.random_unsafe(BigInt[381])
b.reduce(b, m)
var r: BigInt[381]
var t: BigInt[762]
# mod has no branches but use hardware DIV
bench("Mul + constant-time+hardware mod (a.k.a always worst-case)", "BLS12-381", iters):
t.prod(a, b)
r.reduce(t, m)
let x = rng.random_unsafe(Fp[BLS12_381])
let y = rng.random_unsafe(Fp[BLS12_381])
var z: Fp[BLS12_381]
bench("Montgomery multiplication", "BLS12-381", iters):
z.prod(x, y)
bench("BigInt <-> Montgomery roundtrip", "BLS12-381", iters):
r.fromField(z)
z.fromBig(r)
mulModBench(10000)
See Implementing fast modular exponentiation - a guide - https://github.com/status-im/nim-stint/issues/126
And implementation: https://github.com/mratsim/constantine/pull/242#issuecomment-1568223941
Speed is at least 85% of GMP of modular exponentiation, the most CPU intensive modular arithmetic primitive.
It confirms that splitting between odd modulus and power-of-two modulus and recombining when needed via the Chinese Remainder Theorem is the way to go.
edited: Field == Modular arithmetic for prime modulus only
Some ideas of modular arithmetic design.
Many algorithms are only applicable to prime moduli (Legendre symbol, Fermat's Little Theorem, Square roots ...) or odd moduli (Jacobi symbol, GCD and Modular inversion via extended Euclid, Montgomery arithmetic). There are also specialized routines for binary extension fields (mod 2ᵏ) and ternary extension fields (mod 3ᵏ) but there is nothing for generic even moduli.
What we can do is represent generic modulus on a dual base power-of-two 2ᵏ and odd similar to the Residue Number System representation. The results of all operations can be recombined using the Chinese Remainder Theorem.
Concretely we would have
Operations would then have the form:
With the convention for
sum
for operation that modify a buffer andadd
for out-of-place functions.Alternatively we can have the Modulus object be carried by Modular elements to allow convenient operators:
This would require making it a ref object.