Multiplicative inverses show up in a few places in Prio3, as a correction for adding multiple shares of constants in circuits, and in NTT inverse operations to properly scale outputs. In each case, we take an ordinal number, project it into the field, and then invert it. (This is all likely more performance-relevant for smaller circuit sizes than for larger sizes, since multiplications for NTTs grow faster than any for inverses)
Where possible, we could hoist these inverses into circuit or gadget constructors, to precompute this inverse once and reuse it.
Otherwise, we could specialize inv() by using "addition chain exponentiation" instead of "square and multiply exponentiation" that the more-generic pow() does. The idea is that, instead of computing x, x^2, x^4, x^8, etc., and multiplying selected power of two powers together, we instead do an expensive search upfront for a good routine that computes a different set of intermediate values, using squares and multiplies, to get the same result. Since all our primes are chosen such that they are both close to a power of two, and are such that p-1 is very smooth, with many factors of 2, then the exponent we use for inversion via Euler's theorem, p-2, has a lot of ones in its binary expansion. This translates to lots of multiplies, and counting multiply operations would get us something just under double the bit width of the prime. All our primes are big enough that doing an exhaustive search for optimal addition chains is impractical, but https://github.com/mmcloughlin/addchain can find very good addition chains quickly. This would let us take the number of multiply operations per inverse from just under double the bit width to just over the bit width. (hat tip to Filippo's stream: https://www.twitch.tv/videos/2211473764)
Edit: it is hilarious, actually, that this got issue number 1111, because _1111 is the sort of temporary variable name you would commonly see in addition chain exponentiation routines
Multiplicative inverses show up in a few places in Prio3, as a correction for adding multiple shares of constants in circuits, and in NTT inverse operations to properly scale outputs. In each case, we take an ordinal number, project it into the field, and then invert it. (This is all likely more performance-relevant for smaller circuit sizes than for larger sizes, since multiplications for NTTs grow faster than any for inverses)
Where possible, we could hoist these inverses into circuit or gadget constructors, to precompute this inverse once and reuse it.
Otherwise, we could specialize
inv()
by using "addition chain exponentiation" instead of "square and multiply exponentiation" that the more-genericpow()
does. The idea is that, instead of computing x, x^2, x^4, x^8, etc., and multiplying selected power of two powers together, we instead do an expensive search upfront for a good routine that computes a different set of intermediate values, using squares and multiplies, to get the same result. Since all our primes are chosen such that they are both close to a power of two, and are such that p-1 is very smooth, with many factors of 2, then the exponent we use for inversion via Euler's theorem, p-2, has a lot of ones in its binary expansion. This translates to lots of multiplies, and counting multiply operations would get us something just under double the bit width of the prime. All our primes are big enough that doing an exhaustive search for optimal addition chains is impractical, but https://github.com/mmcloughlin/addchain can find very good addition chains quickly. This would let us take the number of multiply operations per inverse from just under double the bit width to just over the bit width. (hat tip to Filippo's stream: https://www.twitch.tv/videos/2211473764)Edit: it is hilarious, actually, that this got issue number 1111, because
_1111
is the sort of temporary variable name you would commonly see in addition chain exponentiation routines