Currently, most of the time in the signing protocol is spent in Montgomery exponentiation. Key refresh is split between exponentiation and prime number generation, but the latter is mainly exponentiation again (most of the time is spent in Miller-Rabin tests). So it would help a lot if the exponentiation performance is improved.
Possible avenues:
Replace schoolbook multiplication with Karatsuba or Toom-Cook. This may start making a difference at our integer sizes (2048 bit). This has to be done within crypto-bigint, see https://github.com/RustCrypto/crypto-bigint/issues/66
Use wNAF exponentiation instead of the current fixed-window one (for the cases where the exponent is not secret). This has to be done within crypto-bigint.
crypto-bigint's pow() supports exponents of arbitrary size (that is you can raise Uint<N> into Uint<M> power). We currently only raise Uint<N> to Uint<N>, and implement Uint<N>^Uint<2*N> and Uint<N>^Uint<4*N> by breaking the exponent in halves and exponentiating separately. If we could use the arbitrary size exponentiation, it could make this faster, because we would not have to calculate x^{2^N} separately to merge the halves - it's already calculated by the fixed window algorithm.
In some places where we calculate x^y mod N we also know phi(N) (the totient), so we can instead calculate x^(y mod phi(N)) mod N. If y is large (of the order of N^2), this may be faster than direct exponentiation.
Currently, most of the time in the signing protocol is spent in Montgomery exponentiation. Key refresh is split between exponentiation and prime number generation, but the latter is mainly exponentiation again (most of the time is spent in Miller-Rabin tests). So it would help a lot if the exponentiation performance is improved.
Possible avenues:
crypto-bigint
, see https://github.com/RustCrypto/crypto-bigint/issues/66crypto-bigint
.crypto-bigint
'spow()
supports exponents of arbitrary size (that is you can raiseUint<N>
intoUint<M>
power). We currently only raiseUint<N>
toUint<N>
, and implementUint<N>^Uint<2*N>
andUint<N>^Uint<4*N>
by breaking the exponent in halves and exponentiating separately. If we could use the arbitrary size exponentiation, it could make this faster, because we would not have to calculatex^{2^N}
separately to merge the halves - it's already calculated by the fixed window algorithm.x^y mod N
we also knowphi(N)
(the totient), so we can instead calculatex^(y mod phi(N)) mod N
. Ify
is large (of the order of N^2), this may be faster than direct exponentiation.