Our single-word arithmetic (in `nmod` and `ulong_extras`) with precomputed inverses uses the Granlund-Möller division algorithm (https://gmplib.org/~tege/division-paper.pdf). The `mpn_*_preinvn` algorithms meanwhile use a slightly different precomputed inverse, with a slightly different division algorithm. We should consider using the same algorithm here. If we look at `nmod` reduction, we see that the carry-out from the second (low) multiplication isn't needed.
The first multiplication is a full multiplication (we use the low part of the result subsequently), but it should be possible to compute with just the extra limb and do some further adjustments only with low probability.
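
For reference, here is a minimal sketch of the single-word 2-by-1 division step from the linked paper, to make the two multiplications concrete. It is not the library's actual code: the names `sketch_preinvert` and `sketch_divrem_2by1` are hypothetical, a 64-bit limb is assumed, and `unsigned __int128` (a GCC/Clang extension) stands in for the usual double-limb macros.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* assumption: GCC/Clang extension */

/* Hypothetical helper: v = floor((B^2 - 1) / d) - B with B = 2^64.
   d must be normalized (top bit set). */
uint64_t sketch_preinvert(uint64_t d)
{
    assert(d >> 63);
    /* The quotient lies in [B + 1, 2B - 1], so truncating to 64 bits
       subtracts exactly B. */
    return (uint64_t) (~(u128) 0 / d);
}

/* Hypothetical helper: divide (u1, u0) by normalized d, given u1 < d.
   Returns the quotient and stores the remainder in *r_out. */
uint64_t sketch_divrem_2by1(uint64_t *r_out, uint64_t u1, uint64_t u0,
                            uint64_t d, uint64_t v)
{
    assert((d >> 63) && u1 < d);

    /* First multiplication: full two-limb product, because the low
       limb q0 is needed for the comparison below. */
    u128 q = (u128) v * u1;
    q += ((u128) u1 << 64) | u0;   /* cannot overflow when u1 < d */
    uint64_t q1 = (uint64_t) (q >> 64) + 1;   /* mod B */
    uint64_t q0 = (uint64_t) q;

    /* Second multiplication: only the low limb of q1 * d is used,
       so its carry-out (high limb) is never computed. */
    uint64_t r = u0 - q1 * d;      /* mod B */

    if (r > q0)   /* first adjustment */
    {
        q1--;
        r += d;
    }
    if (r >= d)   /* second adjustment, rarely taken */
    {
        q1++;
        r -= d;
    }

    *r_out = r;
    return q1;
}
```

For a pure reduction, as opposed to a division, the quotient is discarded and only `r` is kept, but the first product still has to be computed in full because `q0` feeds the first adjustment.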