Multiprecision Granlund-Möller division

Our single-word arithmetic (in nmod and ulong_extras) with precomputed inverses uses the Granlund-Möller division algorithm (https://gmplib.org/~tege/division-paper.pdf).

The mpn_*_preinvn algorithms meanwhile use a slightly different precomputed inverse, with a slightly different division algorithm.

We should consider using the same algorithm here. If we look at nmod reduction

   #define NMOD_RED2(r, a_hi, a_lo, mod) \
      do { \
         mp_limb_t q0xx, q1xx, r1xx; \
         const mp_limb_t u1xx = ((a_hi)<<(mod).norm) + r_shift((a_lo), FLINT_BITS - (mod).norm);    \
         const mp_limb_t u0xx = ((a_lo)<<(mod).norm); \
         const mp_limb_t nxx = ((mod).n<<(mod).norm); \
         umul_ppmm(q1xx, q0xx, (mod).ninv, u1xx); \
         add_ssaaaa(q1xx, q0xx, q1xx, q0xx, u1xx, u0xx); \
         r1xx = (u0xx - (q1xx + 1)*nxx); \
         if (r1xx > q0xx) r1xx += nxx; \
         if (r1xx < nxx) r = (r1xx>>(mod).norm); \
         else r = ((r1xx - nxx)>>(mod).norm); \
      } while (0)

we see that the carry-out from the second (low) multiplication isn't needed.

The first multiplication is a full multiplication (we use the low part of the result subsequently), but it should be possible to compute with just the extra limb and do some further adjustments only with low probability.

flintlib / flint

Multiprecision Granlund-Möller division #1887