aurora-opensource / au

A C++14-compatible physical units library with no dependencies and a single-file delivery option. Emphasis on safety, accessibility, performance, and developer experience.
Apache License 2.0
329 stars 21 forks source link

Improve performance for modular arithmetic #328

Open chiphogg opened 3 days ago

chiphogg commented 3 days ago

Right now, there are numbers we know we can't factor at compile time: see comments in #217. We're using too many steps, and clang bails. However, those comments also suggest that it should be possible to get away with far far fewer steps. It turns out that we are spending almost all of our steps on our custom implementation of mul_mod. If we can find a much more efficient implementation, we should be able to factor a much wider variety of numbers. Leading candidate options include:

  1. Implement Montgomery/REDC form
  2. Do manual "double-wide" multiplication and mod

Whichever route we go, getting a more efficient mul_mod should also make it worthwhile to switch to a "batched" implementation of Pollard's rho, which replaces many GCD calls (slow no matter what) with modular multiplication calls (which can be made fast). The batched implementation is a pretty marginal win if we implement it now (and is in some cases a little bit worse), but is a much bigger win if multiplication is cheap.