An improved Montgomery algorithm, up to 50% performance improvement. It utilizes the particularity of low-order multiplication and mod B^n-1 algorithm to reduce the number of calculations. I have written the code in CL, if anybody are interested in this algorithm, I will add the corresponding test and detailed explanation.
name old time/op new time/op delta
ExpLarge/100-4 137ms ± 3% 190ms ±15% +38.67% (p=0.004 n=6+5)
ExpLarge/200-4 987ms ± 0% 999ms ± 3% ~ (p=0.662 n=6+5)
ExpLarge/400-4 7.87s ± 3% 6.02s ± 2% -23.55% (p=0.003 n=7+5)
ExpLarge/600-4 26.8s ± 1% 16.7s ± 3% -37.62% (p=0.004 n=6+5)
ExpLarge/800-4 63.6s ± 2% 35.4s ± 1% -44.33% (p=0.003 n=7+5)
ExpLarge/1000-4 123s ± 0% 64s ± 0% -48.39% (p=0.036 n=3+5)
An improved Montgomery algorithm, up to 50% performance improvement. It utilizes the particularity of low-order multiplication and mod B^n-1 algorithm to reduce the number of calculations. I have written the code in CL, if anybody are interested in this algorithm, I will add the corresponding test and detailed explanation. name old time/op new time/op delta ExpLarge/100-4 137ms ± 3% 190ms ±15% +38.67% (p=0.004 n=6+5) ExpLarge/200-4 987ms ± 0% 999ms ± 3% ~ (p=0.662 n=6+5) ExpLarge/400-4 7.87s ± 3% 6.02s ± 2% -23.55% (p=0.003 n=7+5) ExpLarge/600-4 26.8s ± 1% 16.7s ± 3% -37.62% (p=0.004 n=6+5) ExpLarge/800-4 63.6s ± 2% 35.4s ± 1% -44.33% (p=0.003 n=7+5) ExpLarge/1000-4 123s ± 0% 64s ± 0% -48.39% (p=0.036 n=3+5)