Closed fredrik-johansson closed 1 month ago
Fixes #1965 by using a Z/nZ implementation (nmod, mpn_mod or fmpz_mod) instead of direct fmpz arithmetic to compute the Lucas chain.
nmod
mpn_mod
fmpz_mod
Listed below is the speedup to run fmpz_is_probabprime_BPSW(p) given p = nextprime(2^(b-1)).
fmpz_is_probabprime_BPSW(p)
There is a slight regression just above 1024 bits, presumably because fmpz_mod_mul currently has a bunch of overhead it shouldn't have.
fmpz_mod_mul
I guess this can be made to go even faster if we implement Montgomery reduction.
b old time new time speedup 100 9.66e-06 3.47e-06 2.784 110 1.08e-05 3.88e-06 2.784 121 1.16e-05 4.08e-06 2.843 133 1.98e-05 1.25e-05 1.584 146 2.17e-05 1.33e-05 1.632 160 2.35e-05 1.5e-05 1.567 176 2.61e-05 1.68e-05 1.554 193 3.73e-05 2.27e-05 1.643 212 3.7e-05 2.47e-05 1.498 233 4.09e-05 2.76e-05 1.482 256 4.29e-05 2.61e-05 1.644 281 5.73e-05 3.88e-05 1.477 309 6.36e-05 4.26e-05 1.493 339 8.12e-05 5.58e-05 1.455 372 8.75e-05 6.11e-05 1.432 409 0.000116 7.87e-05 1.474 449 0.000147 9.74e-05 1.509 493 0.00016 0.000108 1.481 542 0.000208 0.000144 1.444 596 0.000252 0.000197 1.279 655 0.000321 0.000247 1.300 720 0.000398 0.000308 1.292 792 0.000497 0.000379 1.311 871 0.000571 0.000468 1.220 958 0.000729 0.00058 1.257 1053 0.00101 0.00106 0.953 1158 0.00132 0.00142 0.930 1273 0.0016 0.00171 0.936 1400 0.00206 0.002 1.030 1540 0.00274 0.0027 1.015 1694 0.00334 0.00317 1.054 1863 0.00429 0.00411 1.044 2049 0.00557 0.00569 0.979 2253 0.00699 0.00655 1.067 2478 0.00891 0.00824 1.081 2725 0.0117 0.0108 1.083 2997 0.0148 0.0136 1.088 3296 0.0193 0.0179 1.078 3625 0.0251 0.0231 1.087 3987 0.0322 0.0301 1.070 4385 0.0419 0.0391 1.072 4823 0.0534 0.0501 1.066 5305 0.07 0.0644 1.087 5835 0.0896 0.0827 1.083 6418 0.117 0.107 1.093 7059 0.148 0.137 1.080
Fixes #1965 by using a Z/nZ implementation (
nmod
,mpn_mod
orfmpz_mod
) instead of direct fmpz arithmetic to compute the Lucas chain.Listed below is the speedup to run
fmpz_is_probabprime_BPSW(p)
given p = nextprime(2^(b-1)).There is a slight regression just above 1024 bits, presumably because
fmpz_mod_mul
currently has a bunch of overhead it shouldn't have.I guess this can be made to go even faster if we implement Montgomery reduction.