Slow polynomial factoring

wbhart commented 3 years ago

When I implemented polynomial factoring in 2016, the following were the times:

Poly	NTL	Magma	Flint-1.6	New
P1	0.07	0.03	0.10	0.30
P2	0.2	0.08	0.10	0.50
P3	0.3	0.16	0.16	0.93
P4	2.0	1.9	0.94	4.6
P5	0.09	0.11	0.04	0.03
P6	0.12	0.11	0.11	0.26
P7	1.1	1.1	0.52	1.8
P8	3.4	2.2	1.5	0.79
M12_5	12.4	9.5	2.9	2.2
M12_6	21.7	21.5	5.2	73
S7	0.34	0.42	0.20	0.56
S8	3.8	4.6	2.1	7.9
S9	71	165	21	----
S10	??	??	??	----
T1	3.8	2.5	1.2	0.6
T2	3.2	2.1	1.2	0.6
T3	24	20	7.4	59
H1	??	??	??	9.3
H2	??	??	??	----

I also gave a list of things that we don't do yet, some of which have been done since:

tuning of parameters p^a, rho, delta, eta, l, E_bound, number of CLDs, etc a la van Hoeij-Novocin
switch to Zassenhaus once number of local factors is low enough
inflation/deflation (aka power hack)
U_LLL floating point LLL trick
spend more time searching for optimal prime p
mix CLD data using random matrices
only run LLL if justified as outlined in van Hoeij-Novocin
only add row if justified as outlined in van Hoeij-Novocin
speed up basic arithmetic in LLL
fast divisibility testing, e.g. mod a small prime
asymptotically fast truncated division smod p^a

fredrik-johansson commented 3 years ago

We've made definite progress in the last year.

Here's the improvement over our last release on my machine, with #930 and #937 merged:

         Flint-2.7.1   Current
P1         143 ms       16 ms     (8.9x faster)
P2         275 ms       33 ms     (8.3x faster)
P3:        512 ms       65 ms     (7.9x faster)
P4:       2325 ms      579 ms     (4.0x faster)
P5:         17 ms       19 ms     (0.9x faster)
P6:        125 ms       28 ms     (4.5x faster)
P7:        925 ms      649 ms     (1.4x faster)
P8:        341 ms      332 ms     (1.0x faster)
M12_5:    1173 ms      996 ms     (1.2x faster)
M12_6:   37056 ms    15905 ms     (2.3x faster)
T1:        317 ms      297 ms     (1.1x faster)
T2:        350 ms      315 ms     (1.1x faster)
T3:      26561 ms     9832 ms     (2.7x faster)
H1:       4521 ms     3381 ms     (1.3x faster)
S7:        278 ms      260 ms     (1.1x faster)
S8:       4284 ms     3686 ms     (1.2x faster)

S9, S10 and H2 still take forever.

(Note: #937 also adds the new test polynomial C1 from #907 to the test suite. This takes 680 ms with the latest improvements, previously ~forever.)

fredrik-johansson commented 3 years ago

Correction: S9 doesn't take forever, it only takes 12 minutes!

To get a sense of what is going on, I put a printout in factor_van_hoeij to show each call to LLL and how long it takes:

S5:

LLL 17 x 17, 35 bits    : 0.00021 s
LLL 17 x 18, 35 bits    : 0.000162 s
LLL 17 x 19, -35 bits    : 0.000411 s
LLL 10 x 20, -35 bits    : 0.000158 s
LLL 8 x 21, -35 bits    : 0.000217 s

S6:

LLL 17 x 17, 35 bits    : 0.000198 s
LLL 17 x 18, -35 bits    : 0.000843 s
LLL 18 x 19, -35 bits    : 0.000288 s
LLL 18 x 20, -35 bits    : 0.001144 s
LLL 33 x 33, 54 bits    : 0.00122 s
LLL 33 x 34, -54 bits    : 0.001891 s
LLL 30 x 35, -54 bits    : 0.002212 s
LLL 31 x 36, -54 bits    : 0.001956 s
LLL 15 x 37, -54 bits    : 0.000606 s
LLL 5 x 38, -54 bits    : 0.00014 s

S7:

LLL 33 x 33, 54 bits    : 0.000697 s
LLL 33 x 34, -54 bits    : 0.001945 s
LLL 24 x 35, -54 bits    : 0.00034 s
LLL 24 x 36, -54 bits    : 0.002197 s
LLL 25 x 37, -54 bits    : 0.000594 s
LLL 25 x 38, -54 bits    : 0.002948 s
LLL 65 x 65, 103 bits    : 0.004933 s
LLL 65 x 66, -103 bits    : 0.031963 s
LLL 66 x 67, -103 bits    : 0.00867 s
LLL 62 x 68, -103 bits    : 0.01468 s
LLL 63 x 69, -103 bits    : 0.092595 s
LLL 31 x 70, -103 bits    : 0.069487 s
LLL 30 x 71, -103 bits    : 0.010149 s

S8:

LLL 65 x 65, 103 bits    : 0.004821 s
LLL 65 x 66, -103 bits    : 0.030986 s
LLL 66 x 67, -103 bits    : 0.050655 s
LLL 54 x 68, -103 bits    : 0.050977 s
LLL 55 x 69, -103 bits    : 0.053301 s
LLL 129 x 129, 200 bits    : 0.084153 s
LLL 129 x 130, -200 bits    : 0.084978 s
LLL 125 x 131, -200 bits    : 0.402601 s
LLL 126 x 132, -200 bits    : 0.132055 s
LLL 114 x 133, -200 bits    : 0.145725 s
LLL 115 x 134, -200 bits    : 1.60024 s
LLL 54 x 135, -200 bits    : 0.79608 s
LLL 54 x 136, -200 bits    : 0.181834 s

S9:

LLL 129 x 129, 200 bits    : 0.082886 s
LLL 129 x 130, -200 bits    : 0.21772 s
LLL 130 x 131, -200 bits    : 0.435493 s
LLL 114 x 132, -200 bits    : 0.759173 s
LLL 115 x 133, -200 bits    : 1.62082 s
LLL 116 x 134, -200 bits    : 1.38855 s
LLL 117 x 135, -200 bits    : 1.49151 s
LLL 257 x 257, 393 bits    : 0.670866 s
LLL 257 x 258, -393 bits    : 3.12472 s
LLL 252 x 259, -393 bits    : 3.86854 s
LLL 253 x 260, -393 bits    : 91.2537 s
LLL 237 x 261, -393 bits    : 1.93611 s
LLL 238 x 262, -393 bits    : 7.50094 s
LLL 198 x 263, -393 bits    : 303.488 s
LLL 199 x 264, -393 bits    : 335.692 s
LLL 86 x 265, -393 bits    : 2.05302 s

(This is with MPFR disabled.)

It looks like LLL is taking a long time to finish for some input.

wbhart commented 3 years ago

My best guess is that LLL should only be called if it is worth it. Flint1 had a function to determine if this was the case. I'm quite surprised this made so much difference if it was the real reason for the better performance there.

fredrik-johansson commented 3 years ago

Playing with LLL parameters makes some difference (up to a factor two):

Note that the defaults are delta = 0.99, eta = 0.51.

delta = 0.75, eta = 0.51

LLL 129 x 129, 200 bits    : 0.085197 s
LLL 129 x 130, -200 bits    : 0.118409 s
LLL 130 x 131, -200 bits    : 0.3499 s
LLL 114 x 132, -200 bits    : 0.415874 s
LLL 115 x 133, -200 bits    : 0.693953 s
LLL 116 x 134, -200 bits    : 0.915278 s
LLL 117 x 135, -200 bits    : 0.845167 s
LLL 257 x 257, 393 bits    : 0.692637 s
LLL 257 x 258, -393 bits    : 3.23055 s
LLL 252 x 259, -393 bits    : 1.78256 s
LLL 253 x 260, -393 bits    : 60.4798 s
LLL 237 x 261, -393 bits    : 1.25693 s
LLL 238 x 262, -393 bits    : 2.3563 s
LLL 239 x 263, -393 bits    : 240.758 s
LLL 240 x 264, -393 bits    : 253.835 s
LLL 127 x 265, -393 bits    : 3.27139 s

delta = 0.75, eta = 0.6

LLL 129 x 129, 200 bits    : 0.089295 s
LLL 129 x 130, -200 bits    : 0.113068 s
LLL 130 x 131, -200 bits    : 0.332855 s
LLL 114 x 132, -200 bits    : 0.284462 s
LLL 115 x 133, -200 bits    : 0.546179 s
LLL 116 x 134, -200 bits    : 0.611776 s
LLL 117 x 135, -200 bits    : 0.753995 s
LLL 118 x 136, -200 bits    : 0.67701 s
LLL 257 x 257, 393 bits    : 0.765197 s
LLL 257 x 258, -393 bits    : 3.47659 s
LLL 252 x 259, -393 bits    : 1.47646 s
LLL 253 x 260, -393 bits    : 56.906 s
LLL 237 x 261, -393 bits    : 0.821933 s
LLL 238 x 262, -393 bits    : 2.22039 s
LLL 239 x 263, -393 bits    : 182.26 s
LLL 240 x 264, -393 bits    : 196.611 s
LLL 127 x 265, -393 bits    : 2.63544 s

delta = 0.75, eta = 0.75

LLL 129 x 129, 200 bits    : 0.088646 s
LLL 129 x 130, -200 bits    : 0.081249 s
LLL 130 x 131, -200 bits    : 0.297023 s
LLL 114 x 132, -200 bits    : 0.178078 s
LLL 115 x 133, -200 bits    : 0.327666 s
LLL 116 x 134, -200 bits    : 0.454545 s
LLL 117 x 135, -200 bits    : 23.5608 s
LLL 117 x 136, -200 bits    : 0.535455 s
LLL 257 x 257, 393 bits    : 0.68268 s
LLL 257 x 258, -393 bits    : 3.2413 s
LLL 252 x 259, -393 bits    : 1.0691 s
LLL 253 x 260, -393 bits    : 42.3917 s
LLL 237 x 261, -393 bits    : 0.670725 s
LLL 238 x 262, -393 bits    : 1.11835 s
LLL 239 x 263, -393 bits    : 144.582 s
LLL 240 x 264, -393 bits    : 145.512 s
LLL 239 x 265, -393 bits    : 2.33037 s
LLL 240 x 266, -393 bits    : 5.09456 s

wbhart commented 3 years ago

Sure but one has to use parameters within the scope of Stehle's paper. Of course there's nothing stopping one doing a final reduction with other values, but this might be more expensive, as one then has to use a standard LLL reduction.

On Wed, 14 Apr 2021 at 09:36, Fredrik Johansson @.***> wrote:

Playing with LLL parameters makes some difference (up to a factor two):

Note that the defaults are delta = 0.99, eta = 0.51.

delta = 0.75, eta = 0.51

LLL 129 x 129, 200 bits : 0.085197 s LLL 129 x 130, -200 bits : 0.118409 s LLL 130 x 131, -200 bits : 0.3499 s LLL 114 x 132, -200 bits : 0.415874 s LLL 115 x 133, -200 bits : 0.693953 s LLL 116 x 134, -200 bits : 0.915278 s LLL 117 x 135, -200 bits : 0.845167 s LLL 257 x 257, 393 bits : 0.692637 s LLL 257 x 258, -393 bits : 3.23055 s LLL 252 x 259, -393 bits : 1.78256 s LLL 253 x 260, -393 bits : 60.4798 s LLL 237 x 261, -393 bits : 1.25693 s LLL 238 x 262, -393 bits : 2.3563 s LLL 239 x 263, -393 bits : 240.758 s LLL 240 x 264, -393 bits : 253.835 s LLL 127 x 265, -393 bits : 3.27139 s

delta = 0.75, eta = 0.6

LLL 129 x 129, 200 bits : 0.089295 s LLL 129 x 130, -200 bits : 0.113068 s LLL 130 x 131, -200 bits : 0.332855 s LLL 114 x 132, -200 bits : 0.284462 s LLL 115 x 133, -200 bits : 0.546179 s LLL 116 x 134, -200 bits : 0.611776 s LLL 117 x 135, -200 bits : 0.753995 s LLL 118 x 136, -200 bits : 0.67701 s LLL 257 x 257, 393 bits : 0.765197 s LLL 257 x 258, -393 bits : 3.47659 s LLL 252 x 259, -393 bits : 1.47646 s LLL 253 x 260, -393 bits : 56.906 s LLL 237 x 261, -393 bits : 0.821933 s LLL 238 x 262, -393 bits : 2.22039 s LLL 239 x 263, -393 bits : 182.26 s LLL 240 x 264, -393 bits : 196.611 s LLL 127 x 265, -393 bits : 2.63544 s

delta = 0.75, eta = 0.75

LLL 129 x 129, 200 bits : 0.088646 s LLL 129 x 130, -200 bits : 0.081249 s LLL 130 x 131, -200 bits : 0.297023 s LLL 114 x 132, -200 bits : 0.178078 s LLL 115 x 133, -200 bits : 0.327666 s LLL 116 x 134, -200 bits : 0.454545 s LLL 117 x 135, -200 bits : 23.5608 s LLL 117 x 136, -200 bits : 0.535455 s LLL 257 x 257, 393 bits : 0.68268 s LLL 257 x 258, -393 bits : 3.2413 s LLL 252 x 259, -393 bits : 1.0691 s LLL 253 x 260, -393 bits : 42.3917 s LLL 237 x 261, -393 bits : 0.670725 s LLL 238 x 262, -393 bits : 1.11835 s LLL 239 x 263, -393 bits : 144.582 s LLL 240 x 264, -393 bits : 145.512 s LLL 239 x 265, -393 bits : 2.33037 s LLL 240 x 266, -393 bits : 5.09456 s

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/wbhart/flint2/issues/932#issuecomment-819300959, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB3HJU3ZAGTXXCOGBSN3ITTIVAZJANCNFSM42VEMOCA .

flintlib / flint

Slow polynomial factoring #932