Closed: oscarbenjamin closed this 1 month ago
@GiacomoPope @Jake-Moss does either of you want to review this?
The main change here is that fmpq_poly.factor() returns primitive factors rather than monic factors e.g.:
In [7]: from flint import *
In [8]: x = fmpq_poly([0, 1])
In [9]: p = (x + fmpq(1,2))*(x - fmpq(1,4))/7
In [10]: p
Out[10]: 1/7*x^2 + 1/28*x + (-1/56)
In [11]: p.factor()
Out[11]: (1/56, [(4*x + (-1), 1), (2*x + 1, 1)])
This makes fmpq_poly consistent with fmpq_mpoly. Flint itself does not seem to have an fmpq_poly_factor function, so fmpq_poly.factor() uses fmpz_poly_factor and then on master it makes the factors monic. Here I've changed that to keep the factors primitive as returned by fmpz_poly_factor.
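To make "primitive" concrete: the rational content is pulled out so that the remaining integer factors have coprime coefficients. A minimal plain-Python sketch of that split (primitive_parts is an illustrative helper, not python-flint API, and sign normalisation is ignored):

```python
from fractions import Fraction
from math import gcd, lcm

def primitive_parts(coeffs):
    """Split a rational polynomial (coefficients low-to-high) into
    content * primitive integer polynomial."""
    den = lcm(*(c.denominator for c in coeffs))          # clear denominators
    ints = [c.numerator * (den // c.denominator) for c in coeffs]
    g = gcd(*ints)                                       # pull out integer content
    return Fraction(g, den), [c // g for c in ints]

# p = 1/7*x^2 + 1/28*x - 1/56 from the session above:
content, prim = primitive_parts([Fraction(-1, 56), Fraction(1, 28), Fraction(1, 7)])
assert content == Fraction(1, 56)
assert prim == [-1, 2, 8]   # 8*x^2 + 2*x - 1 = (4*x - 1)*(2*x + 1)
```

This matches the factorisation in the session: content 1/56 with primitive factors 4*x - 1 and 2*x + 1.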
I also found some other issues:
fmpz_mod_poly.factor() crashes on the zero poly. This is documented for fmpz_mod_poly_factor: "Factorises a non-constant polynomial ..."
There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.
The mpoly.factor() methods return fmpz as the multiplicity of a factor, unlike other poly types that use int. I have fixed this for fmpz_mpoly and fmpq_mpoly but it might also need to be fixed for the new mpoly types in gh-164.
The mpoly types need to have a leading_coefficient method. I've added this but again it might be needed for the new types in gh-164. It is also missing for some univariate types.
For fq_default and fq_default_poly there is no way to get the associated context from an instance e.g.:
In [6]: p = fq_default_ctx(163)
In [7]: x = p(1)
In [8]: x.ctx
---------------------------------------------------------------------------
AttributeError
I don't see any way to get the context, characteristic, etc. from x. Same for fq_default_poly.
The modulus() method has a different meaning for fq_default than for nmod and fmpz_mod. We probably need to add a .characteristic() method that could have the same meaning in all cases (at least the fmpz_mod context object could have this method). It also makes sense for fq_default_ctx to define .is_prime() just like fmpz_mod_ctx.
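A sketch of what a uniform context API could look like (FqCtxSketch and its methods are illustrative stand-ins, not python-flint API; is_prime() is given one plausible meaning):

```python
# For GF(p^d), modulus() on fq_default_ctx refers to the defining polynomial,
# so a separate .characteristic() removes the ambiguity with nmod/fmpz_mod
# where modulus() is the characteristic itself.
class FqCtxSketch:
    """Hypothetical stand-in for an fq_default-style context over GF(p^d)."""

    def __init__(self, p, d):
        self.p = p  # characteristic (assumed prime)
        self.d = d  # extension degree

    def characteristic(self):
        return self.p

    def order(self):
        return self.p ** self.d

    def is_prime(self):
        # one plausible meaning: the field is a prime field GF(p)
        return self.d == 1

ctx = FqCtxSketch(163, 2)
assert ctx.characteristic() == 163
assert ctx.order() == 163 ** 2
assert not ctx.is_prime()
```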
Also, I have emphasised this a lot, but let me just point out again that a carefully designed test like the test_factor_poly_mpoly test here can pick up on a lot of issues, including ones that have been in the codebase for a long time. The test simultaneously tests all poly types and all mpoly types together and just covers a few simple and corner cases, but it checks all types comprehensively and checks all outputs thoroughly. It also demonstrates clearly what the expected behaviour is supposed to be for each of the different types, and where they are similar and where they are different. If we only write separate tests for different types or for univariate vs multivariate etc then we don't get the benefits of this.
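As a rough illustration of the approach (this is not the actual test_factor_poly_mpoly code; check_factor_result and the stand-in data are hypothetical): a single generic checker applied to every type's factor() output catches type inconsistencies that separate per-type tests tend to miss.

```python
def check_factor_result(result, scalar_type, factor_type):
    """Check the (constant, [(factor, multiplicity), ...]) shape that
    factor() returns, uniformly for every poly/mpoly type."""
    constant, factors = result
    assert isinstance(constant, scalar_type)
    for fac, mult in factors:
        assert isinstance(fac, factor_type)
        # multiplicities should be plain int for every type
        assert type(mult) is int

# Stand-in demonstration with plain Python values in place of flint types;
# in the real test this would loop over every poly/mpoly context.
check_factor_result((1, [(5, 2), (7, 1)]), int, int)
```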
> There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.
Is there a particular reason for this? Does something depend on this behaviour? It seems like an unnecessary conversion to me; any use with another python-flint object will result in a conversion back to an fmpz or similar almost immediately.
> The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int. I have fixed this for fmpz_mpoly and fmpq_mpoly but it might also need to be fixed for the new mpoly types in gh-164.
Yes they will, I'll add those pending the above
> > There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.
>
> Is there a particular reason for this? Does something depend on this behaviour? It seems like an unnecessary conversion to me; any use with another python-flint object will result in a conversion back to an fmpz or similar almost immediately.
It definitely matters whether we return an int or an nmod because they have different behaviour. For example, a common thing that you might want to do with a leading coefficient is to invert it:
In [1]: from flint import *
In [2]: lc_int = 3
In [3]: lc_nmod = nmod(3, 7)
In [4]: 1/lc_int
Out[4]: 0.3333333333333333
In [5]: 1/lc_nmod
Out[5]: 5
As for the unnecessary conversion, there needs to be a conversion either way. The Flint routine returns a ulong and we need to turn that into a Python object, whether that is a PyLong or an nmod. Cython handles it implicitly if you just return a ulong, but it still needs to create an actual PyLong, including allocating space on the heap for it.
One small difference actually is that CPython interns small integers (from -5 up to 256) to avoid a heap allocation. We could do something similar with nmod if we had a context object to store the interned values for a given modulus. At least for small enough moduli it would make sense to do that, since we can intern every possible value. It could be handled transparently by a context method like ctx.nmod_from_ulong().
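A minimal sketch of that interning idea in plain Python (NmodValue, NmodCtx, and the INTERN_LIMIT threshold are all hypothetical, mirroring CPython's small-int cache):

```python
class NmodValue:
    """Stand-in for an nmod scalar: a residue with its modulus."""
    __slots__ = ("val", "mod")
    def __init__(self, val, mod):
        self.val = val
        self.mod = mod

class NmodCtx:
    """Stand-in context that interns every residue for small moduli."""
    INTERN_LIMIT = 256  # only intern when the whole residue table is small

    def __init__(self, mod):
        self.mod = mod
        self._cache = None
        if mod <= self.INTERN_LIMIT:
            self._cache = tuple(NmodValue(v, mod) for v in range(mod))

    def nmod_from_ulong(self, v):
        v %= self.mod
        if self._cache is not None:
            return self._cache[v]   # no allocation: reuse the interned object
        return NmodValue(v, self.mod)

ctx = NmodCtx(7)
a = ctx.nmod_from_ulong(3)
b = ctx.nmod_from_ulong(10)  # 10 % 7 == 3
assert a is b                # same interned object, like CPython small ints
```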
> It definitely matters whether we return an int or an nmod because they have different behaviour.
I just found a similar issue with nmod_mpoly and fmpz_mod_mpoly. The coefficients should be nmod and fmpz_mod but from gh-164 they are int and fmpz:
In [7]: ctx = fmpz_mod_mpoly_ctx(2, Ordering.lex, ["x", "y"], 11)
In [8]: x = ctx.gen(0)
In [9]: x.coeffs()
Out[9]: [1]
In [10]: type(x.coeffs()[0])
Out[10]: flint.types.fmpz.fmpz
In [11]: ctx = nmod_mpoly_ctx(2, Ordering.lex, ["x", "y"], 11)
In [12]: x = ctx.gen(0)
In [13]: type(x.coeffs()[0])
Out[13]: int
Likewise leading_coefficient needs to return nmod/fmpz_mod, and so does the constant term in the factor method.
Probably there are also inconsistencies in the factor_squarefree methods as well...
Sorry @Jake-Moss, you reviewed this and then I made it 3x bigger. If you are still up for reviewing then please do...
The changes here are:

- fmpq_poly returns primitive rather than monic factors, which is consistent with fmpq_mpoly and SymPy and other things.
- Added sqrt to all scalar types so all scalar and polynomial types have sqrt.
- Added factor_squarefree to all poly types that did not have it, so that all poly/mpoly types can be assumed to have it.
- Added lead_coefficient to all poly types, and now all poly/mpoly types can be assumed to have it.

What I have not fixed here is the type of the coefficients for nmod_poly and fmpz_mod_poly, which are int and fmpz when they should be nmod and fmpz_mod. That is something that needs to be fixed before the next release.
No worries, will do tonight
> > > There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.
> >
> > Is there a particular reason for this? Does something depend on this behaviour? It seems like an unnecessary conversion to me; any use with another python-flint object will result in a conversion back to an fmpz or similar almost immediately.
>
> It definitely matters whether we return an int or an nmod because they have different behaviour. For example, a common thing that you might want to do with a leading coefficient is to invert it:
>
> In [1]: from flint import *
> In [2]: lc_int = 3
> In [3]: lc_nmod = nmod(3, 7)
> In [4]: 1/lc_int
> Out[4]: 0.3333333333333333
> In [5]: 1/lc_nmod
> Out[5]: 5
>
> As for the unnecessary conversion, there needs to be a conversion either way. The Flint routine returns a ulong and we need to turn that into a Python object, whether that is a PyLong or an nmod. Cython handles it implicitly if you just return a ulong, but it still needs to create an actual PyLong, including allocating space on the heap for it.
>
> One small difference actually is that CPython interns small integers (from -5 up to 256) to avoid a heap allocation. We could do something similar with nmod if we had a context object to store the interned values for a given modulus. At least for small enough moduli it would make sense to do that, since we can intern every possible value. It could be handled transparently by a context method like ctx.nmod_from_ulong().
I'm sorry, I quoted the wrong section of your message. The above makes complete sense, but I was looking at this change where the exponent is created as an fmpz, then immediately converted to an int.
> The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int. I have fixed this for fmpz_mpoly and fmpq_mpoly but it might also need to be fixed for the new mpoly types in https://github.com/flintlib/python-flint/pull/164.
> I was looking at this change where the exponent is created as an fmpz, then immediately converted to an int.
>
> > The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int.
It's important that the types are consistent and we don't get into a situation where we don't know what they are supposed to be any more. If it is supposed to be an int then it needs to be an int. I recently ran into a situation where having an fmpz in a place that would usually be an int caused a bug:
https://github.com/python/cpython/issues/120950
I'm a bit surprised that an fmpz is used at all for the multiplicity in the factor struct. It suggests great ambition about the sizes of polynomials that can be factorised.
The nmod_poly code just casts the exponent to an integer, which in principle would be incorrect if the multiplicity is greater than 2^(FLINT_BITS-2):
https://github.com/flintlib/python-flint/blob/464a27650b3eb6df06e9d3bb660cdc494b9efcf5/src/flint/types/nmod_poly.pyx#L640-L641
On a 64-bit system it is inconceivable that an nmod_poly could have a factor of multiplicity greater than 2^62. On a 32-bit system could it have greater than 2^30? Maybe at a push for GF(p) with very small p...
I suppose you can have something like:
In [14]: from flint import *
In [15]: ctx = fmpz_mpoly_ctx(2, Ordering.lex, ["x", "y"])
In [16]: x, y = ctx.gens()
In [17]: ((x*y)**(2**64)).factor()
Out[17]: (1, [(x, 18446744073709551616), (y, 18446744073709551616)])
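The session above also shows why a plain Python int is a safe return type for multiplicities: Python ints are arbitrary precision, so even the 2^64 multiplicity from the sparse mpoly example converts losslessly, while it would overflow a signed 64-bit slong (SLONG_MAX here assumes a 64-bit build):

```python
SLONG_MAX = 2**63 - 1  # assumption: 64-bit build of Flint

mult = 2**64                   # multiplicity from ((x*y)**(2**64)).factor()
assert isinstance(mult, int)
assert mult > SLONG_MAX        # would not fit in a C slong
assert int(str(mult)) == mult  # Python int represents it exactly
```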
It's hard to imagine that a nontrivial polynomial could be factorised and have a multiplicity that does not fit in an slong/ulong. For dense polynomials it is perhaps impossible:
In [18]: p = nmod_poly(0, 11)
In [19]: p[2**32] = 1
Flint exception (General error):
Unable to allocate memory (34359738376).
Aborted (core dumped)
(Ideally we would have a way of preventing that core dump...)
In any case though, we need the types to be consistent. The multiplicity is never going to be large enough to need to be anything other than an int, and all other types already return int. Factorisation is an extremely expensive operation so I doubt very much that the cost of converting the exponents is significant.
> In any case though, we need the types to be consistent. The multiplicity is never going to be large enough to need to be anything other than an int, and all other types already return int. Factorisation is an extremely expensive operation so I doubt very much that the cost of converting the exponents is significant.
Thanks, makes sense; more than reasonable to have that.
I see that gh-190 is waiting on this so I'll merge once tests pass.
Thanks for the reviews!
See https://github.com/flintlib/python-flint/pull/164#discussion_r1720795664