Nemocas / AbstractAlgebra.jl

Generic abstract algebra functionality in pure Julia (no C dependencies)
https://nemocas.github.io/AbstractAlgebra.jl/dev/index.html
Other
158 stars 60 forks source link

integer nullspace AbstractAlgebra.jl vs Nemo.jl : different scalings #855

Open maajdl opened 3 years ago

maajdl commented 3 years ago

Hello

This is maybe not an issue, but merely an issue for my usage, I don't know. It tested both packages on this matrix (a stoechiometry matrix):

m = 
[0 0 2 0 0 2 0 2; 
 0 0 2 0 0 0 2 0; 
 2 0 0 0 2 0 0 4; 
 0 0 0 2 0 0 0 0; 
 0 2 3 0 1 2 1 0]

The types of m for AbstractAlgebra.jl and Nemo.jl were respectively

fmpz_mat
AbstractAlgebra.Generic.MatSpaceElem{BigInt}

Here is the nullspace returned by Nemo.jl (transposed):

[ 2  1  0  0 -2  0   0   0;
  4  0  1  0 -2  0  -1  -1; 
  0  0 -1  0  0  1   1   0]

Here is the nullspace returned by AbstractAlgebra.jl (transposed):

[-32  -16    0   0  32   0   0    0; 
   0    0  -32   0   0  32  32    0; 
 -64   32    0   0   0 -32   0   32]

The two nullspaces are identical, but the basis differ. I see two differences.

The first difference I see, is in the scaling of the vectors. It looks like Nemo factors out a GCD and AbstractAlgebra doesn't. I could not understand in the source code why this is so.

The seconds difference amounts to a linear combinations. It happens that AbstractAlgebra provides the basis that I "prefer". This is because there are more zeros in the basis provided by AbstractAlgebra . From my point of view, this represents simpler chemical reactions.

Even if this might well not be an issue, could you give me some directions?

Thanks,

Michel

fieker commented 3 years ago

The two functions are doing completely different things

So you get a kernel over Q - scaled to be integral

in Nemo you get a kernel over Z...

so: AbstractAlgebra: integral basis (via scaling) for the nullspace as a matrix in Q (the field)

Nemo: a Z-basis for the kernel as a map from Z^? to Z^?

The Nemo basis is also a valid AbstractAlgebra answer, but not the other way round: the 1st row from AbstractAlgebra is -16 x the 1st row from Nemo, hence the Z-span is different.....

Do you want computations in Z or Q (mathematically)?

Greetings Claus On Sun, May 02, 2021 at 12:30:09PM -0700, maajdl wrote:

Hello

This is maybe not an issue, but merely an issue for my usage, I don't know. It tested both packages on this matrix (a stoechiometry matrix):

m = 
[0 0 2 0 0 2 0 2; 
 0 0 2 0 0 0 2 0; 
 2 0 0 0 2 0 0 4; 
 0 0 0 2 0 0 0 0; 
 0 2 3 0 1 2 1 0]

The types of m for AbstractAlgebra.jl and Nemo.jl were respectively

fmpz_mat
AbstractAlgebra.Generic.MatSpaceElem{BigInt}

Here is the nullspace returned by Nemo.jl (transposed):

[ 2  1  0  0 -2  0   0   0;
  4  0  1  0 -2  0  -1  -1; 
  0  0 -1  0  0  1   1   0]

Here is the nullspace returned by AbstractAlgebra.jl (transposed):

[-32  -16    0   0  32   0   0    0; 
   0    0  -32   0   0  32  32    0; 
 -64   32    0   0   0 -32   0   32]

The two nullspaces are identical, but the basis differ. I see two differences.

The first difference I see, is in the scaling of the vectors. It looks like Nemo factors out a GCD and AbstractAlgebra doesn't. I could not understand in the source code why this is so.

The seconds difference amounts to a linear combinations. It happens that AbstractAlgebra provides the basis that I "prefer". This is because there are more zeros in the basis provided by AbstractAlgebra . From my point of view, this represents simpler chemical reactions.

Even is this might well not be an issue, could you give me some directions?

Thanks,

Michel

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/Nemocas/AbstractAlgebra.jl/issues/855

fieker commented 3 years ago

Addon: what do you actually want to compute? Mathematically? All solutions in Z or in Q?

On Sun, May 02, 2021 at 12:30:09PM -0700, maajdl wrote:

Hello

This is maybe not an issue, but merely an issue for my usage, I don't know. It tested both packages on this matrix (a stoechiometry matrix):

m = 
[0 0 2 0 0 2 0 2; 
 0 0 2 0 0 0 2 0; 
 2 0 0 0 2 0 0 4; 
 0 0 0 2 0 0 0 0; 
 0 2 3 0 1 2 1 0]

The types of m for AbstractAlgebra.jl and Nemo.jl were respectively

fmpz_mat
AbstractAlgebra.Generic.MatSpaceElem{BigInt}

Here is the nullspace returned by Nemo.jl (transposed):

[ 2  1  0  0 -2  0   0   0;
  4  0  1  0 -2  0  -1  -1; 
  0  0 -1  0  0  1   1   0]

Here is the nullspace returned by AbstractAlgebra.jl (transposed):

[-32  -16    0   0  32   0   0    0; 
   0    0  -32   0   0  32  32    0; 
 -64   32    0   0   0 -32   0   32]

The two nullspaces are identical, but the basis differ. I see two differences.

The first difference I see, is in the scaling of the vectors. It looks like Nemo factors out a GCD and AbstractAlgebra doesn't. I could not understand in the source code why this is so.

The seconds difference amounts to a linear combinations. It happens that AbstractAlgebra provides the basis that I "prefer". This is because there are more zeros in the basis provided by AbstractAlgebra . From my point of view, this represents simpler chemical reactions.

Even is this might well not be an issue, could you give me some directions?

Thanks,

Michel

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/Nemocas/AbstractAlgebra.jl/issues/855

wbhart commented 3 years ago

What you've hit on is actually an issue, in that we have plans to fix it.

@fieker is right in that one function is computing the kernel over the ring you started with (Z in your case), the other is computing the nullspace over the fraction field (Q in your case). It is my experience that even many textbooks don't carefully distinguish the two or even note that they are different.

What we have done so far is to systematically introduce solve and solve_rational to distinguish between the two related concepts when solving matrix equations Ax = b (or xA = b, depending on whether you want left or right solutions). But we have not systematically introduced this naming convention in the case of nullspace. Indeed, even though we have functions can_solve_with_solution which return one solution, we currently have no functionality for returning all solutions, which is what nullspace and nullspace_rational must do, i.e. return the space of solutions to Mx = 0.

It's definitely going to be some time before we add this functionality, which we are especially sorry about given that you have a real world application. We'll try to help you out with a workaround in the meantime if you can let us know whether you want solutions over Z or Q (bearing in mind that there are more of them over one ring than the other).

wbhart commented 3 years ago

Based on my knowledge of chemistry (one year at undergrad level) and a few YouTube videos, I know nothing whatsoever about stoichiometry equations. But I think the question is going to boil down to whether the stoichiometry nullspace is supposed to yield numbers of atoms or moles of reagent. If it is numbers of atoms in the stoichiometry equation, then you obviously want the solutions over Z. But if it is just moles of reagent then I guess it is going to be over Q, since you can measure out fractional moles.

fredrik-johansson commented 3 years ago

I think the Q / Z discussion is orthogonal to the original problem. The question here is what form the output for the kernel over Q should take. Should we return an arbitrary basis, or put it in some kind of canonical form? A possible canonical form would be to compute the rref (of the vectors of rows, so actually the rcef if they are columns), and then remove the content from each vector.

In this case, we would get the following rows:

[4, 0, 0, 0, -2,  1,  0, -1]
[0, 2, 0, 0, -2, -1,  0,  1]
[0, 0, 1, 0,  0, -1, -1,  0]
wbhart commented 3 years ago

@fredrik-johansson That won't increase the number of zeroes in the matrix, so I am not sure it is completely orthogonal to the original question, though I guess it is part of the original question.

fredrik-johansson commented 3 years ago

Yes, in this case the rref canonical form is more dense than the AbstractAlgebra form, though the sparse output in this case seems to be by accident rather than by design? Generically, the rref will eliminate pivot elements so it should be (slightly) more sparse. We could add some separate function that computes a maximally sparse form of a vector space basis.

maajdl commented 3 years ago

Thanks to all of you for your interresting comments. I cannot answer all posts, and I also do not understand all of them in full detail.

Maybe it is useful that I explain a little bit more my aim. My initial aim was the elementary chemical equation balancing as a gadget in a thermochemistry package I am writing. But soon I realized that in general this amounts to finding a "space or reactions" not only one-dimensional. Thats was quite funny. Here is what I mean by a stoeachiometry matrix. If we consider this chemical reaction:

O2 +  C2H6 +  CaCO3 +  CH4  ==> H2O + CO2 +  CaO

(burning C2H6 and CH4 with O2 to decarbonate CaCO3 into CaO and CO2)

You notice that it is "unbalanced" because the number of each elements on both sides are different. For example: 5 Oxygen atoms on the left and 4 on the right. Balancing this equation means mutilplying each term by an integer to balance the number of each atoms on both sides. This leads to the follwoing stoechiometry matrix:

         O2     C2H6      H2O      CO2       N2      CaO    CaCO3       N2
C         0        4        0        2        0        0        2        2
Ca        0        0        0        0        0        2        2        0
H2        0        6        2        0        0        0        0        4
N2        0        0        0        0        2        0        0        0
O2        2        0        1        2        0        1        3        0

You may observe a factor 2 introduced to avoid half integers caused by the conventional "dimer forms" (like O2, N2, H2).

The space of balanced chemical reactions based on these 8 substances is indeed the nullspace of the stoechiometry matrix. It is desirable to find the "integer nullspace" because it is always possible to balance the equation with integer coefficients. It would also be nice-to-have the simplest possible basis reactions. This means as many zeros as possible in the matrix of the basis vectors of the nullspace.

Here is the (transposed) matrix of the nullspace basis returned by AbstractAlgebra:

                   O2     C2H6      H2O      CO2       N2      CaO    CaCO3       N2
reaction 1       -112      -32       96       64        0        0        0        0
reaction 2       -112      -32       96        0        0      -64       64        0
reaction 3         16      -32      -32        0        0        0        0       64

A simpler basis can be found by removing GCD for each vector:

                   O2     C2H6      H2O      CO2       N2      CaO    CaCO3       N2
reaction 1         -7       -2        6        4        0        0        0        0
reaction 2         -7       -2        6        0        0       -4        4        0
reaction 3          1       -2       -2        0        0        0        0        4

(this is closer to what Nemo would return) This corresponds to these 3 "basis chemical reactions":

 + 7 O2 + 2 C2H6  ==> + 6 H2O + 4 CO2 
 + 7 O2 + 2 C2H6 + 4 CaO  ==> + 6 H2O + 4 CaCO3 
 + 2 C2H6 + 2 H2O  ==> +  O2 + 4 CH4 

You may finally observe that a much simpler basis would be the following:

                   O2     C2H6      H2O      CO2       N2      CaO    CaCO3       N2
reaction 1         -7       -2        6        4        0        0        0        0
reaction 2          0        0        0       -1        0       -1        1        0
reaction 3          1       -2       -2        0        0        0        0        4

or in chemical language:

 7 O2 + 2 C2H6  ==> 6 H2O + 4 CO2 
 CO2 + CaO  ==>  CaCO3 
 2 C2H6 + 2 H2O  ==>  O2 + 4 CH4 

My aim is rather naïve. A 1 dimensional balancing is already a useful gadget for my package. A full basis is an interresting improvement, that I had not though of before I started coding. But finally note also that this question has probably real-world applications. Indeed, each possible reactions can be seen as a (so called) "pathway" in a complex process. And these pathways are interresting for themselves. Specially if they are not too exotic, I mean "complicated".

Best regards,

Michel

Postscriptum: If I multiply the stoechiometry matrix by a factor x, then the basis vectors returned by AbstractAlgebra are multiplied by . I guess this is because 5 is the number of rows in the stoechiometry matrix.

thofma commented 3 years ago

If you use the Nemo version (which computes the nullspace over Z), you will get a basis of the reactions, any other possible reaction is a linear combination of those.

You can call hnf or lll on your basis matrix of the nullspace (Nemo one). This should give you something with small coefficients (lll) or something canonical with smallish cofficients (hnf).

In your example

julia> hnf(S)
[2   1   0   0   -2    0    0   0]
[0   2   0   0   -2   -1    0   1]
[0   0   1   0    0   -1   -1   0]

julia> lll(S)
[0    0   -1   0    0   1   1    0]
[2   -1    0   0    0   1   0   -1]
[2    1    0   0   -2   0   0    0]

(where S is the basis matrix of the (transposed kernel you posted). There is not much difference between LLL and HNF because the matrix is rather small.

wbhart commented 3 years ago

Thanks for the explanation. I had wondered how the space of all equations would be useful given that surely each pathway occurs (in the presence of the required reagents) with some probability.

Anyhow, from my undergraduate chemistry I think I recognise some of those chemicals as the constituents of farts. This is a much more serious scientific problem now!

Anyhow, jokes aside, yes, you want the integer nullspace. I cannot guarantee this will be in AbstractAlgebra soon, but I will open a ticket for implementation of the "minimal" basis (not sure of the right terminology). I can't promise it will get implemented soon, but I'll link to this ticket from the issue so that if someone gets around to implementing it, they can ping you.

Our linear algebra still needs quite a lot of work!

wbhart commented 3 years ago

I will leave this ticket open as corresponding to the scaling issue only (which seems to affect the rational nullspace as implemented by AbstractAlgebra, if I got it the right way around), which should be fairly easy to fix.