elixir-nx / nx

Multi-dimensional arrays (tensors) and numerical definitions for Elixir

`Nx.LinAlg.eigh` inconsistency #1505

Closed samuelmanzanera closed 3 weeks ago

samuelmanzanera commented 1 month ago

Hello

I'm having an issue with the function Nx.LinAlg.eigh on macOS Ventura, using Nx 0.7.2 with Elixir 1.14.1 and Erlang/OTP 25. There seems to be an inconsistency between Linux and macOS hosts.

I was following the example from NumPy: https://numpy.org/doc/stable/reference/generated/numpy.linalg.eigh.html

>>> import numpy as np
>>> from numpy import linalg as LA
>>> a = np.array([[1, -2j], [2j, 5]])
>>> eigenvalues, eigenvectors = LA.eigh(a)
>>> eigenvalues
array([0.17157288, 5.82842712])
>>> eigenvectors
array([[-0.92387953+0.j        , -0.38268343+0.j        ], # may vary
       [ 0.        +0.38268343j,  0.        -0.92387953j]])

On OSX

iex> Nx.tensor([[1, -2], [2, 5]]) |> Nx.LinAlg.eigh
{#Nx.Tensor<
   f32[2]
   EXLA.Backend<host:0, 0.2708889488.3678011400.648>
   [3.190253973007202, 2.8097469806671143]
 >,
 #Nx.Tensor<
   f32[2][2]
   EXLA.Backend<host:0, 0.2708889488.3678011400.649>
   [
     [-0.672633945941925, -0.7399753332138062],
     [0.7399753332138062, -0.672633945941925]
   ]
 >}

On Linux

iex> Nx.tensor([[1, -2], [2, 5]]) |> Nx.LinAlg.eigh
{#Nx.Tensor<
   f32[2]
   EXLA.Backend<host:0, 0.2014566346.990773251.89859>
   [0.17157292366027832, 5.828427314758301]
 >,
 #Nx.Tensor<
   f32[2][2]
   EXLA.Backend<host:0, 0.2014566346.990773251.89860>
   [
     [0.9238795042037964, 0.3826833963394165],
     [-0.3826833963394165, 0.9238795042037964]
   ]
 >}

We can see that the results of the Nx implementation differ depending on the OS.

samuelmanzanera commented 1 month ago

I also noticed that, using a Rust linear algebra library (nalgebra), the results match NumPy's but are not quite the same as Nx's.

use nalgebra::DMatrix;

fn main() {
    let eigen = DMatrix::from_vec(2, 2, vec![1.0, -2.0, 2.0, 5.0]).symmetric_eigen();

    println!(
        "Eigen values {:?}",
        eigen.eigenvalues.iter().collect::<Vec<_>>()
    );
    println!(
        "Eigen vectors {:?}",
        eigen.eigenvectors.iter().collect::<Vec<_>>()
    );
}
// Output
// Eigen values [5.828427124746191, 0.17157287525380926]
// Eigen vectors [0.3826834323650899, -0.9238795325112866, -0.9238795325112866, -0.3826834323650899]

Nx seems to miss the (-) sign on some eigenvectors.

josevalim commented 1 month ago

@samuelmanzanera generally speaking, the results of eigh are not guaranteed to be identical across OSes, CPUs/GPUs, etc. They are guaranteed to obey the properties of eigenvalues and eigenvectors, though.

polvalente commented 1 month ago

Hi @samuelmanzanera! There are a few things we need to keep in mind when working with eigh and its results:

Firstly, notice that you're using different matrices between Nx and NumPy (2 and -2 instead of 2j and -2j). This means that the NumPy matrix is hermitian, while the one in Nx is not, which in turn becomes a "garbage-in, garbage-out" situation.

Secondly, keep in mind that if v is an eigenvector of A, so is -v: when passed through the matrix, both still obey the eigen equation A.v = lambda.v. This means that if you're getting the same results aside from the sign, the results are still mathematically correct.
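The sign point is easy to check numerically. A minimal sketch in plain Python (stdlib only; the eigenpair is the one reported on Linux above, written at full double precision, and `matvec` is a hypothetical helper):

```python
import math

# Symmetric matrix corresponding to the Linux eigh output above.
A = [[1.0, 2.0], [2.0, 5.0]]

# Smaller eigenvalue and its eigenvector, as reported on Linux
# (exact value of the eigenvalue: 3 - 2*sqrt(2) ~ 0.17157).
lam = 3.0 - 2.0 * math.sqrt(2.0)
v = [0.9238795325112867, -0.3826834323650898]

def matvec(m, x):
    # 2x2 matrix times 2-vector
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]

# Both v and -v satisfy the eigen equation A.v = lambda.v.
for vec in (v, [-v[0], -v[1]]):
    av = matvec(A, vec)
    assert all(abs(av[i] - lam * vec[i]) < 1e-9 for i in range(2))
```

Any eigensolver is free to return either sign, which is exactly what differs between the backends here.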

samuelmanzanera commented 1 month ago

@samuelmanzanera generally speaking, the results of eigh are not guaranteed to be identical across OSes, CPUs/GPUs, etc. They are guaranteed to obey the properties of eigenvalues and eigenvectors, though.

Does it mean that calculations on top of eigh can be wrong across devices? Does it mean we should not use eigh across OSes or CPUs?

samuelmanzanera commented 1 month ago

Hi @samuelmanzanera! There are a few things we need to keep in mind when working with eigh and its results:

Firstly, notice that you're using different matrices between Nx and NumPy (2 and -2 instead of 2j and -2j).

This means that the NumPy matrix is hermitian, while the one in Nx is not, which in turn becomes a "garbage-in, garbage-out" situation.

Secondly, keep in mind that if v is an eigenvector of A, so is -v: when passed through the matrix, both still obey the eigen equation A.v = lambda.v. This means that if you're getting the same results aside from the sign, the results are still mathematically correct.

If the results differ only in sign, will a chain of calculations on a given matrix go wrong along the way?

polvalente commented 1 month ago

If the results differ only in sign, will a chain of calculations on a given matrix go wrong along the way?

They will be different, but not wrong, because calculations on eigenvectors shouldn't depend on their sign.

polvalente commented 1 month ago

@samuelmanzanera generally speaking, the results of eigh are not guaranteed to be identical across OSes, CPUs/GPUs, etc. They are guaranteed to obey the properties of eigenvalues and eigenvectors, though.

Does it mean that calculations on top of eigh can be wrong across devices? Does it mean we should not use eigh across OSes or CPUs?

José was just referring to minor rounding-error differences, and maybe sign differences as well. The calculation is safe to use provided that it's operating on sane inputs.

samuelmanzanera commented 1 month ago

@samuelmanzanera generally speaking, the results of eigh are not guaranteed to be identical across OSes, CPUs/GPUs, etc. They are guaranteed to obey the properties of eigenvalues and eigenvectors, though.

Does it mean that calculations on top of eigh can be wrong across devices? Does it mean we should not use eigh across OSes or CPUs?

José was just referring to minor rounding-error differences, and maybe sign differences as well. The calculation is safe to use provided that it's operating on sane inputs.

Do you agree there is quite a big difference in the eigenvectors between Mac and Linux:

# Mac
[
  [-0.672633945941925, -0.7399753332138062],
  [0.7399753332138062, -0.672633945941925]
]

# Linux
[
  [0.9238795042037964, 0.3826833963394165],
  [-0.3826833963394165, 0.9238795042037964]
]

polvalente commented 1 month ago

I do agree, but those results don't really matter. The input is not valid for the function because it's not a hermitian matrix, so any results coming from it are effectively wrong anyway.

Try running with the following tensor: Nx.tensor([[1, Complex.new(0, -2)], [Complex.new(0, 2), 5]])

This is the actual input to replicate the Python call.
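For a 2x2 hermitian matrix [[a, b], [conj(b), d]] the spectrum has a closed form, (a + d ± sqrt((a - d)^2 + 4|b|^2)) / 2, so it's easy to check that NumPy's complex input and the symmetric real matrix [[1, 2], [2, 5]] share the eigenvalues 3 ± 2*sqrt(2) seen in the Linux output above. A stdlib-Python sketch (`eigvals_2x2_hermitian` is a hypothetical helper):

```python
import math

def eigvals_2x2_hermitian(a, b, d):
    # Eigenvalues of [[a, b], [conj(b), d]], ascending; a and d are real.
    disc = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    return ((a + d - disc) / 2, (a + d + disc) / 2)

# NumPy's hermitian input [[1, -2j], [2j, 5]] ...
print(eigvals_2x2_hermitian(1.0, -2j, 5.0))
# ... and the symmetric real matrix [[1, 2], [2, 5]] have the same
# spectrum: 3 - 2*sqrt(2) and 3 + 2*sqrt(2).
print(eigvals_2x2_hermitian(1.0, 2.0, 5.0))
```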

samuelmanzanera commented 3 weeks ago

OK, thanks :)