The problem with the implementation is that it uses A[i][i] without checking if it's zero or not. If it happens to be zero, there will be division by zero. Trivially invertible matrices like [[0,1],[1,0]] thus result in NaN-values. I would suggest doing something like finding the row with the highest absolute value in that column,A[j], and adding it to A[i] so that A[i][i] == 1 before proceeding. I am not yet very familiar with all of the syntax for polymorphism in in Futhark, so here is an implementation for f32.
let gauss_jordan [m] [n] (A:[m][n]f32) =
loop A for i < i32.min m n do
let icol = map (\row -> row[i]) A
let (j,_) = map f32.abs icol
|> zip (iota m)
|> drop i
|> reduce_comm (\(i,a)(j,b) -> if a < b then (j,b) else (i,a)) (0,0)
let f = (1-A[i,i]) / A[j,i]
let irow = map2 (f32.fma f) A[j] A[i]
in map (\j -> if j == i
then irow
else let f = f32.negate A[j,i]
in map2 (f32.fma f) irow A[j]) (iota m)
Another problem, from a performance standpoint, is the use of the inverse to solve systems of equations. Pad the matrix with the vector instead. There should be a solver for square systems, that can then also be used for the least square solver. If you have a square system it makes no sense to do all that matrix multiplication.
The problem with the implementation is that it uses
A[i][i]
without checking if it's zero or not. If it happens to be zero, there will be division by zero. Trivially invertible matrices like[[0,1],[1,0]]
thus result in NaN-values. I would suggest doing something like finding the row with the highest absolute value in that column,A[j]
, and adding it toA[i]
so thatA[i][i] == 1
before proceeding. I am not yet very familiar with all of the syntax for polymorphism in in Futhark, so here is an implementation forf32
.Another problem, from a performance standpoint, is the use of the inverse to solve systems of equations. Pad the matrix with the vector instead. There should be a solver for square systems, that can then also be used for the least square solver. If you have a square system it makes no sense to do all that matrix multiplication.