Incorrect result when using CUDSS #52

Open ovanvincq opened 1 month ago

ovanvincq commented 1 month ago

I tried using CUDSS.jl with a sparse matrix coming from an electromagnetism problem discretized with the finite element method (FEM).

The sparse matrix is stored in a JLD2 file that can be downloaded here: https://github.com/ovanvincq/test_cudss/blob/main/matrix.jld2 https://github.com/ovanvincq/test_cudss/raw/main/matrix.jld2

CUDSS (v0.3.1) fails to solve a linear system with this matrix.

When using the GPU (RTX3060Ti or GV100):


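using JLD2, CUDSS, CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

# A_cpu: the sparse matrix loaded from matrix.jld2 (loading step not shown)
T = eltype(A_cpu)
n = size(A_cpu, 1)
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

# Copy the matrix and vectors to the GPU
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)

F_gpu = lu(A_gpu)  # presumably the CUDSS LU factorization
x_gpu = F_gpu \ b_gpu
r_gpu = b_gpu - A_gpu * x_gpu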

returns norm(r_gpu)=7E17

When using the CPU:

F = lu(A_cpu)  # CPU sparse LU factorization
x_cpu = F \ b_cpu
r_cpu = b_cpu - A_cpu * x_cpu

returns norm(r_cpu)=9E-8

Am I doing something wrong?

amontoison commented 1 month ago

@ovanvincq I was unable to load the matrix when I tried to reproduce the issue. Can you save and upload the matrix in the MatrixMarket format?

ovanvincq commented 1 month ago

@amontoison Sorry, the link above pointed to the file description page and not to the raw JLD2 file.

The correct link is: https://github.com/ovanvincq/test_cudss/raw/main/matrix.jld2

I also uploaded the matrix in the MatrixMarket format: https://github.com/ovanvincq/test_cudss/raw/main/matrix.mtx

amontoison commented 1 month ago

@ovanvincq I confirm that I can reproduce the issue. I will report this problem to the CUDSS developers and keep you updated.

amontoison commented 1 month ago

@ovanvincq I got some feedback from NVIDIA, and the LU factorization is not robust enough for your problem. The only way to solve it with CUDSS is to perform a scaling of A on the CPU before factorizing it on the GPU. This kind of preprocessing is done automatically in mature linear solvers on the CPU but not in cuDSS.
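
As a rough illustration of what "scaling A on the CPU" can mean, here is a minimal sketch of one pass of row and column equilibration. The helper `equilibrate` and the 3x3 test matrix are hypothetical and not part of CUDSS.jl or of this issue. The sketch only assumes the standard identity that solving A x = b is equivalent to solving (Dr A Dc) y = Dr b and then setting x = Dc y. Whether this particular scaling is enough for the matrix discussed here is not known.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper, not part of CUDSS.jl: one pass of row scaling followed
# by column scaling, so that no entry of the scaled matrix exceeds 1 in magnitude.
function equilibrate(A::SparseMatrixCSC)
    R = real(eltype(A))
    m, n = size(A)
    rows, cols, vals = findnz(A)

    # Largest magnitude in each row of A.
    rmax = zeros(R, m)
    for (i, v) in zip(rows, vals)
        rmax[i] = max(rmax[i], abs(v))
    end
    dr = [r > 0 ? inv(r) : one(R) for r in rmax]

    # Largest magnitude in each column of the row-scaled matrix.
    cmax = zeros(R, n)
    for (i, j, v) in zip(rows, cols, vals)
        cmax[j] = max(cmax[j], dr[i] * abs(v))
    end
    dc = [c > 0 ? inv(c) : one(R) for c in cmax]

    As = (Diagonal(dr) * A) * Diagonal(dc)
    return As, dr, dc
end

# Small badly scaled matrix with made-up values (not the matrix from this issue).
A = sparse([1e8 2e8 0.0; 3e-6 0.0 1e-6; 0.0 5.0 7.0])
x_true = ones(3)
b = A * x_true

As, dr, dc = equilibrate(A)

# Solve the scaled system (Dr*A*Dc) y = Dr*b, then recover x = Dc*y.
y = lu(As) \ (dr .* b)
x = dc .* y

println("relative error: ", norm(x - x_true) / norm(x_true))
```

On the GPU, the same idea would presumably amount to factorizing `CuSparseMatrixCSR(As)`, solving with `CuVector(dr .* b)` as the right-hand side, and rescaling the solution by `dc` afterwards. Note that the right-hand side and the solution have to be scaled consistently with the matrix, otherwise the residual is measured against a different system.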

ovanvincq commented 1 month ago

@amontoison Thanks for this feedback. However, I rescaled the matrix with the scaling matrix given by LinearAlgebra.lu to obtain a correctly scaled matrix but it doesn't work:

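using JLD2, CUDSS, CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

# A_cpu: the sparse matrix loaded from matrix.jld2 (loading step not shown)
F = lu(A_cpu)  # CPU LU factorization; F.Rs holds its row scaling factors

### Matrix rescaling
A_cpu = sparse(Diagonal(F.Rs)) * A_cpu
T = eltype(A_cpu)
n = size(A_cpu, 1)
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)

F_gpu = lu(A_gpu)  # presumably the CUDSS LU factorization of the rescaled matrix
x_gpu = F_gpu \ b_gpu
r_gpu = b_gpu - A_gpu * x_gpu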

gives norm(r_gpu)=4.703611480548198e8