Incorrect result when using CUDSS

ovanvincq commented 1 month ago

I tried using CUDSS.jl with a sparse matrix coming from an electromagnetism problem (using the FEM method)

The sparse matrix is stored in a JLD2 file that can be download here: ~~https://github.com/ovanvincq/test_cudss/blob/main/matrix.jld2~~ https://github.com/ovanvincq/test_cudss/raw/main/matrix.jld2

CUDSS (v0.3.1) fails to solve a linear system with this matrix.

When using the GPU (RTX3060Ti or GV100):

using JLD2, CUDSS, CUDA, CUDA.CUSPARSE

A_cpu=load("matrix.jld2","A")
T=eltype(A_cpu)
n=size(A_cpu,1)
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)

F_gpu=lu(A_gpu)
x_gpu = F_gpu \ b_gpu
r_gpu = b_gpu - A_gpu * x_gpu
norm(r_gpu)

returns norm(r_gpu)=7E17

When using the CPU:

F=lu(A_cpu)
x_cpu = F \ b_cpu
r_cpu = b_cpu - A_cpu * x_cpu
norm(r_cpu)

returns norm(r_cpu)=9E-8

Am I doing something wrong?

amontoison commented 1 month ago

@ovanvincq I am unable to load the matrix when I tried to reproduce the issue. Can you save and upload the matrix in the MartrixMarket format?

julia> A_cpu = load("./matrix.jld2","A")
Error encountered while load FileIO.File{FileIO.DataFormat{:JLD2}, String}("./matrix.jld2").

Fatal error:
ERROR: JLD2.InvalidDataException("Did not find a Superblock.")
Stacktrace:
  [1] find_superblock(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/superblock.jl:102
  [2] load_file_metadata!(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:403
  [3] jldopen(fname::String, wr::Bool, create::Bool, truncate::Bool, iotype::Type{…}; fallback::Type{…}, compress::Bool, mmaparrays::Bool, typemap::Dict{…}, parallel_read::Bool)
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:392
  [4] jldopen
    @ ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:312 [inlined]
  [5] #jldopen#14
    @ ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:455 [inlined]
  [6] jldopen
    @ ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:449 [inlined]
  [7] fileio_load(f::FileIO.File{FileIO.DataFormat{:JLD2}, String}, varname::String; kwargs::@Kwargs{})
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/fileio.jl:54
  [8] fileio_load(f::FileIO.File{FileIO.DataFormat{:JLD2}, String}, varname::String)
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/fileio.jl:53
  [9] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [10] invokelatest
    @ ./essentials.jl:889 [inlined]
 [11] action(call::Symbol, libraries::Vector{Union{…}}, file::FileIO.Formatted, args::String; options::@Kwargs{})
    @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:219
 [12] action
    @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:196 [inlined]
 [13] action
    @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:185 [inlined]
 [14] load(file::String, args::String; options::@Kwargs{})
    @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:113
 [15] load(file::String, args::String)
    @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:109
 [16] top-level scope
    @ REPL[23]:1
Stacktrace:
 [1] handle_error(e::JLD2.InvalidDataException, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/error_handling.jl:61
 [2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector}}, action::String)
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/error_handling.jl:56
 [3] action(call::Symbol, libraries::Vector{Union{…}}, file::FileIO.Formatted, args::String; options::@Kwargs{})
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:228
 [4] action
   @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:196 [inlined]
 [5] action
   @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:185 [inlined]
 [6] load(file::String, args::String; options::@Kwargs{})
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:113
 [7] load(file::String, args::String)
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:109
 [8] top-level scope
   @ REPL[23]:1
Some type information was truncated. Use `show(err)` to see complete types.

ovanvincq commented 1 month ago

@amontoison Sorry, the link above was the link to file desciption and not to the raw jld2 file.

The correct link is : https://github.com/ovanvincq/test_cudss/raw/main/matrix.jld2

I also uploaded the matrix in the MatrixMarket format: https://github.com/ovanvincq/test_cudss/raw/main/matrix.mtx

amontoison commented 1 month ago

@ovanvincq I confirm that I can reproduce the issue. I will report this problem to the CUDSS developers and keep you updated.

amontoison commented 1 month ago

@ovanvincq I got some feedback from NVIDIA, and the LU factorization is not robust enough for your problem. The only way to solve it with CUDSS is to perform a scaling of A on the CPU before factorizing it on the GPU. This kind of preprocessing is done automatically in mature linear solvers on the CPU but not in cuDSS.

ovanvincq commented 1 month ago

@amontoison Thanks for this feedback. However, I rescaled the matrix with the scaling matrix given by LinearAlgebra.lu to obtain a correctly scaled matrix but it doesn't work:

using JLD2, CUDSS, CUDA, CUDA.CUSPARSE,LinearAlgebra,SparseArrays

A_cpu=load("matrix.jld2","A") 
### Matrix rescaling
F=lu(A_cpu)
A_cpu=(sparse(Diagonal(F.Rs))* A_cpu)
###
T=eltype(A_cpu)
n=size(A_cpu,1)
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)

F_gpu=lu(A_gpu)
x_gpu = F_gpu \ b_gpu
r_gpu = b_gpu - A_gpu * x_gpu
norm(r_gpu)

gives norm(r_gpu)=4.703611480548198e8

exanauts / CUDSS.jl

Incorrect result when using CUDSS #52