JuliaSmoothOptimizers / Krylov.jl

A Julia Basket of Hand-Picked Krylov Methods

Question for GPU computation: lots of time on vector products #822

Closed yuwenchen95 closed 11 months ago

yuwenchen95 commented 11 months ago

When I try to solve a positive definite linear system using different methods,

using SparseArrays, LinearAlgebra
using Krylov
using CUDA
using StatProfilerHTML

n = 10000
density = 0.005
L = sprand(n, n, density)
A = L' * L + spdiagm(0 => rand(n))  # symmetric positive definite
b = rand(n)
Ag = CUSPARSE.CuSparseMatrixCSR(A)  # copy the matrix to the GPU
bg = CuVector(b)                    # copy the right-hand side to the GPU

msolver = MinresSolver(Ag, bg)
mqsolver = MinresQlpSolver(Ag, bg)
csolver = CgSolver(Ag, bg)
@profilehtml begin
    minres!(msolver, Ag, bg)
    minres_qlp!(mqsolver, Ag, bg)
    cg!(csolver, Ag, bg)
end

I found that a lot of the time is spent in dot operations (specifically in the computation of α in each solver), which is counterintuitive to me: the CPU version spends most of its time in mul!, and the dot-product time is negligible. Is this a genuine difference between running iterative methods on CPUs and GPUs, or is it potentially a bug?

amontoison commented 11 months ago

Dot products are slow on GPUs because they contain a reduction: even if we split the computation of the dot product across threads, at the end we must synchronize all threads / cores to sum the partial results.
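To illustrate the structure (this is just a CPU sketch, not Krylov.jl's or CUDA's actual implementation): the dot product is split into independent chunks, but the final sum cannot start until every partial result is available — that final step is the synchronization point that makes reductions expensive on GPUs.

```julia
using LinearAlgebra

# Sketch of a "split then reduce" dot product. Each chunk could run on
# its own thread or GPU block; the final `sum` is the reduction that
# forces all of them to synchronize.
function chunked_dot(x, y; nchunks = 4)
    n = length(x)
    partials = zeros(eltype(x), nchunks)
    for c in 1:nchunks                      # independent partial sums
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        s = zero(eltype(x))
        for i in lo:hi
            s += x[i] * y[i]
        end
        partials[c] = s
    end
    return sum(partials)                    # reduction: needs all partials
end

x = rand(1000); y = rand(1000)
chunked_dot(x, y) ≈ dot(x, y)
```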

mul! is the most expensive operation on the GPU only if the problem is very large (roughly n = 10000 for a dense matrix or n = 100000 for a sparse one).
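On the CPU side, the contrast is easy to reproduce: for a sparse matrix of this size, one mul! does far more floating-point work than one dot, so it dominates the profile. A quick sketch with Base's @elapsed (exact timings are machine-dependent):

```julia
using SparseArrays, LinearAlgebra

n = 10_000
A = sprand(n, n, 0.005) + spdiagm(0 => rand(n))  # ~5e5 nonzeros
x = rand(n)
y = similar(x)

mul!(y, A, x); dot(x, y)         # warm up (JIT compilation)
t_mul = @elapsed mul!(y, A, x)   # sparse matrix-vector product
t_dot = @elapsed dot(x, y)       # a single reduction over n entries
println("mul!: $t_mul s, dot: $t_dot s")
```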

Note that you can also test car and minares. They are new methods dedicated to symmetric (positive definite) systems.
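For reference, both can be tried on the CPU analogue of the system above (a sketch; it assumes a recent Krylov.jl release that exports car and minares, and it checks convergence through the returned stats):

```julia
using SparseArrays, LinearAlgebra, Krylov

n = 500
L = sprand(n, n, 0.005)
A = L' * L + spdiagm(0 => rand(n))   # symmetric positive definite
b = rand(n)

x_car, stats_car = car(A, b)         # CAR: SPD systems
x_ma,  stats_ma  = minares(A, b)     # MinAres: symmetric systems

println(stats_car.solved, " ", stats_ma.solved)
```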