ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

Direct sparse solver on GPU #1636

Closed TomasOberhuber closed 2 days ago

TomasOberhuber commented 2 days ago

Hi everyone,

recently I read the following paper about a direct solver for sparse linear systems on GPUs -- https://arxiv.org/pdf/2306.14337 . With the help of Marcel Koch I created the following code:

   auto exec = gko::CudaExecutor::create( 0, gko::OmpExecutor::create() );
   auto gko_A = gko::share( gko::matrix::Csr< double, int >::create(
      exec,
      gko::dim< 2 >{ static_cast< std::size_t >( matrix_Size ), static_cast< std::size_t >( matrix_Size ) },
      gko::make_array_view( exec, matrix_NonzeroElementsCount, matrix_Values ),
      gko::make_array_view( exec, matrix_NonzeroElementsCount, matrix_ColumnIndexes ),
      gko::make_array_view( exec, matrix_Rows + 1, matrix_RowPtrs ) ) );

   auto gko_x = gko::matrix::Dense< double >::create( exec,
                                                      gko::dim< 2 >{ static_cast< std::size_t >( matrix_Size ), 1 },
                                                      gko::make_array_view( exec, matrix_Size, x_Data ),
                                                      1 );

   auto gko_b = gko::matrix::Dense< double >::create( exec,
                                                      gko::dim< 2 >{ static_cast< std::size_t >( matrix_Size ), 1 },
                                                      gko::make_array_view( exec, matrix_Size, b_Data ),
                                                      1 );

   auto gk_solver = gko::experimental::solver::Direct< double, int >::build()
                       .with_factorization( gko::experimental::factorization::Lu< double, int >::build() )
                       .on( exec )
                       ->generate( gko_A );
   gk_solver->apply( gko_b, gko_x );
   std::cout << "b = " << b << std::endl;
   std::cout << "x = " << x << std::endl;

This seems to work well on CUDA GPUs. I have two questions now:

  1. The paper says that the solver runs in three phases: symbolic factorization (on the CPU), numerical factorization, and triangular solves. If the matrix pattern does not change, I would like to run the symbolic factorization only once and then explicitly call the numerical factorization and the triangular solve. Is that possible? I could not find such methods in the source code (a guess at how this might look is sketched after this list).
  2. Could I run the second and third phases concurrently on the GPU using different CUDA streams?
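
From skimming the headers, my guess is that the Lu factory's symbolic_factorization parameter is the intended hook for question 1: it takes a precomputed L + U sparsity pattern, so generating a new solver after the values change should only have to redo the numeric phase, and every apply() already runs just the triangular solves. Below is a rough sketch of what I have in mind, reusing the variables from the snippet above; whether get_combined() and SparsityCsr fit together exactly like this is an assumption on my part:

   // Run the full factorization (symbolic + numeric) once.
   auto lu_factory = gko::experimental::factorization::Lu< double, int >::build().on( exec );
   auto first_factors = lu_factory->generate( gko_A );

   // Extract the combined L + U sparsity pattern of the first factorization
   // (assuming the Csr returned by get_combined() converts to SparsityCsr).
   auto pattern = gko::share( gko::matrix::SparsityCsr< double, int >::create( exec ) );
   first_factors->get_combined()->convert_to( pattern.get() );

   // Build a second Lu factory that carries the precomputed pattern, so that
   // generating a new solver after the matrix values change would skip the
   // symbolic phase.
   auto numeric_lu = gko::share( gko::experimental::factorization::Lu< double, int >::build()
                                    .with_symbolic_factorization( pattern )
                                    .on( exec ) );
   auto solver = gko::experimental::solver::Direct< double, int >::build()
                    .with_factorization( numeric_lu )
                    .on( exec )
                    ->generate( gko_A );
   solver->apply( gko_b, gko_x );   // only the triangular solves run here

If there is a more direct way to trigger only the numeric factorization and the triangular solve, I would be happy about a pointer.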

Thanks a lot, Tomas.