Closed: fnrizzi closed this issue 3 years ago.
@fnrizzi @MikolajZuzek
spmv computes y <- beta*y + alpha*A^{M}*x, where the mode M is N, C, T, or H (similar to *gemv).
Focusing on the N and C cases, the computational routine is spmv_beta_no_transpose in the file KokkosSparse_spmv_impl.hpp. A template-based switch selects between the different Kokkos backends (Serial, OpenMP, GPU). The same applies to the transpose / Hermitian case.
If cuSPARSE is enabled, there is also a switch in KokkosSparse_spmv.hpp.
In the branch for PR #1, I have been looking at the no-transpose case and have added the template-based switch.
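For reference, here is a minimal sketch of how spmv is invoked for the different modes; the wrapper function, scalar values, and generic matrix/vector types below are illustrative and not taken from this thread:

```cpp
#include <Kokkos_Core.hpp>
#include <KokkosSparse_spmv.hpp>

// Assumes a KokkosSparse::CrsMatrix A and rank-1 Views x, y already exist;
// the function name and template parameters are placeholders for illustration.
template <class Matrix, class XView, class YView>
void apply_spmv(const Matrix& A, const XView& x, const YView& y) {
  const double alpha = 1.0;
  const double beta  = 0.0;
  // "N": y = beta*y + alpha*A*x (no transpose)
  KokkosSparse::spmv("N", alpha, A, x, beta, y);
  // "T": transpose, "C": conjugate, "H": conjugate transpose
  KokkosSparse::spmv("T", alpha, A, x, beta, y);
}
```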
@uhetmaniuk ok, yes thanks! That is exactly what I was referring to.
One question about the internal views created for x, y, for example:
// XVector_Internal: a const-value, unmanaged, random-access view type wrapping the user's x
typedef Kokkos::View<
typename XVector::const_value_type*,
typename KokkosKernels::Impl::GetUnifiedLayout<XVector>::array_layout,
typename XVector::device_type,
Kokkos::MemoryTraits<Kokkos::Unmanaged|Kokkos::RandomAccess> > XVector_Internal;
XVector_Internal x_i = x;  // shallow view assignment; no data is copied
Do you know why this has to be done? @MikolajZuzek offered to look into some of the details to understand the reasons behind some of these choices.
Just to clarify, I know this is work in progress, so I am not pressuring anything. Just asking questions :)
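As context for the question, here is a minimal standalone sketch (my own example, not code from the library) of the view-assignment pattern in question: a managed View can be assigned to a const, Unmanaged, RandomAccess view of compatible type, and only the value type and access traits change, not the data:

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<double*> x("x", 100);

    // Same data as x, but const-value, unmanaged (no reference counting),
    // and flagged RandomAccess (may map to texture/ldg loads on CUDA).
    Kokkos::View<const double*, Kokkos::DefaultExecutionSpace::device_type,
                 Kokkos::MemoryTraits<Kokkos::Unmanaged | Kokkos::RandomAccess> >
        x_i = x;

    (void)x_i;  // x_i aliases x; no allocation or deep copy happened
  }
  Kokkos::finalize();
  return 0;
}
```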
(The internal views are set up in KokkosSparse_spmv.hpp; I do not know why it is necessary.)
My starting point was examples/wiki/sparse/KokkosSparse_wiki_spmv.cpp. I have gone through several iterations. Right now I am trying to follow the steps in spmv_beta_no_transpose (line 316 of KokkosSparse_spmv_impl.hpp). I have been comparing timings for the serial case against spmv (line 69 of the file KokkosSparse_spmv.hpp).
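A minimal sketch of the kind of serial timing comparison described here; the repetition count, warm-up, and generic types are assumptions for illustration, not details from the thread:

```cpp
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <Kokkos_Timer.hpp>
#include <KokkosSparse_spmv.hpp>

// Times repeated spmv calls on an already-built sparse matrix A and Views x, y.
// Matrix/vector types are left generic; only the timing pattern is shown.
template <class Matrix, class XView, class YView>
double time_spmv(const Matrix& A, const XView& x, const YView& y, const int nrepeat) {
  // Warm-up call so first-touch and setup costs are not included in the timing.
  KokkosSparse::spmv("N", 1.0, A, x, 0.0, y);
  Kokkos::fence();

  Kokkos::Timer timer;
  for (int r = 0; r < nrepeat; ++r) {
    KokkosSparse::spmv("N", 1.0, A, x, 0.0, y);
  }
  Kokkos::fence();

  const double avg = timer.seconds() / nrepeat;
  std::printf("average spmv time: %e s\n", avg);
  return avg;
}
```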
@fnrizzi @uhetmaniuk
I looked over the KokkosSparse::spmv() implementation structure and specializations; here is an overview of the calls going from the user interface (orange) down to the raw implementations (cyan):
(editable version: kk-spmv.plantuml.txt)
Summary:

- The mode can be NoTranspose, Conjugate, Transpose, or ConjugateTranspose.
- There are special-cased paths for alpha / beta = 0 / 1 / -1 / other.
- There is a raw OpenMP path, spmv_raw_openmp_no_transpose().
- The TPL path can be turned off (control param); it also gets omitted if the selected mode (e.g. Conjugate) is not supported by the current (old) library version (see src/sparse/KokkosSparse_spmv.hpp, lines 155-177).

@MikolajZuzek this looks great!
@uhetmaniuk I was speaking with @MikolajZuzek and was wondering when the various specializations are triggered. Did you already figure this out for spmv? I think that, concurrently with writing an implementation for the block spmv, we also need a tentative plan/design for when to select the various impls. For example, Luc said that even if we might be able to use some already available CUDA library, he wants a basic implementation using CUDA that does not rely on external libraries. So one question is: when are the various specializations activated, and under what conditions?
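As a rough illustration of the kind of conditions typically involved, here is a sketch of the general pattern; the helper name and the specific checks are illustrative assumptions, not the actual dispatch code (which lives in KokkosSparse_spmv.hpp and the TPL specialization files):

```cpp
#include <type_traits>
#include <Kokkos_Core.hpp>

// Sketch only: illustrates when a cuSPARSE TPL specialization could be eligible.
template <class AMatrix>
bool would_consider_cusparse_path(const char mode[]) {
#if defined(KOKKOSKERNELS_ENABLE_TPL_CUSPARSE) && defined(KOKKOS_ENABLE_CUDA)
  // A TPL path is normally only considered when:
  //  * the TPL was enabled at configure time (macro above),
  //  * the matrix/vector data live in a memory space the TPL can access,
  //  * scalar/ordinal/offset types match what the TPL supports,
  //  * the requested mode is supported by the installed library version.
  const bool cuda_data =
      std::is_same<typename AMatrix::memory_space, Kokkos::CudaSpace>::value;
  const bool mode_ok = (mode[0] == 'N') || (mode[0] == 'T');  // illustrative check
  return cuda_data && mode_ok;
#else
  (void)mode;
  return false;  // fall back to the native Kokkos Kernels implementation
#endif
}
```

If I read the summary above correctly, the control param can additionally force the native implementation at runtime even when a TPL path would otherwise qualify.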