fnrizzi / kokkos-kernels

Kokkos C++ Performance Portability Programming EcoSystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Draft API and specialization graph of the block sparse mv kernel #6

Open fnrizzi opened 3 years ago

fnrizzi commented 3 years ago

- How do we want to name the kernel: bspmv or blockspmv?
- Need to figure out whether a public API for the bspmv already exists.

uhetmaniuk commented 3 years ago

The block size is part of the class BlockCrsMatrix. It is an internal variable.

fnrizzi commented 3 years ago

@uhetmaniuk yes, but the rank of the vectors is also part of the view class. For instance, in the spmv kernel, if the view passed in is a 2d view but contains a single column, the impl redirects that call to the impl for 1d views.

So in my mind, maybe it would make sense to also consider the block size and specialize on that. If the block is really small, maybe one should use a different impl than when the block is large? Just thinking out loud. The block size would be something we check at run time to specialize. For example, I believe we should check whether the block size == 1, which is a special case, and if so call the regular spmv impl. All the other kernels do similar things, so I assume we should think about these details as well.
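To make the block size == 1 idea concrete, here is a minimal sketch in plain C++ without Kokkos. Everything in it is an assumption for illustration: `BlockCrs`, `Path`, `bspmv`, and the `Impl` functions are hypothetical stand-ins, not the Kokkos Kernels classes or API, and the kernels only compute y = A * x on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block CRS matrix; NOT the KokkosSparse class.
struct BlockCrs {
  std::size_t numBlockRows;
  std::size_t blockDim;              // block size, known only at run time
  std::vector<std::size_t> rowMap;   // numBlockRows + 1 offsets
  std::vector<std::size_t> entries;  // block-column index of each block
  std::vector<double> values;        // blockDim * blockDim scalars per block, row-major
};

// Reports which internal path was taken (illustration only).
enum class Path { ScalarSpmv, BlockSpmv };

namespace Impl {
// Regular (point) CRS kernel: y = A * x.
inline void spmv_scalar(std::size_t numRows,
                        const std::vector<std::size_t>& rowMap,
                        const std::vector<std::size_t>& entries,
                        const std::vector<double>& values,
                        const std::vector<double>& x, std::vector<double>& y) {
  for (std::size_t i = 0; i < numRows; ++i) {
    double sum = 0.0;
    for (std::size_t k = rowMap[i]; k < rowMap[i + 1]; ++k)
      sum += values[k] * x[entries[k]];
    y[i] = sum;
  }
}

// Generic block kernel for any block size: y = A * x.
inline void spmv_block(const BlockCrs& A, const std::vector<double>& x,
                       std::vector<double>& y) {
  const std::size_t b = A.blockDim;
  for (std::size_t ib = 0; ib < A.numBlockRows; ++ib) {
    for (std::size_t r = 0; r < b; ++r) {
      double sum = 0.0;
      for (std::size_t k = A.rowMap[ib]; k < A.rowMap[ib + 1]; ++k)
        for (std::size_t c = 0; c < b; ++c)
          sum += A.values[k * b * b + r * b + c] * x[A.entries[k] * b + c];
      y[ib * b + r] = sum;
    }
  }
}
}  // namespace Impl

// Run-time branch: a block size of 1 is an ordinary CRS matrix,
// so it is redirected to the regular kernel.
inline Path bspmv(const BlockCrs& A, const std::vector<double>& x,
                  std::vector<double>& y) {
  assert(A.blockDim >= 1);
  assert(y.size() == A.numBlockRows * A.blockDim);
  if (A.blockDim == 1) {
    Impl::spmv_scalar(A.numBlockRows, A.rowMap, A.entries, A.values, x, y);
    return Path::ScalarSpmv;
  }
  Impl::spmv_block(A, x, y);
  return Path::BlockSpmv;
}
```

The redirect works here because, with a block size of 1, the block arrays have the same layout as point CRS arrays. Further branches (small vs. large blocks) could hang off the same `if` chain.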

@MikolajZuzek takeaway from the KK meeting: Ulrich asked, and we should keep a single API for spmv, detect at compile time whether the matrix passed is block crs or not, and then specialize.
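A rough sketch of what "single API, detect at compile time" could look like, using a type trait and tag dispatch. This is only an illustration under assumptions: `CrsMatrix`, `BlockCrsMatrix`, `is_block_crs`, and the `Impl::spmv` overloads below are hypothetical stand-ins written in plain C++, not the actual Kokkos Kernels types or signatures, and alpha/beta scaling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the KokkosSparse classes.
struct CrsMatrix {
  std::size_t numRows;
  std::vector<std::size_t> rowMap;   // numRows + 1 offsets
  std::vector<std::size_t> entries;  // column index of each nonzero
  std::vector<double> values;        // one scalar per nonzero
};

struct BlockCrsMatrix {
  std::size_t numBlockRows;
  std::size_t blockDim;              // internal to the matrix, as in the thread
  std::vector<std::size_t> rowMap;   // numBlockRows + 1 offsets
  std::vector<std::size_t> entries;  // block-column index of each block
  std::vector<double> values;        // blockDim * blockDim scalars per block, row-major
};

// Compile-time detection of the block matrix type.
template <class Matrix>
struct is_block_crs : std::false_type {};
template <>
struct is_block_crs<BlockCrsMatrix> : std::true_type {};

namespace Impl {
// Selected when is_block_crs<Matrix> is false: point CRS kernel.
template <class Matrix>
void spmv(const Matrix& A, const std::vector<double>& x,
          std::vector<double>& y, std::false_type) {
  assert(y.size() == A.numRows);
  for (std::size_t i = 0; i < A.numRows; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.rowMap[i]; k < A.rowMap[i + 1]; ++k)
      sum += A.values[k] * x[A.entries[k]];
    y[i] = sum;
  }
}

// Selected when is_block_crs<Matrix> is true: block CRS kernel.
template <class Matrix>
void spmv(const Matrix& A, const std::vector<double>& x,
          std::vector<double>& y, std::true_type) {
  const std::size_t b = A.blockDim;
  assert(y.size() == A.numBlockRows * b);
  for (std::size_t ib = 0; ib < A.numBlockRows; ++ib) {
    for (std::size_t r = 0; r < b; ++r) {
      double sum = 0.0;
      for (std::size_t k = A.rowMap[ib]; k < A.rowMap[ib + 1]; ++k)
        for (std::size_t c = 0; c < b; ++c)
          sum += A.values[k * b * b + r * b + c] * x[A.entries[k] * b + c];
      y[ib * b + r] = sum;
    }
  }
}
}  // namespace Impl

// Single user-facing entry point: y = A * x for either matrix type.
template <class Matrix>
void spmv(const Matrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
  Impl::spmv(A, x, y, typename is_block_crs<Matrix>::type{});
}
```

The caller writes `spmv(A, x, y)` in both cases; only the overload matching the tag is instantiated, so the point kernel never touches `blockDim`.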

uhetmaniuk commented 3 years ago

@fnrizzi The latest implementation in PR #1 uses the block size of the BlockCrsMatrix for the single-vector case, but the front interface is "almost" identical to the existing spmv.

For the moment, I have removed the matrix-type templating (for simplicity). Once we have detected the BlockCrsMatrix class, my approach is that everything inside (and not visible to the user) is "fair game". So in particular, we can template on the block size for the routines called inside spmv. (see for instance )
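As an illustration of templating on the block size inside the implementation, here is a small sketch in plain C++. It is not the code from the PR: `BlockCrs`, `bspmv`, `Impl::bspmv_fixed`, and `Impl::bspmv_generic` are hypothetical names, and the set of specialized sizes (2, 3, 4) is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block CRS matrix; NOT the KokkosSparse class.
struct BlockCrs {
  std::size_t numBlockRows;
  std::size_t blockDim;              // run-time block size stored in the matrix
  std::vector<std::size_t> rowMap;   // numBlockRows + 1 offsets
  std::vector<std::size_t> entries;  // block-column index of each block
  std::vector<double> values;        // blockDim * blockDim scalars per block, row-major
};

namespace Impl {
// Block size B is a compile-time constant, so the inner loops have
// fixed trip counts that the compiler can unroll.
template <int B>
void bspmv_fixed(const BlockCrs& A, const std::vector<double>& x,
                 std::vector<double>& y) {
  for (std::size_t ib = 0; ib < A.numBlockRows; ++ib) {
    double sum[B] = {};  // zero-initialized accumulator for one block row
    for (std::size_t k = A.rowMap[ib]; k < A.rowMap[ib + 1]; ++k) {
      const double* blk = &A.values[k * B * B];
      const double* xk = &x[A.entries[k] * B];
      for (int r = 0; r < B; ++r)
        for (int c = 0; c < B; ++c) sum[r] += blk[r * B + c] * xk[c];
    }
    for (int r = 0; r < B; ++r) y[ib * B + r] = sum[r];
  }
}

// Fallback for block sizes without a compile-time specialization.
inline void bspmv_generic(const BlockCrs& A, const std::vector<double>& x,
                          std::vector<double>& y) {
  const std::size_t b = A.blockDim;
  for (std::size_t ib = 0; ib < A.numBlockRows; ++ib) {
    for (std::size_t r = 0; r < b; ++r) {
      double sum = 0.0;
      for (std::size_t k = A.rowMap[ib]; k < A.rowMap[ib + 1]; ++k)
        for (std::size_t c = 0; c < b; ++c)
          sum += A.values[k * b * b + r * b + c] * x[A.entries[k] * b + c];
      y[ib * b + r] = sum;
    }
  }
}
}  // namespace Impl

// Front interface stays untemplated on the block size; the run-time value is
// mapped to a compile-time one internally. Returns true if a fixed-size
// kernel was used (returned only to make the chosen path observable).
inline bool bspmv(const BlockCrs& A, const std::vector<double>& x,
                  std::vector<double>& y) {
  assert(y.size() == A.numBlockRows * A.blockDim);
  switch (A.blockDim) {
    case 2: Impl::bspmv_fixed<2>(A, x, y); return true;
    case 3: Impl::bspmv_fixed<3>(A, x, y); return true;
    case 4: Impl::bspmv_fixed<4>(A, x, y); return true;
    default: Impl::bspmv_generic(A, x, y); return false;
  }
}
```

The user only ever sees `bspmv(A, x, y)`; which block sizes get a dedicated instantiation is an internal choice that can change without touching the interface.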

If you look at the current spmv routine, it has a series of calls inside the implementation before reaching the actual computing routine.

fnrizzi commented 3 years ago

Yeah, I think we are talking about the same thing. All I meant is that I believe the specialization graph of the block spmv impl can potentially be more complex, because we might also want to branch inside depending on the block size. But yes, of course, all of that happens inside the impl namespace, so it is totally hidden from the user.