NexGenAnalytics / kokkos-kernels

Kokkos C++ Performance Portability Programming EcoSystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

task2: interface #3

Open mzuzek opened 2 years ago

mzuzek commented 2 years ago

Scope

Give interfaces 2a and 2b a common look and feel.

2a. Execution Space

BLAS kernels to cover:

The objective is to allow calls like:

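auto exec = ExecSpace(); // execution space instance

// A, B, C: matrix views (declarations omitted)

// template<class ExecSpace, ... > overload taking the instance as the first argument
KokkosBlas::gemm( exec, A, B, C );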
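To make the calling convention concrete, here is a minimal self-contained sketch of an "execution space instance first" overload. It does not use Kokkos: `MockExecSpace`, `Matrix`, and the `sketch` namespace are hypothetical stand-ins invented for illustration, and the serial triple loop only marks where a real implementation would launch work on `exec`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an execution space type (not a Kokkos class).
struct MockExecSpace {};

// Hypothetical dense row-major matrix standing in for a Kokkos::View.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c, double v = 0.0)
      : rows(r), cols(c), data(r * c, v) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

namespace sketch {

// Overload taking the execution space instance first; computes C = A * B.
// A real implementation would launch its kernels on `exec`; here it is unused.
template <class ExecSpace>
void gemm(const ExecSpace& /*exec*/, const Matrix& A, const Matrix& B, Matrix& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  for (std::size_t i = 0; i < C.rows; ++i)
    for (std::size_t j = 0; j < C.cols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
      C(i, j) = sum;
    }
}

// Overload without an instance: forwards a default-constructed one,
// so existing call sites keep working.
inline void gemm(const Matrix& A, const Matrix& B, Matrix& C) {
  gemm(MockExecSpace(), A, B, C);
}

}  // namespace sketch
```

With these definitions, `sketch::gemm(MockExecSpace(), A, B, C)` and `sketch::gemm(A, B, C)` compute the same product; the second form simply supplies the instance itself.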

2b. Parallelization level dispatch

Make the parallelization level (serial, team, or team-vector) a parameter, like ArgMode in https://github.com/kokkos/kokkos-kernels/blob/develop/src/batched/dense/KokkosBatched_Gemm_Decl.hpp#L98-L119 (used as inspiration).
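A minimal sketch of what dispatching on a mode tag could look like, written without Kokkos so it stands alone. All names here (`sketch::Mode`, `MockMember`, `Dot`, `strided_dot`) are hypothetical and chosen for illustration only; the "parallel" variants are emulated with strided serial loops, so the block demonstrates the compile-time dispatch and not real concurrency.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Tags naming the parallelization level, in the spirit of ArgMode.
struct Mode {
  struct Serial {};
  struct Team {};
  struct TeamVector {};
};

// Hypothetical stand-in for a team member handle.
struct MockMember {
  std::size_t team_size;
  std::size_t vector_length;
};

typedef std::vector<double> Vec;

// Emulates splitting the index range across `lanes` workers using strided
// partial sums; a real kernel would run the lanes concurrently.
inline double strided_dot(std::size_t lanes, const Vec& x, const Vec& y) {
  assert(lanes > 0 && x.size() == y.size());
  double total = 0.0;
  for (std::size_t lane = 0; lane < lanes; ++lane) {
    double partial = 0.0;
    for (std::size_t i = lane; i < x.size(); i += lanes) partial += x[i] * y[i];
    total += partial;
  }
  return total;
}

// Primary template is declared only; each level gets its own specialization.
template <class ArgMode>
struct Dot;

template <>
struct Dot<Mode::Serial> {
  static double invoke(const MockMember&, const Vec& x, const Vec& y) {
    return strided_dot(1, x, y);  // one worker does everything
  }
};

template <>
struct Dot<Mode::Team> {
  static double invoke(const MockMember& m, const Vec& x, const Vec& y) {
    return strided_dot(m.team_size, x, y);  // split over team threads
  }
};

template <>
struct Dot<Mode::TeamVector> {
  static double invoke(const MockMember& m, const Vec& x, const Vec& y) {
    // split over team threads and their vector lanes
    return strided_dot(m.team_size * m.vector_length, x, y);
  }
};

}  // namespace sketch
```

A caller selects the level purely through the template argument, for example `sketch::Dot<sketch::Mode::Team>::invoke(member, x, y)`, which keeps the argument list identical across all three levels.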

BLAS kernels to cover:

mzuzek commented 2 years ago

Parallel Contexts

Note: ThreadVectorRange can also be called at the same level as TeamThreadRange (not nested inside it), and it then works like TeamVectorRange (which is probably the better choice; TODO: learn what the difference is).

levels

See also section 8.4 of HierarchicalParallelism in the Kokkos Wiki.

The Kokkos hierarchy maps to hardware slightly differently on CPU and GPU:

| Kokkos level | CPU | GPU |
| --- | --- | --- |
| Team | thread group | thread group (block) |
| Thread | CPU thread | thread group (warp) |
| Vector | SIMD / intrinsics | GPU thread |
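The nesting of the three levels can be pictured with ordinary loops. The sketch below is a plain C++ emulation, not Kokkos code: `Slot` and `enumerate_slots` are hypothetical names, and `league_size` denotes the number of teams. Real backends execute these levels concurrently and decide the hardware mapping themselves; the loops only show which (team, thread, lane) triple each work item belongs to, with the vector lane as the innermost level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position of one work item in the three-level hierarchy.
struct Slot {
  std::size_t team, thread, lane;
};

// Emulates the team / thread / vector nesting with serial loops and records
// the triple for every work item, in iteration order.
inline std::vector<Slot> enumerate_slots(std::size_t league_size,
                                         std::size_t team_size,
                                         std::size_t vector_length) {
  assert(league_size > 0 && team_size > 0 && vector_length > 0);
  std::vector<Slot> slots;
  slots.reserve(league_size * team_size * vector_length);
  for (std::size_t team = 0; team < league_size; ++team)          // team level
    for (std::size_t thread = 0; thread < team_size; ++thread)    // thread level
      for (std::size_t lane = 0; lane < vector_length; ++lane) {  // vector level
        Slot s = {team, thread, lane};
        slots.push_back(s);
      }
  return slots;
}
```

In this emulation the flat position of a work item is `(team * team_size + thread) * vector_length + lane`, so consecutive items differ only in their lane until a thread's lanes are exhausted.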