Open mzuzek opened 2 years ago
In the functor we can call:

- `TeamThreadRange`: called inside a `TeamPolicy` functor; maps to CPU threads and GPU warps.
- `ThreadVectorRange`: called inside `TeamThreadRange`.
- `TeamVectorRange`: combines/flattens `TeamThreadRange` and `ThreadVectorRange` (implemented by https://github.com/kokkos/kokkos/issues/1227 for https://github.com/kokkos/kokkos/issues/713).

Note: `ThreadVectorRange` can also be called like `TeamThreadRange` (not inside it), and it then works like `TeamVectorRange` (which is probably the better choice; TODO: learn what the difference is).
See also HierarchicalParallelism 8.4 in the Kokkos Wiki.
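To make the "combines/flattens" relationship concrete, here is a minimal sequential sketch in plain C++. It does not use the Kokkos API: the loops only model which indices a nested `TeamThreadRange` + `ThreadVectorRange` pair and a single flattened `TeamVectorRange` would cover. The helper names `nested_ranges` and `flattened_range` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Sequential model only: these loops stand in for nested Kokkos ranges.
// No Kokkos call is made, and both function names are hypothetical.
using IndexSet = std::set<std::pair<int, int>>;

// Models a range of n items at the "thread" level with a range of k items
// at the "vector" level nested inside it.
IndexSet nested_ranges(int n, int k) {
  IndexSet visited;
  for (int j = 0; j < n; ++j) {    // "thread" level
    for (int v = 0; v < k; ++v) {  // "vector" level
      visited.insert({j, v});
    }
  }
  return visited;
}

// Models one flattened range of n * k items; the flat index is split back
// into a (thread-level, vector-level) pair.
IndexSet flattened_range(int n, int k) {
  assert(k > 0);
  IndexSet visited;
  for (int idx = 0; idx < n * k; ++idx) {
    visited.insert({idx / k, idx % k});
  }
  return visited;
}
```

In this model `nested_ranges(3, 4)` and `flattened_range(3, 4)` contain the same 12 index pairs. What the model leaves out is how the work is distributed over threads and vector lanes, which is where the real ranges differ.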
The Kokkos hierarchy maps to hardware threads slightly differently on CPU and GPU:
Kokkos level | CPU | GPU |
---|---|---|
Team | thread group | thread group (block) |
Thread | CPU thread | thread group (warp) |
Vector | SIMD / intrinsics | GPU thread |
Scope
Try to give a common "feel" to both interfaces, 2a and 2b.
2a. Execution Space
Blas kernels to cover:
The objective would be to do things like:
2b. Parallelization level dispatch
Have the parallelization level (serial, team and team-vector) as a parameter, like `ArgMode` in https://github.com/kokkos/kokkos-kernels/blob/develop/src/batched/dense/KokkosBatched_Gemm_Decl.hpp#L98-L119 (inspiration):

- […] `TeamThreadRange`) and use the `TeamThreadRange` + `ThreadVectorRange` combination;
- […] `ThreadVectorRange` only and be callable from inside `TeamThreadRange`? Should NOT use `TeamVectorRange`!
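A minimal sketch of what dispatching on the parallelization level could look like, assuming tag types in the spirit of `ArgMode`. Everything here is hypothetical (the `Mode` tags, the `Axpy` kernel, the `name()` helper) and written in plain C++ without Kokkos. In the team and team-vector specializations a plain loop stands in for whichever Kokkos ranges the final design settles on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tags for the three parallelization levels named above.
struct Mode {
  struct Serial {};
  struct Team {};
  struct TeamVector {};
};

namespace detail {
// Plain sequential loop; a placeholder for the real per-mode implementation.
inline void axpy_loop(double alpha, const std::vector<double>& x,
                      std::vector<double>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}
}  // namespace detail

// Hypothetical kernel y += alpha * x. The primary template is left undefined,
// so requesting an unsupported mode fails at compile time.
template <typename ArgMode>
struct Axpy;

template <>
struct Axpy<Mode::Serial> {
  static std::string name() { return "serial"; }
  static void invoke(double alpha, const std::vector<double>& x,
                     std::vector<double>& y) {
    detail::axpy_loop(alpha, x, y);
  }
};

template <>
struct Axpy<Mode::Team> {
  static std::string name() { return "team"; }
  static void invoke(double alpha, const std::vector<double>& x,
                     std::vector<double>& y) {
    // Placeholder: a real version would take a team handle and split the loop.
    detail::axpy_loop(alpha, x, y);
  }
};

template <>
struct Axpy<Mode::TeamVector> {
  static std::string name() { return "team-vector"; }
  static void invoke(double alpha, const std::vector<double>& x,
                     std::vector<double>& y) {
    // Placeholder: a real version would also use the vector level.
    detail::axpy_loop(alpha, x, y);
  }
};
```

The caller picks the level at compile time, for example `Axpy<Mode::Team>::invoke(2.0, x, y)`, so the same kernel name can be used from serial code and from inside a team functor.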
Blas kernels to cover: