G-071 closed this pull request 2 years ago.
This PR improves the performance of the Kokkos reconstruct kernel. The tiling that Kokkos chose automatically performed poorly for this kernel, so we now set the tiles manually, with 64 work items per tile. This brings the kernel up to the performance of its CUDA counterpart.
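To illustrate what manual tiling means here, below is a minimal standalone sketch. It assumes a 2D index space and a hypothetical 16 × 4 tile shape. The PR states only that each tile holds 64 work items; the actual tile shape and the reconstruct kernel's index ranges are not shown. In Kokkos itself, tile extents are passed as the third argument of an `MDRangePolicy`, e.g. `Kokkos::MDRangePolicy<Kokkos::Rank<2>> policy({0, 0}, {n0, n1}, {16, 4});`. If that argument is omitted, Kokkos picks the tiles. The plain C++ below only reproduces the tile-by-tile iteration order on the host, so it compiles without Kokkos. The function name `tiled_for` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile shape: 16 x 4 = 64 work items per tile
// (the PR fixes only the product, not the shape).
constexpr std::size_t kTile0 = 16;
constexpr std::size_t kTile1 = 4;
static_assert(kTile0 * kTile1 == 64, "each tile should hold 64 work items");

// Visit every (i, j) in [0, n0) x [0, n1) tile by tile, mirroring how a
// manually tiled 2D range policy partitions the index space.
// Returns the number of tiles visited, including partial edge tiles.
template <typename Functor>
std::size_t tiled_for(std::size_t n0, std::size_t n1, Functor f) {
  std::size_t tiles = 0;
  for (std::size_t t0 = 0; t0 < n0; t0 += kTile0) {
    for (std::size_t t1 = 0; t1 < n1; t1 += kTile1) {
      ++tiles;
      // Clip the tile at the domain boundary so edge tiles stay in range.
      const std::size_t e0 = std::min(t0 + kTile0, n0);
      const std::size_t e1 = std::min(t1 + kTile1, n1);
      for (std::size_t i = t0; i < e0; ++i) {
        for (std::size_t j = t1; j < e1; ++j) {
          f(i, j);  // one "work item" of the kernel body
        }
      }
    }
  }
  return tiles;
}
```

For example, a 37 × 10 domain is covered by 3 × 3 = 9 tiles, and each index is visited exactly once. Which extent gets the larger tile dimension affects the memory access pattern, so alternatives with the same 64-item product (4 × 16, 64 × 1, 8 × 8) would be worth benchmarking against each other.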