JuliaORNL / JACC.jl

CPU/GPU parallel performance portable layer in Julia via functions as arguments
MIT License
21 stars, 13 forks

Parallel reduce MN optimized for CUDA, AMDGPU, and oneAPI using multi… #24

Closed: pedrovalerolara closed this 9 months ago

pedrovalerolara commented 9 months ago

…-block and multi-SM implementations. Added test cases for new implementations.
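
For readers unfamiliar with the term, a multi-block reduction over an M×N index space is usually done in two stages: each block reduces its own slice to one partial value, and a second pass combines the partials. The sketch below emulates that idea on the CPU in plain Julia. It is an illustration only, not the code in this PR: the names `block_partials` and `reduce_mn` and the `nblocks` parameter are hypothetical, and no JACC, CUDA, AMDGPU, or oneAPI calls are used.

```julia
# Two-stage ("multi-block") sum reduction over an M×N index space,
# emulated on the CPU. Hypothetical helper names; not JACC's implementation.

# Stage 1: each "block" reduces one contiguous chunk of the flattened
# (column-major) index space to a single partial sum.
function block_partials(f, M::Int, N::Int, nblocks::Int)
    total = M * N
    chunk = cld(total, nblocks)          # elements handled per block
    partials = zeros(Float64, nblocks)
    Threads.@threads for b in 1:nblocks
        lo = (b - 1) * chunk + 1
        hi = min(b * chunk, total)       # last block may be short or empty
        acc = 0.0
        for k in lo:hi
            i = (k - 1) % M + 1          # row index
            j = (k - 1) ÷ M + 1          # column index
            acc += f(i, j)
        end
        partials[b] = acc                # one slot per block, no write conflicts
    end
    return partials
end

# Stage 2: combine the per-block partial sums into the final result.
reduce_mn(f, M::Int, N::Int; nblocks::Int = 8) =
    sum(block_partials(f, M, N, nblocks))

# Usage: a 2D dot product written as a function of the indices (i, j).
A = ones(Float64, 4, 6)
B = fill(2.0, 4, 6)
result = reduce_mn((i, j) -> A[i, j] * B[i, j], 4, 6)
```

On a GPU the first stage would typically run as one kernel with many thread blocks spread across the SMs, and the second stage as a small follow-up kernel or a host-side sum. Here both stages are ordinary Julia code so the structure is easy to follow.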