ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

Universal gemm splitk using reduce (with multi-d) #1341

Open ltqin opened 2 weeks ago
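The PR title refers to split-K. In this technique the K dimension of C = A·B is cut into k_batch slices. Each slice's partial product goes to a workspace, and a separate reduce pass sums the partials. The sketch below is a minimal pure-Python illustration of that general idea, not CK's implementation. It assumes that "multi-d" means extra D tensors fused element-wise into the reduction, shown here as a plain add. The function names and the k_batch parameter are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def partial_gemm(a: Matrix, b: Matrix, k_begin: int, k_end: int) -> Matrix:
    """Multiply only the K-slice [k_begin, k_end) of A (M x K) and B (K x N)."""
    m, n = len(a), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for k in range(k_begin, k_end):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def splitk_gemm_reduce(a: Matrix, b: Matrix, k_batch: int,
                       ds: Sequence[Matrix] = ()) -> Matrix:
    """Hypothetical split-K GEMM: per-slice partials, then one reduce pass.

    Assumption: each D tensor in `ds` is added element-wise during the
    reduction (a stand-in for a fused multi-D epilogue).
    """
    k = len(b)
    assert k >= 1 and k_batch >= 1
    # Contiguous K chunks; the last chunk may be shorter.
    chunk = (k + k_batch - 1) // k_batch
    # "Workspace": one partial M x N result per K slice.
    workspace = [partial_gemm(a, b, s, min(s + chunk, k))
                 for s in range(0, k, chunk)]
    m, n = len(a), len(b[0])
    # Reduce pass: sum the partials, then apply the element-wise D epilogue.
    return [[sum(p[i][j] for p in workspace) + sum(d[i][j] for d in ds)
             for j in range(n)] for i in range(m)]
```

For example, `splitk_gemm_reduce([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], 2)` returns the same result as the unsplit product, `[[58.0, 64.0], [139.0, 154.0]]`. In general, split-K trades extra parallelism along K for additional workspace traffic and a second reduction pass. That cost is why comparing it against the non-split kernel is informative.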

aska-0096 commented 18 hours ago

@ltqin could you run some performance benchmarks comparing this against the original gemm_universal?