ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/
[CK_TILE][FA] using pk f16_f32
#1343
Closed (carlushuang closed this 2 weeks ago)
carlushuang commented 2 weeks ago:
Change block_sync_lds() to use an intrinsic.
Add packed (pk) conversion support for fp32->fp16 (currently enabled only for the P->S conversion).
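
For readers unfamiliar with the term, a "pk" (packed) conversion turns two fp32 values into two fp16 values stored in a single 32-bit word, so that one instruction handles a pair of elements. The host-side sketch below only emulates that idea in portable C++. It assumes round-toward-zero behavior in the style of AMD's `v_cvt_pkrtz_f16_f32` instruction; the PR does not say which instruction or rounding mode is used. The names `f32_to_f16_rtz`, `cvt_pk_f16_f32_emulated`, and `example_usage` are hypothetical and are not part of CK_TILE.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CK_TILE function): convert one fp32 value to
// fp16 bits by truncating the mantissa (round toward zero).
inline std::uint16_t f32_to_f16_rtz(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);  // reinterpret the float's bits safely

    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t exp  = (x >> 23) & 0xFFu;
    const std::uint32_t man  = x & 0x7FFFFFu;

    if (exp == 0xFFu) {
        // Inf stays Inf; NaN becomes a quiet NaN.
        return static_cast<std::uint16_t>(sign | 0x7C00u | (man ? 0x0200u : 0u));
    }

    const int e = static_cast<int>(exp) - 127 + 15;  // rebias exponent for fp16

    if (e >= 31) {
        // Assumption: under round-toward-zero, finite overflow clamps to the
        // largest finite fp16 value instead of becoming Inf.
        return static_cast<std::uint16_t>(sign | 0x7BFFu);
    }
    if (e <= 0) {
        if (e < -10) {
            return static_cast<std::uint16_t>(sign);  // underflows to signed zero
        }
        // fp16 subnormal: restore the implicit leading 1, then shift right.
        const std::uint32_t m = man | 0x800000u;
        return static_cast<std::uint16_t>(sign | (m >> (14 - e)));
    }
    // Normal case: keep the top 10 mantissa bits, drop the rest.
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(e) << 10) | (man >> 13));
}

// Hypothetical helper: pack two converted values into one 32-bit word,
// with `lo` in bits [15:0] and `hi` in bits [31:16].
inline std::uint32_t cvt_pk_f16_f32_emulated(float lo, float hi) {
    return static_cast<std::uint32_t>(f32_to_f16_rtz(lo)) |
           (static_cast<std::uint32_t>(f32_to_f16_rtz(hi)) << 16);
}

// Small usage check: 1.0f is 0x3C00 and -2.0f is 0xC000 in fp16.
inline void example_usage() {
    assert(cvt_pk_f16_f32_emulated(1.0f, -2.0f) == 0xC0003C00u);
}
```

On a GPU the same pairing would be done by a single hardware instruction rather than bit manipulation; the emulation is only meant to show what the packed result looks like.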