ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

[CK_TILE][FA] using pk f16_f32 #1343

Closed: carlushuang closed this 2 weeks ago

carlushuang commented 2 weeks ago
  1. Change block_sync_lds() to an intrinsic-based implementation.
  2. Add packed (pk) cvt support for fp32->fp16 conversion (currently enabled only in the P->S conversion).