ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

Add instances of grouped convolution 3d forward with a ConvScale element-wise op for bf8@bf8->fp8 #1326

Closed: andriy-ca closed this 3 months ago

andriy-ca commented 3 months ago

We are adding more instances of grouped 3D forward convolution with a ConvScale element-wise operation. This PR covers the bf8@bf8->fp8 data type combination.
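For readers unfamiliar with the op, here is a rough sketch of what a ConvScale-style element-wise operation does to each convolution output. It assumes the op multiplies the float accumulator by three scalar factors (input, weight, and output scale). `ConvScaleSketch` and `apply_conv_scale` are hypothetical names for illustration, not the library's types, and the final conversion to fp8 is left out.

```cpp
#include <vector>

// Hypothetical stand-in for a ConvScale-style element-wise op.
// Assumption: the op scales the convolution accumulator by three scalars.
struct ConvScaleSketch {
    float scale_in;   // scale applied for the input tensor
    float scale_wei;  // scale applied for the weight tensor
    float scale_out;  // scale applied before the output is stored

    // acc is the float accumulator produced by the convolution.
    // A real kernel would convert the result to fp8; this sketch stays in float.
    float operator()(float acc) const {
        return acc * scale_in * scale_wei * scale_out;
    }
};

// Apply the op to every accumulator value, as the epilogue of a kernel would.
std::vector<float> apply_conv_scale(const std::vector<float>& acc,
                                    const ConvScaleSketch& op) {
    std::vector<float> out;
    out.reserve(acc.size());
    for (float a : acc) {
        out.push_back(op(a));
    }
    return out;
}
```

Calling `apply_conv_scale` with a `ConvScaleSketch{0.5f, 2.0f, 4.0f}` scales each accumulator value by 4. In an actual instance the same arithmetic would run per output element inside the kernel, followed by the narrowing conversion to the fp8 output type.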