ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. #1354

Open. andriy-ca opened this pull request 1 week ago.


Added the grouped_conv_fwd_outelementop operation to ckProfiler.

This operation enables performance profiling of forward (FWD) 3D convolutions on tensors with non-standard floating-point data types, followed by a scaling operation applied element-wise to the convolution output.
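To make the profiled computation concrete, here is a minimal reference sketch of what a forward 3D convolution followed by an output scaling step computes. It is not CK's implementation or API: the names Conv3dShape and conv3d_fwd_scale are hypothetical. The sketch assumes a single group, stride 1, no padding or dilation, row-major NCDHW input, KCZYX weights, and NKDHW output, and it uses float in place of the non-standard floating-point types the real kernels operate on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor (not a CK type): single group, stride 1,
// no padding, no dilation.
struct Conv3dShape {
    std::size_t N, C, K;     // batch, input channels, output channels
    std::size_t Di, Hi, Wi;  // input spatial dims
    std::size_t Z, Y, X;     // filter spatial dims
    std::size_t Do() const { return Di - Z + 1; }
    std::size_t Ho() const { return Hi - Y + 1; }
    std::size_t Wo() const { return Wi - X + 1; }
};

// Hypothetical reference: naive 3D forward convolution, then an element-wise
// scale applied to every output value. float stands in for low-precision types.
std::vector<float> conv3d_fwd_scale(const Conv3dShape& s,
                                    const std::vector<float>& in,
                                    const std::vector<float>& wei,
                                    float out_scale) {
    assert(s.Di >= s.Z && s.Hi >= s.Y && s.Wi >= s.X);
    assert(in.size() == s.N * s.C * s.Di * s.Hi * s.Wi);
    assert(wei.size() == s.K * s.C * s.Z * s.Y * s.X);

    const std::size_t Do = s.Do(), Ho = s.Ho(), Wo = s.Wo();
    std::vector<float> out(s.N * s.K * Do * Ho * Wo, 0.0f);

    for (std::size_t n = 0; n < s.N; ++n)
      for (std::size_t k = 0; k < s.K; ++k)
        for (std::size_t d = 0; d < Do; ++d)
          for (std::size_t h = 0; h < Ho; ++h)
            for (std::size_t w = 0; w < Wo; ++w) {
                float acc = 0.0f;  // convolution accumulator
                for (std::size_t c = 0; c < s.C; ++c)
                  for (std::size_t z = 0; z < s.Z; ++z)
                    for (std::size_t y = 0; y < s.Y; ++y)
                      for (std::size_t x = 0; x < s.X; ++x) {
                          // Row-major offsets into input (NCDHW) and weights (KCZYX).
                          std::size_t ii = (((n * s.C + c) * s.Di + d + z) * s.Hi + h + y) * s.Wi + w + x;
                          std::size_t wi = (((k * s.C + c) * s.Z + z) * s.Y + y) * s.X + x;
                          acc += in[ii] * wei[wi];
                      }
                // Output element-wise operation: scale the convolution result.
                std::size_t oi = (((n * s.K + k) * Do + d) * Ho + h) * Wo + w;
                out[oi] = acc * out_scale;
            }
    return out;
}
```

A naive loop nest like this is only useful as a correctness reference for the arithmetic; it says nothing about the performance that ckProfiler measures for the optimized kernels.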

At this time, the following combinations of data types and operations can be profiled:

Support for profiling the following combinations is implemented, but CK does not currently instantiate the corresponding instances:

Refer to grouped_convolution_forward_convscale.hpp and grouped_convolution_forward_convinvscale.hpp for all instantiated implementations.

Note: This PR includes the changes proposed in #1326.