Is your feature request related to a problem? Please describe.
There are many algorithms for GEMMs and convolutions that are designed specifically for TensorOps. For example, the algorithms that include "fusion" only provide TensorOp templates and use threadblock iterators that are also designed to interface specifically with TensorOps: default_conv2d_wgrad_fusion.h only defines TensorOp specializations and uses PredicatedScaleBiasVectorIterator, whose memory access pattern appears to match how data is laid out across threads in a TensorOp. Is there any plan to add support for using SIMT ops with these algorithms as well?
Describe the solution you'd like
Additional support for SIMT ops in the kernels and iterators that are currently designed specifically for TensorOps.
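
For illustration, here is a minimal, hypothetical sketch of the operator-class partial-specialization pattern the request refers to. The tag and template names mirror CUTLASS conventions (cutlass::arch::OpClassTensorOp, cutlass::arch::OpClassSimt, DefaultConv2dWgradFusion), but the parameter lists below are simplified placeholders, not the library's actual signatures:

```cpp
// Hypothetical, simplified sketch of the OpClass-based partial-specialization
// pattern used by CUTLASS "default" kernel headers. Tag names mirror
// cutlass::arch::OpClassTensorOp / cutlass::arch::OpClassSimt; the template
// parameters are placeholders, not the real DefaultConv2dWgradFusion signature.

#include <iostream>

// Operator-class tags (stand-ins for the cutlass::arch tag types).
struct OpClassTensorOp {};
struct OpClassSimt {};

// Primary template: declared but not defined, so only operator classes with
// an explicit specialization will compile.
template <typename ElementA, typename ElementB, typename OperatorClass>
struct DefaultConv2dWgradFusion;

// Current situation: only the TensorOp specialization exists, and it wires in
// an iterator whose access pattern matches the Tensor Core thread data layout
// (analogous to PredicatedScaleBiasVectorIterator).
template <typename ElementA, typename ElementB>
struct DefaultConv2dWgradFusion<ElementA, ElementB, OpClassTensorOp> {
  static void run() { std::cout << "fused wgrad kernel using Tensor Cores\n"; }
};

// Requested addition: a SIMT specialization, which would also need a companion
// scale/bias iterator matching the SIMT (CUDA-core) thread data layout.
template <typename ElementA, typename ElementB>
struct DefaultConv2dWgradFusion<ElementA, ElementB, OpClassSimt> {
  static void run() { std::cout << "fused wgrad kernel using SIMT cores\n"; }
};

int main() {
  DefaultConv2dWgradFusion<float, float, OpClassTensorOp>::run();
  // Compiles only once a SIMT specialization (and matching iterators) exists.
  DefaultConv2dWgradFusion<float, float, OpClassSimt>::run();
  return 0;
}
```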