NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[FEA] Add Support for SIMT Ops in GEMM and CONV to Match Current Tensor Op Support #725

Closed aadulla closed 1 year ago

aadulla commented 1 year ago

Is your feature request related to a problem? Please describe. Many GEMM and convolution algorithms in CUTLASS are designed specifically for TensorOps. In particular, the algorithms that involve "fusion" only provide templates for TensorOps and use threadblock iterators built to interface specifically with TensorOps. For example, default_conv2d_wgrad_fusion.h only defines TensorOp templates and uses PredicatedScaleBiasVectorIterator, whose memory access pattern appears to match how data is laid out across threads in a TensorOp. Is there any plan to also add support for SIMT ops in these algorithms?
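For context on the distinction the request draws: a plain (non-fused) CUTLASS device-level GEMM can already be instantiated with either operator class via the `OperatorClass` template parameter, while the fused kernels discussed here only provide the TensorOp path. A minimal sketch of the two operator classes on the standard `cutlass::gemm::device::Gemm` template (the element types and architecture tags below are illustrative choices, not taken from this thread):

```cpp
#include "cutlass/gemm/device/gemm.h"

// SIMT (CUDA-core) GEMM: OperatorClass = OpClassSimt.
// Runs on any architecture, including pre-Tensor-Core GPUs.
using GemmSimt = cutlass::gemm::device::Gemm<
    float, cutlass::layout::ColumnMajor,   // A
    float, cutlass::layout::ColumnMajor,   // B
    float, cutlass::layout::ColumnMajor,   // C
    float,                                 // accumulator
    cutlass::arch::OpClassSimt,            // operator class
    cutlass::arch::Sm50>;                  // minimum target arch

// Tensor Core GEMM: OperatorClass = OpClassTensorOp.
// The fusion kernels in the issue only define this path.
using GemmTensorOp = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::ColumnMajor,  // A
    cutlass::half_t, cutlass::layout::ColumnMajor,  // B
    cutlass::half_t, cutlass::layout::ColumnMajor,  // C
    float,                                          // accumulator
    cutlass::arch::OpClassTensorOp,                 // operator class
    cutlass::arch::Sm80>;                           // minimum target arch
```

The operator class tag selects an entire family of default threadblock/warp/instruction-level components, which is why the fused defaults (e.g. in default_conv2d_wgrad_fusion.h) would each need a separate SIMT specialization.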

Describe the solution you'd like Additional support for SIMT ops in the kernels and iterators that are currently designed specifically for TensorOps.

hwu36 commented 1 year ago

Sorry, we have no plans to support SIMT for running these on older architectures. These fusion operations are unlikely to get much speedup with SIMT anyway.

aadulla commented 1 year ago

Understood, thanks for the insight!