NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] #1605

Open ganeshcolfax opened 4 days ago

ganeshcolfax commented 4 days ago

We see that for FP8 GEMM, only the TNN layout is supported among the kernels generated for cutlass_profiler, and the same holds in the CUTLASS examples directory. Are there any FP8 kernels with other layouts, such as TTT or TTN, shipped as a reference? We need such layouts for FP8 GEMM.
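For readers new to the notation: assuming the usual CUTLASS profiler naming convention, the three-letter layout suffix gives the layouts of A, B, and C in that order, with `t` meaning row-major and `n` meaning column-major. The minimal Python sketch below decodes such a suffix and computes element offsets. The `Layout` enum, `decode_layouts`, and `offset` are hypothetical helpers for illustration, not part of cutlass_library. Under this convention, an M×K row-major A and a K×N column-major B are both contiguous along K.

```python
from enum import Enum


class Layout(Enum):
    # Hypothetical stand-in for a layout enum; only the two dense
    # layouts relevant to GEMM operands are modeled here.
    ROW_MAJOR = "t"     # "T": transposed relative to column-major BLAS convention
    COLUMN_MAJOR = "n"  # "N": non-transposed, i.e. column-major


def decode_layouts(code: str) -> tuple:
    """Split a three-letter code such as "tnn" into layouts for A, B, and C."""
    code = code.lower()
    if len(code) != 3 or any(c not in "tn" for c in code):
        raise ValueError(f"expected three letters from {{t, n}}, got {code!r}")
    return tuple(Layout(c) for c in code)


def offset(layout: Layout, row: int, col: int, ld: int) -> int:
    """Linear offset of element (row, col) given leading dimension ld."""
    if layout is Layout.ROW_MAJOR:
        return row * ld + col  # consecutive columns are adjacent in memory
    return row + col * ld      # consecutive rows are adjacent in memory


if __name__ == "__main__":
    # Decode the layout requested in this issue alongside the default one.
    for code in ("tnn", "ttt", "ttn"):
        a, b, c = decode_layouts(code)
        print(code, a.name, b.name, c.name)
```

Read this way, going from TNN to TTN changes only B (from column-major to row-major), while TTT additionally makes C row-major.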

thakkarV commented 4 days ago

They are already supported, but they may not be stamped out by default. You can add them in the CUTLASS library generator and they should show up in the profiler.
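As far as we can tell, generator functions in cutlass_library describe operand layouts as a list of entries, each holding a `[layout, alignment]` pair for A, B, and C; the exact form should be checked against the generator source before editing it. The self-contained sketch below only models that shape so the extra TTT and TTN entries are easy to picture. `Layout`, `layout_entry`, and `all_codes` are hypothetical, the 128-bit access width is an assumption used to derive alignments in elements, and the 16-bit output type is chosen purely for illustration. The real generator may pick different alignments (for example, for C).

```python
from enum import Enum
from itertools import product


class Layout(Enum):
    # Hypothetical stand-in for the layout enum used by the generator.
    ROW_MAJOR = "t"
    COLUMN_MAJOR = "n"


ACCESS_BITS = 128  # assumed widest vectorized global-memory access, in bits


def max_alignment(element_bits: int) -> int:
    """Elements per 128-bit access, e.g. 16 for 8-bit FP8 and 8 for 16-bit types."""
    return ACCESS_BITS // element_bits


def layout_entry(code: str, a_bits: int, b_bits: int, c_bits: int) -> list:
    """Build [[layout, alignment], ...] for A, B, and C from a code such as "ttn"."""
    bits = (a_bits, b_bits, c_bits)
    return [[Layout(ch), max_alignment(w)] for ch, w in zip(code.lower(), bits)]


def all_codes() -> list:
    """All eight t/n combinations for (A, B, C)."""
    return ["".join(p) for p in product("tn", repeat=3)]


if __name__ == "__main__":
    # FP8 (8-bit) A and B with a 16-bit output, chosen only for illustration.
    requested = ["tnn", "ttt", "ttn"]
    layouts = [layout_entry(code, 8, 8, 16) for code in requested]
    for code, entry in zip(requested, layouts):
        print(code, [(lay.name, align) for lay, align in entry])
    print(len(all_codes()), "possible layout combinations")
```

The idea is that stamping out more layouts amounts to appending entries to such a list; whether a given combination then compiles and passes verification still depends on what the FP8 kernels support.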

ganeshcolfax commented 4 days ago

Thank you. It will be a good learning exercise for us to understand how to add a kernel to the CUTLASS library generator, so pointers would be useful. Thank you once again.

thakkarV commented 3 days ago

https://github.com/NVIDIA/cutlass/blob/main/python/cutlass_library/generator.py#L5634