Closed zhink closed 2 weeks ago
@zhink Thanks for your interest in rewriting the kernel. Could you provide more details on why you want to rewrite it and what function you want to implement?
Because it is not easy for my project to depend on cutlass 3. The input is fp8 (e4m3 or e5m2) and the output is bf16 or float16. If cutlass is not necessary, please provide guidance on how to rewrite the kernel.
@zhink I think this kernel does not use cutlass.
But it uses cutlass::NumericArrayConverter and Converter::convert.
You can implement NumericArrayConverter yourself, using the cutlass implementation as a reference; it is not complicated.
The kernel in question is cpp/tensorrt_llm/kernels/weightOnlyBatchedGemv/cudaCoreGemm.cu.
How can this kernel be rewritten without depending on the cutlass implementation?