NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] How to implement different types for D0 (D1) and D2 based on the 45_dual_gemm example #1555

Open Sunny-bot1 opened 1 month ago

Sunny-bot1 commented 1 month ago

Hi, the 45_dual_gemm example requires the intermediate outputs and the final output to have the same type (D0, D1, and D2 must all be the same type). To prevent loss of precision, I want to keep the intermediate results (D0, D1) in high precision and write only the final output (D2) in low precision.

Could you give me some advice? Thank you very much!!!
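
To make the request concrete, here is a minimal host-side sketch of the intended numerics, not CUTLASS code. It assumes D0 and D1 are held in `float`, that the final op combines them elementwise (SiLU(D0) * D1 is used here as a stand-in for whatever the final epilogue computes), and that D2 is narrowed to a 16-bit type only at the very end. `Bf16`, `to_bf16`, `to_float`, and `fused_epilogue` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a low-precision output type.
// Layout: the upper 16 bits of an IEEE-754 binary32 value (bfloat16-style).
struct Bf16 {
  std::uint16_t bits;
};

// Narrow float -> Bf16 with round-to-nearest-even.
// NaN handling is omitted to keep the sketch short.
inline Bf16 to_bf16(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  u += 0x7FFFu + ((u >> 16) & 1u);
  return Bf16{static_cast<std::uint16_t>(u >> 16)};
}

// Widen Bf16 -> float (exact).
inline float to_float(Bf16 h) {
  std::uint32_t u = static_cast<std::uint32_t>(h.bits) << 16;
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// D0 and D1 stay in float (the "high precision" intermediates).
// The only narrowing conversion happens when D2 is written.
inline std::vector<Bf16> fused_epilogue(const std::vector<float>& d0,
                                        const std::vector<float>& d1) {
  assert(d0.size() == d1.size());
  std::vector<Bf16> d2;
  d2.reserve(d0.size());
  for (std::size_t i = 0; i < d0.size(); ++i) {
    // Assumed final op: SiLU(D0) * D1, computed entirely in float.
    float silu = d0[i] / (1.0f + std::exp(-d0[i]));
    d2.push_back(to_bf16(silu * d1[i]));
  }
  return d2;
}
```

In other words, the goal is for the element type of D0/D1 and the element type of D2 to be independent parameters, with a single rounding step at the final store rather than one per intermediate.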

github-actions[bot] commented 1 week ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.