NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] Epilogue Reduction #1518

Open jeromeku opened 2 months ago

jeromeku commented 2 months ago

What is your question?

I'm looking to define a GEMM that does the following (in pseudocode):

D = AB + C
F = norm(D, axis=1)
return F

That is, the epilogue should a) compute the 2-norm of D along axis=1 (the square root of the sum of squares across the columns of each row) and b) store F to global memory. There is no need to store D.
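To pin down the intended semantics, here is a minimal pure-Python reference sketch. It assumes `axis=1` follows the NumPy convention, so F holds one value per row of D. The function name `gemm_row_norm` and the `tile_n` parameter are hypothetical and are not part of CUTLASS. The loop over column tiles only mimics how the N dimension is split across tiles, to show that each tile can contribute just a partial sum of squares and that the square root has to be taken after all tiles are combined.

```python
import math

def gemm_row_norm(A, B, C, tile_n=2):
    """Reference for F[i] = sqrt(sum_j (A @ B + C)[i][j] ** 2), without storing D.

    Hypothetical helper for illustration only; tile_n stands in for the
    tile width along N.
    """
    M, K, N = len(A), len(B), len(B[0])
    sum_sq = [0.0] * M  # one running sum of squares per row of D
    # Walk the N dimension one tile at a time.
    for n0 in range(0, N, tile_n):
        for i in range(M):
            for j in range(n0, min(n0 + tile_n, N)):
                # Per-element epilogue: accumulator plus C.
                d = sum(A[i][k] * B[k][j] for k in range(K)) + C[i][j]
                # D is consumed immediately and never written out.
                sum_sq[i] += d * d
    # The square root is only valid once every tile has contributed.
    return [math.sqrt(s) for s in sum_sq]

A = [[1.0, 2.0], [0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]
C = [[0.0, 0.0, 0.0], [5.0, -2.0, 8.0]]
F = gemm_row_norm(A, B, C)
print(F)  # [5.0, 10.0]
```

In this example D works out to `[[3, 4, 0], [6, 0, 8]]`, so the row norms are 5 and 10. The same split applies to any fused version: the sum of squares can be accumulated per tile, but the final square root needs the fully reduced sum for each row.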

What's the most appropriate epilogue type for this pattern, specifically for Ampere?

github-actions[bot] commented 1 month ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.