NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] The best way to do D = func(A x B) x C. #1551

Open amazingyyc opened 4 months ago

amazingyyc commented 4 months ago

I want to implement a function like D = func(A x B) x C:

T = A x B // Matrix multiply
T = func(T) // Some other operator, like masking, adding a bias, ...
D = T x C // Matrix multiply
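As a correctness baseline, the three steps can be written unfused. This is a minimal CPU sketch in C++. It assumes func is an elementwise "add a per-column bias, then clamp negatives to 0", which is a hypothetical stand-in for "mask, add bias, ...". `Matrix`, `matmul`, `func` and `reference` are illustrative names, not CUTLASS APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix: element (r, c) is stored at data[r * cols + c].
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Plain (unfused) GEMM: returns X x Y.
Matrix matmul(const Matrix& X, const Matrix& Y) {
    assert(X.cols == Y.rows);
    Matrix Z(X.rows, Y.cols);
    for (std::size_t i = 0; i < X.rows; ++i)
        for (std::size_t k = 0; k < X.cols; ++k)
            for (std::size_t j = 0; j < Y.cols; ++j)
                Z.at(i, j) += X.at(i, k) * Y.at(k, j);
    return Z;
}

// Hypothetical func: add a per-column bias to T, then clamp negatives to 0.
float func(float t, float bias) { return std::max(t + bias, 0.0f); }

// Three separate passes; the full T is materialized between them.
Matrix reference(const Matrix& A, const Matrix& B,
                 const std::vector<float>& bias, const Matrix& C) {
    Matrix T = matmul(A, B);                    // T = A x B
    assert(bias.size() == T.cols);
    for (std::size_t i = 0; i < T.rows; ++i)    // T = func(T)
        for (std::size_t j = 0; j < T.cols; ++j)
            T.at(i, j) = func(T.at(i, j), bias[j]);
    return matmul(T, C);                        // D = T x C
}
```

A fused kernel's output can be compared against `reference` on small random inputs, up to floating-point rounding.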

I want to fuse these 3 operations (a matrix multiply, followed by a function, then another matrix multiply) into one kernel. I have 2 ideas:

  1. Calculate T = A x B with the result in registers, apply func(T), then write T into shared memory. Finally, compute T x C and write the result into D.
  2. Calculate T = A x B with the result in registers and apply func(T), but don't write func(T) into shared memory; compute T x C directly from registers.
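The data flow that both ideas share can be sketched on the CPU as below. It assumes row-major float buffers and the same hypothetical bias-plus-ReLU func as a stand-in for the real operator. Each outer iteration plays the role of one thread block that owns `block_m` complete rows of T. `tile` stands in for wherever func(T) lives on chip: registers in idea 2, shared memory in idea 1. A CPU loop cannot show the cost difference between those two, only the constraint they share. Because the second multiply reduces over the N columns of T, a block must hold all N columns of its rows; otherwise its partial results for D would need a cross-block reduction. `fused_b2b`, `block_m` and `func` are hypothetical names, not CUTLASS APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for func: add a per-column bias, then ReLU.
inline float func(float t, float bias) { return std::max(t + bias, 0.0f); }

// Fused D = func(A x B) x C with row-major buffers:
//   A is M x K, B is K x N, bias has N entries, C is N x P, D is M x P.
// The full T is never materialized; only block_m rows of it exist at a time.
std::vector<float> fused_b2b(const std::vector<float>& A, const std::vector<float>& B,
                             const std::vector<float>& bias, const std::vector<float>& C,
                             std::size_t M, std::size_t K, std::size_t N, std::size_t P,
                             std::size_t block_m) {
    assert(A.size() == M * K && B.size() == K * N);
    assert(bias.size() == N && C.size() == N * P && block_m > 0);
    std::vector<float> D(M * P, 0.0f);
    std::vector<float> tile(block_m * N);  // on-chip tile: block_m complete rows of T
    for (std::size_t m0 = 0; m0 < M; m0 += block_m) {
        const std::size_t rows = std::min(block_m, M - m0);
        // GEMM 0: tile = A[m0 : m0 + rows, :] x B.
        std::fill(tile.begin(), tile.end(), 0.0f);
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t n = 0; n < N; ++n)
                    tile[i * N + n] += A[(m0 + i) * K + k] * B[k * N + n];
        // Epilogue: apply func in place while the tile is still on chip.
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t n = 0; n < N; ++n)
                tile[i * N + n] = func(tile[i * N + n], bias[n]);
        // GEMM 1: D[m0 : m0 + rows, :] = tile x C. The reduction runs over N,
        // which is complete in this tile, so no cross-block reduction is needed.
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t n = 0; n < N; ++n)
                for (std::size_t p = 0; p < P; ++p)
                    D[(m0 + i) * P + p] += tile[i * N + n] * C[n * P + p];
    }
    return D;
}
```

The result should not depend on `block_m`, which makes a quick self-check possible. In idea 2 the same constraint applies one level down: the registers a thread ends up holding after GEMM 0 must be exactly the operand fragment it needs for GEMM 1, which is a layout question the CPU sketch does not model.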

Idea 1 is easy to understand, but does writing func(T) into shared memory and reading it back for T x C cost performance? For idea 2, how can I make sure that the func(T) values each thread holds in registers are the ones that thread needs for T x C?

github-actions[bot] commented 3 months ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 1 week ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.