NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] How to config the globalStrides in interleaved mode #1839

Open silenceluo opened 1 month ago

silenceluo commented 1 month ago

In 16B and 32B interleaved modes, how should the globalStrides values be calculated? Taking 32B interleave for FP16 as an example, the layout would be N (C/16) DHW C16, since 32 bytes hold 16 FP16 channels. How should globalStrides be set in this case?

silenceluo commented 1 month ago

In linear mode, the dimension sizes are {N, D, H, W, C}, so globalStrides is {DHWC, HWC, WC, C}.
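To make the linear-mode arithmetic concrete, here is a minimal sketch that derives {DHWC, HWC, WC, C} from the dimension sizes. The helper names `linear_strides` and `to_bytes` are hypothetical and not part of CUTLASS or the CUDA driver API. The values are in elements and listed outermost first, as in the comment above. The byte conversion is included on the assumption that the tensor map API takes strides in bytes, and the ordering the API expects should be checked against the driver documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: row-major (linear) strides in elements for dims
// listed outermost first, e.g. {N, D, H, W, C}. The innermost stride
// (always 1) is omitted, leaving {DHWC, HWC, WC, C}.
std::vector<std::uint64_t> linear_strides(const std::vector<std::uint64_t>& dims) {
    assert(!dims.empty());
    std::vector<std::uint64_t> strides(dims.size() - 1);
    std::uint64_t running = 1;
    for (std::size_t i = dims.size() - 1; i > 0; --i) {
        running *= dims[i];        // product of all dims inner to i - 1
        strides[i - 1] = running;
    }
    return strides;
}

// Hypothetical helper: scale element strides to bytes
// (elem_size is 2 for FP16).
std::vector<std::uint64_t> to_bytes(std::vector<std::uint64_t> strides,
                                    std::uint64_t elem_size) {
    for (auto& s : strides) s *= elem_size;
    return strides;
}
```

For example, with N=2, D=3, H=4, W=5, C=32, `linear_strides({2, 3, 4, 5, 32})` yields {1920, 640, 160, 32} elements.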

But in interleaved mode, for example 32-byte interleaved mode for FP16, the layout is N(C/16) DHW(C16). We tried setting globalStrides to {DHWC, HWC, WC, C}, {NDHW, DHW32, HW32, W32}, and other similar values, but none of them worked.

Any suggestions?
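For reference, the memory arithmetic of the N(C/16) DHW(C16) layout itself can be sketched as below. This only describes where each element sits in a densely packed buffer. It is not a confirmed answer for what globalStrides must contain in interleaved mode, which the thread leaves open. The struct name `Interleaved32B` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 32-byte interleave with 2-byte FP16 elements: 16 channels per group.
constexpr std::uint64_t kGroup = 16;

// Hypothetical descriptor of a densely packed N (C/16) D H W (C16) buffer.
struct Interleaved32B {
    std::uint64_t N, C, D, H, W;

    // Element offset of logical coordinate (n, c, d, h, w).
    std::uint64_t offset(std::uint64_t n, std::uint64_t c, std::uint64_t d,
                         std::uint64_t h, std::uint64_t w) const {
        assert(C % kGroup == 0);              // assumes C is a multiple of 16
        const std::uint64_t slice = c / kGroup;  // which C/16 slice
        const std::uint64_t lane  = c % kGroup;  // channel inside the slice
        return ((((n * (C / kGroup) + slice) * D + d) * H + h) * W + w) * kGroup
               + lane;
    }
};
```

Under these assumptions the element strides of the packed layout are 16 for W, 16·W for H, 16·W·H for D, 16·W·H·D for the C/16 slice, and C·D·H·W for N. Multiply by 2 to get bytes for FP16.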

github-actions[bot] commented 3 weeks ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.