NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] How to config the globalStrides in interleaved mode #1839

Open silenceluo opened 3 weeks ago

silenceluo commented 3 weeks ago

In 16B and 32B interleaved mode, how should the globalStrides values be calculated? Taking 32B interleave for FP16 as an example, the layout would be N (C/16) DHW C16, since 32 bytes hold 16 FP16 elements. How should globalStrides be set in this case?

silenceluo commented 3 weeks ago

In linear mode, the dimensions are {N, D, H, W, C}, so globalStrides is {DHWC, HWC, WC, C}.
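For reference, the linear-mode strides can be written out as a small host-side helper. This is only a sketch of the arithmetic: the function names `linear_ndhwc_strides` and `to_byte_strides` are made up for illustration and are not part of CUTLASS or the CUDA driver API. The strides are first computed in elements; as far as I know, `cuTensorMapEncodeTiled` expects `globalStrides` in bytes, so a conversion step is included, but the order in which the driver wants them should be checked against its documentation.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not a CUTLASS or CUDA API): element strides of the
// four outer dimensions {N, D, H, W} of a densely packed NDHWC tensor.
// The innermost dimension C has an implicit stride of one element.
inline std::array<std::uint64_t, 4> linear_ndhwc_strides(
    std::uint64_t D, std::uint64_t H, std::uint64_t W, std::uint64_t C) {
  const std::uint64_t stride_w = C;             // step one position in W
  const std::uint64_t stride_h = W * stride_w;  // WC
  const std::uint64_t stride_d = H * stride_h;  // HWC
  const std::uint64_t stride_n = D * stride_d;  // DHWC
  return {stride_n, stride_d, stride_h, stride_w};
}

// Hypothetical helper: convert element strides to byte strides by scaling
// with the element size (2 for FP16).
inline std::array<std::uint64_t, 4> to_byte_strides(
    std::array<std::uint64_t, 4> strides, std::uint64_t element_bytes) {
  for (auto& s : strides) {
    s *= element_bytes;
  }
  return strides;
}
```

For example, `to_byte_strides(linear_ndhwc_strides(D, H, W, C), 2)` gives the FP16 byte strides in the same {N, D, H, W} order used above.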

But in interleaved mode, such as 32-byte interleave for FP16, the layout is N (C/16) DHW (C16). We have tried setting globalStrides to {DHWC, HWC, WC, C}, {NDHW, DHW32, HW32, W32}, and other similar settings, but none of them worked.
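To make the question concrete, here is a sketch of the byte strides that the N (C/16) DHW (C16) memory layout itself implies, treating it as a plain six-dimensional array [N][C/16][D][H][W][16]. The struct and function names are hypothetical, and this is not a confirmed answer: it only describes where elements live in memory, and whether `cuTensorMapEncodeTiled` accepts these values, and in which order, is exactly what is being asked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container (not a CUTLASS or CUDA type): byte strides of each
// logical dimension of an interleaved N (C/k) D H W (Ck) tensor.
struct InterleavedByteStrides {
  std::uint64_t n;        // next batch item
  std::uint64_t c_block;  // next group of k channels
  std::uint64_t d;
  std::uint64_t h;
  std::uint64_t w;
  std::uint64_t c_inner;  // next channel inside one group
};

// Hypothetical helper: strides implied by the memory layout alone.
// interleave_bytes is 16 or 32; element_bytes is 2 for FP16.
inline InterleavedByteStrides interleaved_ndhwc_byte_strides(
    std::uint64_t D, std::uint64_t H, std::uint64_t W, std::uint64_t C,
    std::uint64_t element_bytes, std::uint64_t interleave_bytes) {
  assert(interleave_bytes % element_bytes == 0);
  const std::uint64_t k = interleave_bytes / element_bytes;  // 16 for FP16, 32B
  assert(C % k == 0);  // channels must fill whole interleave groups

  InterleavedByteStrides s{};
  s.c_inner = element_bytes;
  s.w = interleave_bytes;     // one group of k channels per W position
  s.h = W * s.w;
  s.d = H * s.h;
  s.c_block = D * s.d;        // D*H*W*interleave_bytes
  s.n = (C / k) * s.c_block;  // equals D*H*W*C*element_bytes
  return s;
}
```

Under these assumptions the batch stride is the same as in linear mode (DHWC elements), while the W, H, and D strides are scaled by the 32-byte group rather than by C.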

Any suggestions?