NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] Question about vectorized memory accesses #1946

Open leizhao1234 opened 2 weeks ago

leizhao1234 commented 2 weeks ago

When performing vectorized access to global or shared memory, should I use AlignedArray<T, 8> or Array<T, 8>?

foreverlms commented 6 days ago

If you are working outside CUTLASS, I think you should use the built-in vector types that CUDA provides, such as int2 and float2. If you are using CUTLASS, I think it will issue the vectorized instructions for you.
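To make the alignment point behind the question concrete, here is a minimal host-side C++ sketch. It assumes that the difference that matters for vectorized access is the alignment of the whole object: a wide load or store (for example 16 bytes for four floats) needs the address to be aligned to the full vector width, not just to the element type. `PlainVec`, `AlignedVec`, `is_aligned`, and `copy_vec4` are hypothetical names invented for this illustration; they are not CUTLASS's definitions of `Array` or `AlignedArray`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an array type with no extra alignment:
// the object is only guaranteed to be aligned like a single T.
template <typename T, int N>
struct PlainVec {
  T data[N];
};

// Hypothetical stand-in for an aligned array type: the whole object is
// aligned to its total size, which is what one wide access requires.
// Assumes sizeof(T) * N is a power of two (e.g. 4 floats = 16 bytes).
template <typename T, int N>
struct alignas(sizeof(T) * N) AlignedVec {
  T data[N];
};

// Returns true if ptr is a multiple of `bytes`.
inline bool is_aligned(const void* ptr, std::size_t bytes) {
  return reinterpret_cast<std::uintptr_t>(ptr) % bytes == 0;
}

// Copies n floats in 16-byte chunks, then finishes the tail one float at a time.
// Both pointers must be 16-byte aligned for the chunked part to be valid
// as a vectorized access on real hardware.
inline void copy_vec4(float* dst, const float* src, std::size_t n) {
  using Vec = AlignedVec<float, 4>;
  assert(is_aligned(dst, alignof(Vec)) && is_aligned(src, alignof(Vec)));

  const std::size_t nvec = n / 4;
  for (std::size_t i = 0; i < nvec; ++i) {
    Vec v;
    std::memcpy(&v, src + 4 * i, sizeof(Vec));   // one 16-byte read
    std::memcpy(dst + 4 * i, &v, sizeof(Vec));   // one 16-byte write
  }
  for (std::size_t i = 4 * nvec; i < n; ++i) {   // scalar tail
    dst[i] = src[i];
  }
}
```

In a CUDA kernel the same idea would typically be written with a built-in type such as `float4`, as suggested above. The sketch only shows why a type that carries the full vector alignment is the safer choice when the pointer is going to be accessed in wide chunks.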