Open leizhao1234 opened 2 weeks ago
When performing vectorized access to global or shared memory, should I use AlignedArray<T, 8> or Array<T, 8>?
If you are working outside of CUTLASS, I think you should use the built-in vector types that CUDA provides, such as int2 and float2. If you are using CUTLASS, I think it will take care of emitting vectorized instructions for you.
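To make the alignment point concrete, here is a minimal host-side sketch, not the CUTLASS source. `PlainArray` and `OverAlignedArray` are hypothetical stand-ins for an array wrapper without and with an `alignas` specifier, and `std::uint16_t` stands in for a 16-bit element type such as half. It only illustrates why the declared alignment matters: a single 16-byte vector access needs a 16-byte-aligned address, and a plain array of 2-byte elements only promises 2-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an unaligned array wrapper (NOT the CUTLASS type).
// Its alignment is just that of the element type.
template <typename T, int N>
struct PlainArray {
    T storage[N];
};

// Hypothetical stand-in for an over-aligned wrapper (NOT the CUTLASS type).
// The whole object is aligned to its full size, so it can be moved as one vector.
template <typename T, int N>
struct alignas(sizeof(T) * N) OverAlignedArray {
    T storage[N];
};

// Returns true when `p` is a multiple of `bytes`, i.e. safe to access
// as a single `bytes`-wide vector.
inline bool is_aligned(const void* p, std::size_t bytes) {
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Eight 16-bit elements occupy 16 bytes in both cases...
static_assert(sizeof(PlainArray<std::uint16_t, 8>) == 16, "same size");
static_assert(sizeof(OverAlignedArray<std::uint16_t, 8>) == 16, "same size");

// ...but only the over-aligned type guarantees a 16-byte-aligned address.
static_assert(alignof(PlainArray<std::uint16_t, 8>) == alignof(std::uint16_t),
              "plain wrapper inherits element alignment");
static_assert(alignof(OverAlignedArray<std::uint16_t, 8>) == 16,
              "over-aligned wrapper is aligned to its full size");
```

The same reasoning applies to the built-in types: float2 and int2 are declared with 8-byte alignment, which is what lets the compiler use a single 64-bit load or store for them.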