NVIDIA / CUDALibrarySamples

CUDA Library Samples

SplitK for multiblock_gemm in cuBLASdx #192

Open · osayamenja opened this issue 2 months ago

osayamenja commented 2 months ago

Hello!

I am currently learning CUTLASS and cuBLASdx, and I have a question. multiblock_gemm.cu only supports a K dimension small enough to fit in shared memory (smem). I believe it can be extended to larger K by following the split-K pattern here, but I am not quite sure how to implement it. I would appreciate any suggestions!
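To make the split-K idea concrete, here is a minimal host-side C++ sketch. It rests on several assumptions: plain row-major `float` matrices, no cuBLASdx calls at all, and hypothetical helper names (`gemm_k_slice`, `splitk_gemm`). The K dimension is cut into `splits` slices, and each slice computes a partial product over its own K range. On a GPU, each slice would map to its own CTA, and that CTA only needs its slice of A and B staged in shared memory. This variant, which CUTLASS calls parallel split-K, writes every partial result to a workspace and sums them in a separate reduction pass.

```cpp
#include <algorithm>
#include <vector>

// Row-major C[M x N] = A[M x K] * B[K x N].
// Hypothetical helper: computes the contribution of K range [k_begin, k_end)
// into `partial`. On a GPU this is the work of one CTA for one K slice.
void gemm_k_slice(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& partial, int M, int N, int K,
                  int k_begin, int k_end) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = k_begin; k < k_end; ++k)
                acc += A[m * K + k] * B[k * N + n];
            partial[m * N + n] = acc;
        }
}

// Parallel split-K: slices are independent and write to a workspace,
// then a reduction pass (a second kernel on a GPU) sums the partials into C.
std::vector<float> splitk_gemm(const std::vector<float>& A,
                               const std::vector<float>& B,
                               int M, int N, int K, int splits) {
    std::vector<std::vector<float>> workspace(splits, std::vector<float>(M * N, 0.0f));
    int chunk = (K + splits - 1) / splits;  // ceil(K / splits)
    for (int s = 0; s < splits; ++s) {      // one "CTA" per slice
        int k_begin = std::min(K, s * chunk);
        int k_end = std::min(K, k_begin + chunk);  // trailing slices may be empty
        gemm_k_slice(A, B, workspace[s], M, N, K, k_begin, k_end);
    }
    std::vector<float> C(M * N, 0.0f);
    for (int s = 0; s < splits; ++s)
        for (int i = 0; i < M * N; ++i)
            C[i] += workspace[s][i];
    return C;
}
```

The trade-off is an extra M x N buffer per slice plus a second pass over memory, in exchange for slices that never have to coordinate with each other.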

osayamenja commented 2 months ago

Adding this reference from CUTLASS, where they mention using a semaphore across CTAs. I think this is the best approach; let me know if you agree.
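The semaphore approach corresponds to what CUTLASS calls serial split-K. In this scheme the partial products are still computed concurrently, but the epilogues run one K slice at a time, in slice order, directly into C. As a result, no workspace or second reduction kernel is needed, and the summation order is deterministic. Below is a minimal CPU sketch of that ordering, under these assumptions: `std::thread` stands in for CTAs, a single `std::atomic<int>` stands in for the per-output-tile semaphore kept in global memory, and `splitk_gemm_serial` is a hypothetical name rather than a CUTLASS or cuBLASdx API. On a GPU the spin-wait also assumes that the CTA for the preceding slice actually gets scheduled. That depends on launch order, which CUDA does not guarantee.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Serial split-K sketch: slices take turns adding into C, ordered by a
// semaphore holding the index of the slice allowed to write next.
std::vector<float> splitk_gemm_serial(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      int M, int N, int K, int splits) {
    std::vector<float> C(M * N, 0.0f);
    std::atomic<int> semaphore{0};
    int chunk = (K + splits - 1) / splits;  // ceil(K / splits)

    auto cta = [&](int s) {  // one thread models one CTA
        int k_begin = std::min(K, s * chunk);
        int k_end = std::min(K, k_begin + chunk);
        // 1) Mainloop: partial product for this K slice (registers on a GPU).
        std::vector<float> acc(M * N, 0.0f);
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n)
                for (int k = k_begin; k < k_end; ++k)
                    acc[m * N + n] += A[m * K + k] * B[k * N + n];
        // 2) Wait until all earlier slices have finished their epilogues.
        while (semaphore.load(std::memory_order_acquire) != s)
            std::this_thread::yield();
        // 3) Epilogue: slice 0 overwrites C, later slices accumulate into it.
        for (int i = 0; i < M * N; ++i)
            C[i] = (s == 0) ? acc[i] : C[i] + acc[i];
        // 4) Hand off to the next slice. A CUDA kernel would need a
        //    __threadfence() before this so its C writes are visible.
        semaphore.store(s + 1, std::memory_order_release);
    };

    std::vector<std::thread> ctas;
    for (int s = 0; s < splits; ++s) ctas.emplace_back(cta, s);
    for (auto& t : ctas) t.join();
    return C;
}
```

Compared with the workspace variant, this saves memory and a kernel launch. The cost is that epilogues are serialized per output tile, so it pays off mainly when the mainloop dominates.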