NVIDIA / CUDALibrarySamples

CUDA Library Samples

cublasLt SYRK example #166

Open capybara-club opened 8 months ago

capybara-club commented 8 months ago

A SYRK example in cublasLt would be really useful, i.e. matmul(A, A transpose).

One of the cublasLtMatmulAlgoCapAttributes_t values is for uplo support and mentions SYRK. However, I don't know how to guarantee that an algo takes advantage of A and B both pointing to the same memory.

Do we pass NULL for B? Do we pass A for B and the algo recognizes that the pointers are the same? Maybe the optimization from sharing the memory isn't much, but the algo is faster because only the upper or lower triangle is filled?

This is very clear in the cublasSsyrk call; I would like to know what to do for cublasLt.
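
For reference, here is a minimal CPU sketch of the SYRK semantics in question: C = alpha * A * A^T + beta * C, with only one triangle of C written. It is not cuBLAS or cublasLt code. The function name `syrk_lower_ref` and the row-major layout are assumptions made for illustration (cuBLAS itself uses column-major storage).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not part of cuBLAS):
// C = alpha * A * A^T + beta * C, writing the lower triangle only.
// A is n x k and C is n x n, both row-major (an assumption of this sketch).
void syrk_lower_ref(std::size_t n, std::size_t k, float alpha,
                    const std::vector<float>& A, float beta,
                    std::vector<float>& C) {
    for (std::size_t i = 0; i < n; ++i) {
        // j <= i restricts the work to the lower triangle; the strict
        // upper triangle of C is never read or written.
        for (std::size_t j = 0; j <= i; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                // Dot product of row i and row j of the same matrix A.
                acc += A[i * k + p] * A[j * k + p];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

Note that there is a single input matrix and roughly half the output entries are computed, which are the two properties the questions above are asking about for cublasLt.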

Thanks!