cublasLt SYRK example

A SYRK example for cublasLt would be really useful, i.e., computing matmul(A, A^T).
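For reference, here is a minimal CPU sketch of what SYRK computes compared with a plain GEMM of A against itself. It assumes column-major storage (the cuBLAS convention), real single precision, and lower-triangle output. `syrk_lower_ref` and `gemm_nt_ref` are hypothetical helper names for illustration only; they are not cuBLAS or cuBLASLt APIs and say nothing about how any GPU kernel is implemented.

```c
#include <stddef.h>

/* Column-major index (cuBLAS convention): element (row, col), leading dim ld. */
#define IDX(row, col, ld) ((size_t)(col) * (size_t)(ld) + (size_t)(row))

/* Hypothetical reference helper (not a cuBLAS API):
   C = alpha * A * A^T + beta * C, A is n x k, C is n x n.
   Only the lower triangle of C (i >= j) is read or written. */
void syrk_lower_ref(int n, int k, float alpha, const float *A, int lda,
                    float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {               /* lower triangle only */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[IDX(i, p, lda)] * A[IDX(j, p, lda)]; /* row i . row j */
            /* As in BLAS, C is not read when beta == 0. */
            float old = (beta == 0.0f) ? 0.0f : beta * C[IDX(i, j, ldc)];
            C[IDX(i, j, ldc)] = alpha * acc + old;
        }
    }
}

/* Hypothetical reference helper (not a cuBLAS API):
   full GEMM C = alpha * A * B^T + beta * C, A and B are n x k.
   Calling it with B == A gives the same numbers as SYRK in the lower
   triangle, but it also computes the redundant upper triangle. */
void gemm_nt_ref(int n, int k, float alpha, const float *A, int lda,
                 const float *B, int ldb, float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[IDX(i, p, lda)] * B[IDX(j, p, ldb)];
            float old = (beta == 0.0f) ? 0.0f : beta * C[IDX(i, j, ldc)];
            C[IDX(i, j, ldc)] = alpha * acc + old;
        }
    }
}
```

Because A·A^T is symmetric, the GEMM's upper triangle just mirrors the lower one; SYRK's saving is skipping that half, not anything about the two operands sharing a buffer.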

One of the cublasLtMatmulAlgoCapAttributes_t values is for uplo support and mentions SYRK. However, I don't know how I could guarantee that an algo takes advantage of A and B both pointing to the same memory.

Do we pass NULL for B? Or do we pass A for B and the algo recognizes that the pointers are the same? Maybe the gain from sharing one buffer is small, and the algo is faster mainly because it fills only the upper or lower triangle?
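As a rough bound on the "upper/lower fill only" part of the question, a small arithmetic sketch: for C = A·A^T with A of size n x k, a full GEMM computes all n² entries of C, while a triangular kernel computes only n(n+1)/2 of them, each costing k multiply-adds. This counts arithmetic only; it ignores memory traffic and is not a statement about what any particular cublasLt algo actually does. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: multiply-adds for a full GEMM C = A * A^T,
   A is n x k, all n*n entries of C computed. */
uint64_t gemm_fmas(uint64_t n, uint64_t k) { return n * n * k; }

/* Hypothetical helper: multiply-adds when only one triangle
   (n*(n+1)/2 entries, diagonal included) is computed.
   n*(n+1) is always even, so the division is exact. */
uint64_t syrk_fmas(uint64_t n, uint64_t k) { return n * (n + 1) / 2 * k; }
```

For large n the ratio syrk_fmas / gemm_fmas approaches 1/2, so triangle-only output can save at most about half of the arithmetic.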

This is very clear with the cublasSsyrk call; I would like to know what to do with cublasLt.

Thanks!

NVIDIA / CUDALibrarySamples

cublasLt SYRK example #166