A SYRK example for cublasLt would be really useful, i.e. matmul(A, A transpose).
One of the cublasLtMatmulAlgoCapAttributes_t values is for uplo support and mentions SYRK. However, I don't know how to guarantee that an algo takes advantage of A and B both pointing to the same memory.
Do we pass NULL for B? Do we pass A for B, and the algo recognizes that the pointers are the same? Maybe the gain from A and B sharing the same buffer isn't much, and the algo is faster mainly because it only fills the upper or lower triangle?
This is very clear in the cublasSsyrk call; I would like to know what to do for cublasLt.
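To make the semantics I am asking about concrete, here is a plain CPU reference sketch in C of what I understand SYRK to compute: C = alpha * A * A^T + beta * C, reading A once and touching only one triangle of C. This is not cuBLAS or cublasLt code. The function name `syrk_lower_ref` is made up for illustration, and I am assuming column-major storage and the lower fill mode.

```c
#include <stddef.h>

/* Hypothetical CPU reference, not a cuBLAS/cublasLt API.
 * Column-major storage: A is n x k with leading dimension lda,
 * C is n x n with leading dimension ldc.
 * Computes C = alpha * A * A^T + beta * C on the lower triangle only;
 * the strictly upper part of C is never read or written. */
void syrk_lower_ref(int n, int k, float alpha, const float *A, int lda,
                    float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {          /* i >= j: lower triangle */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* Both operands come from the same buffer A. */
                acc += A[i + (size_t)p * lda] * A[j + (size_t)p * lda];
            }
            float *c = &C[i + (size_t)j * ldc];
            *c = alpha * acc + beta * (*c);
        }
    }
}
```

What I would like to know is how to get the equivalent of this from cublasLt: a single input buffer and only one triangle of the output written.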
Thanks!