Closed abouteiller closed 1 year ago
@therault this computes the wrong answer, I suspect because it uses cublasZgemm (v1) on the cuda_stream->cuda_stream
rather than the replacement system with the registered handles.
As discussed on 03/31/23, this needs to be updated and the result needs to be validated.
See previous comment
This now computes the correct answer for N=3NB (i.e., only one GEMM) but still computes the wrong answer for N=4NB (i.e., 4 GEMMs).
Similar work has been completed in #94
DPLASMA is using the old-style DTD with GPU. This is doing it the new way (also, it doesn't crash, so there's that).