ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

Update the dtd testers with GPU for the newer, simpler GPU DTD queuing #55

Closed: abouteiller closed this 1 year ago

abouteiller commented 2 years ago

DPLASMA is using the old-style DTD with GPU. This PR does it the new way (it also doesn't crash, so there's that).

abouteiller commented 2 years ago

@therault this computes the wrong answer, I suspect because it uses cublasZgemm (v1) on cuda_stream->cuda_stream rather than the replacement system with the registered handles.

bosilca commented 1 year ago

As discussed on 03/31/23, this needs to be updated and the result needs to be validated.

therault commented 1 year ago

See previous comment

abouteiller commented 1 year ago

This now computes the correct answer for N=3NB (i.e., only one GEMM) but still computes the wrong answer for N=4NB (i.e., 4 GEMMs).
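
For anyone reproducing this kind of size-dependent failure, a CPU-only check can help separate tiling or indexing mistakes from GPU stream or handle problems. The following is a minimal sketch, not the DPLASMA tester: it uses real doubles instead of complex values, column-major storage, and hypothetical helper names (`gemm_reference`, `gemm_tiled`, `check_tiled_gemm`). The number of tile updates it performs is not meant to match the GEMM counts quoted above.

```c
#include <stdlib.h>
#include <stddef.h>

/* Reference product C = A * B for column-major n x n matrices. */
static void gemm_reference(int n, const double *A, const double *B, double *C)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++) {
            double acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += A[i + (size_t)k * n] * B[k + (size_t)j * n];
            C[i + (size_t)j * n] = acc;
        }
}

/* Tiled product: C += A * B, one nb x nb tile update at a time.
 * C must be zero-initialized by the caller. */
static void gemm_tiled(int nt, int nb, const double *A, const double *B, double *C)
{
    int n = nt * nb;
    for (int tj = 0; tj < nt; tj++)
        for (int ti = 0; ti < nt; ti++)
            for (int tk = 0; tk < nt; tk++)
                for (int j = 0; j < nb; j++)
                    for (int i = 0; i < nb; i++) {
                        int gi = ti * nb + i, gj = tj * nb + j;
                        double acc = 0.0;
                        for (int k = 0; k < nb; k++) {
                            int gk = tk * nb + k;
                            acc += A[gi + (size_t)gk * n] * B[gk + (size_t)gj * n];
                        }
                        C[gi + (size_t)gj * n] += acc;
                    }
}

/* Returns max |C_tiled - C_ref| for N = nt * nb, or -1.0 if allocation fails. */
double check_tiled_gemm(int nt, int nb)
{
    int n = nt * nb;
    size_t len = (size_t)n * n;
    double *A = malloc(len * sizeof *A);
    double *B = malloc(len * sizeof *B);
    double *C_ref = malloc(len * sizeof *C_ref);
    double *C_tiled = calloc(len, sizeof *C_tiled);
    double max_err = -1.0;

    if (A && B && C_ref && C_tiled) {
        /* Small integer-valued entries: every sum is exact in double,
         * so any nonzero difference indicates a logic error, not rounding. */
        for (size_t idx = 0; idx < len; idx++) {
            A[idx] = (double)((idx * 7 + 3) % 11) - 5.0;
            B[idx] = (double)((idx * 5 + 1) % 13) - 6.0;
        }
        gemm_reference(n, A, B, C_ref);
        gemm_tiled(nt, nb, A, B, C_tiled);
        max_err = 0.0;
        for (size_t idx = 0; idx < len; idx++) {
            double d = C_tiled[idx] - C_ref[idx];
            if (d < 0.0) d = -d;
            if (d > max_err) max_err = d;
        }
    }
    free(A); free(B); free(C_ref); free(C_tiled);
    return max_err;
}
```

Calling `check_tiled_gemm(3, nb)` and `check_tiled_gemm(4, nb)` covers both sizes mentioned here. With real floating-point inputs the comparison would need a tolerance scaled by N and machine epsilon rather than an exact match.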

bosilca commented 1 year ago

Similar work has been completed in #94