ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

dtd wrappers: cublas fill mode was not set correctly #133

Closed by abouteiller 5 days ago

abouteiller commented 3 weeks ago

This causes failures in the DTD GPU testers.

abouteiller commented 2 weeks ago

src/dtd_wrappers/ztrsm.c:155 (and a couple of other places): these used to be macros with silent side effects. They are now proper functions (#121), so the old usage pattern is incorrect and results in passing the PLASMA_UPLO value to CUBLAS. CUBLAS does not accept that: plasma/cblas/hipblas all use {'L', 'U', 'F'}, whereas CUBLAS_FILL uses the enum values {1, 2, 3} instead.
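To illustrate the failure mode, here is a minimal sketch of the pattern the fix needs: the conversion is a function, so its return value has to be the thing handed to the GPU call. Every name below (`sketch_uplo_t`, `sketch_cublas_fill_t`, `sketch_uplo_to_cublas_fill`, `sketch_gpu_trsm`) is hypothetical and stands in for the real DPLASMA and CUBLAS symbols, which are not reproduced here. The enum constants are left implicit on purpose, since the only assumption that matters is that the two encodings differ.

```c
#include <assert.h>

/* Hypothetical stand-in for the PLASMA-style triangle selector described in
 * the thread (character codes). Not the actual DPLASMA type. */
typedef char sketch_uplo_t;

/* Hypothetical stand-in for the CUBLAS fill-mode enum. The numeric values are
 * left implicit: only the fact that they differ from 'L'/'U'/'F' matters. */
typedef enum {
    SKETCH_CUBLAS_FILL_LOWER,
    SKETCH_CUBLAS_FILL_UPPER,
    SKETCH_CUBLAS_FILL_FULL,
    SKETCH_CUBLAS_FILL_INVALID
} sketch_cublas_fill_t;

/* Conversion written as a proper function: it has no side effects, so the
 * caller must use the returned value instead of the original uplo. */
sketch_cublas_fill_t sketch_uplo_to_cublas_fill(sketch_uplo_t uplo)
{
    switch (uplo) {
    case 'L': return SKETCH_CUBLAS_FILL_LOWER;
    case 'U': return SKETCH_CUBLAS_FILL_UPPER;
    case 'F': return SKETCH_CUBLAS_FILL_FULL;
    default:  return SKETCH_CUBLAS_FILL_INVALID;
    }
}

/* Hypothetical stand-in for a GPU kernel wrapper that only understands the
 * CUBLAS-side encoding. Returns 0 on success. */
int sketch_gpu_trsm(sketch_cublas_fill_t fill)
{
    assert(fill != SKETCH_CUBLAS_FILL_INVALID);
    return 0;
}
```

With these stand-ins, a call would look like `sketch_gpu_trsm(sketch_uplo_to_cublas_fill(uplo))`. Calling the conversion and then still passing `uplo` itself reproduces the bug described above.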

The rest just uses the functions we already have to reduce code repetition. I went over all the places where we convert from PLASMA_OP/FILL/SIDE; I left alone the places where we pass the cublas constants directly, because those do not depend on task pool globals.