ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

Consistent ordering of RECURSIVE/CUDA bodies: CUDA first, recursive #103

Closed abouteiller closed 8 months ago

abouteiller commented 8 months ago

Consistent ordering of RECURSIVE/CUDA bodies: CUDA first, recursive second.

This ordering is consistent with the existing ordering in zgeqrf and in most kernels in zpotrf (except for po_po). It prefers the GPU kernels when they are available, and falls back to the recursive body if the load balancer lets a kernel fall back to the CPU.
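To illustrate the intended effect of the ordering, here is a minimal sketch in plain C. It is not PaRSEC or DPLASMA code: `body_kind_t`, `select_body`, and the three boolean flags are hypothetical names made up for this example, and the trailing plain CPU body and the "recursive body is eligible" condition are assumptions. It only models the idea that bodies are tried in their listed order and the first eligible one runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task with several bodies; NOT the PaRSEC API. */
typedef enum { BODY_CUDA, BODY_RECURSIVE, BODY_CPU } body_kind_t;

/* Bodies in the order proposed here: CUDA first, recursive second.
 * The final plain CPU body is an assumption added for illustration. */
static const body_kind_t body_order[] = { BODY_CUDA, BODY_RECURSIVE, BODY_CPU };

/* Walk the bodies in order and return the first eligible one.
 * gpu_available:      a CUDA device exists (assumed flag)
 * balancer_picks_gpu: the load balancer assigned this task to the GPU (assumed flag)
 * recursive_eligible: the recursive body accepts this task (assumed flag) */
static body_kind_t select_body(bool gpu_available,
                               bool balancer_picks_gpu,
                               bool recursive_eligible)
{
    for (size_t i = 0; i < sizeof(body_order) / sizeof(body_order[0]); i++) {
        switch (body_order[i]) {
        case BODY_CUDA:
            if (gpu_available && balancer_picks_gpu) return BODY_CUDA;
            break;                      /* declined: try the next body */
        case BODY_RECURSIVE:
            if (recursive_eligible) return BODY_RECURSIVE;
            break;                      /* declined: try the next body */
        case BODY_CPU:
            return BODY_CPU;            /* always eligible in this model */
        }
    }
    assert(0 && "unreachable: the CPU body is always eligible");
    return BODY_CPU;
}
```

In this model, `select_body(true, true, true)` picks the CUDA body, while `select_body(true, false, true)` (the balancer sends the task back to the CPU) picks the recursive body. If the recursive body were listed first, it would be chosen even when a GPU is available.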

bosilca commented 8 months ago

This is not how we decided to fix this issue.