DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.
Consistent ordering of RECURSIVE/CUDA bodies: CUDA first, recursive #103
Consistent ordering of RECURSIVE/CUDA bodies: CUDA first, recursive second.
This ordering is consistent with the existing ordering in zgeqrf and in most kernels in zpotrf (except for po_po). It prefers GPU kernels when available and falls back to the recursive body if the load balancer lets a kernel fall back to the CPU.
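
The effect of the ordering can be illustrated with a small model. This is only a sketch of the selection logic described above, not PaRSEC or DPLASMA code: the names `BODY_ORDER` and `select_body` and the three boolean flags are hypothetical, and the model assumes that bodies are tried in declaration order and that the first usable one runs.

```python
from typing import Sequence

# Hypothetical model of a task with three bodies, listed in the order
# proposed by this change: CUDA first, recursive second, plain CPU last.
BODY_ORDER = ("CUDA", "RECURSIVE", "CPU")


def select_body(
    order: Sequence[str],
    gpu_available: bool,
    balancer_keeps_on_gpu: bool,
    recursive_applicable: bool,
) -> str:
    """Return the first body in `order` that can run (assumed semantics)."""
    for body in order:
        if body == "CUDA":
            # The GPU body runs only if a device exists and the load
            # balancer does not push the task back to the CPU.
            if gpu_available and balancer_keeps_on_gpu:
                return body
        elif body == "RECURSIVE":
            # Assumption: the recursive body may decline to run,
            # for example when the tile is too small to split.
            if recursive_applicable:
                return body
        else:
            # The plain CPU body is always usable.
            return body
    raise ValueError("no usable body in order")


if __name__ == "__main__":
    print(select_body(BODY_ORDER, True, True, True))
    print(select_body(BODY_ORDER, True, False, True))
    print(select_body(BODY_ORDER, False, False, False))
```

Under these assumptions, listing the recursive body before the CUDA body would let the recursive variant win even when a GPU is available, which is the situation the consistent ordering avoids.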