DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.
Description
DPLASMA currently supports only a limited subset of kernels on HIP devices compared to CUDA.
Describe the solution you'd like
Every algorithm that is CUDA-accelerated should also be HIP-accelerated.
Additional context