DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.
Description
We cannot control uplo in dpotrf and friends (it is hardcoded). This should be a command-line parameter.
In addition, the performance of PO Upper is really bad on some hardware (e.g., ROCm) due to poor kernel optimization in rocBLAS/cuBLAS itself, so being able to investigate both lower and upper performance is important.
Describe the solution you'd like
--uplo upper
--uplo lower
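
A minimal sketch of how such a flag could be parsed, assuming the test drivers can use `getopt_long` (a GNU/BSD extension). The `uplo_t` enum and the `uplo_from_string` and `parse_uplo` helpers are hypothetical names for illustration, not existing DPLASMA code; a real patch would map the result onto whatever uplo constant the drivers already pass to dpotrf.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for the uplo constant used by the drivers. */
typedef enum { UPLO_LOWER, UPLO_UPPER, UPLO_INVALID } uplo_t;

/* Map "lower"/"upper" (case-insensitive) to the enum. */
static uplo_t uplo_from_string(const char *s)
{
    if (s == NULL) return UPLO_INVALID;
    if (strcasecmp(s, "lower") == 0) return UPLO_LOWER;
    if (strcasecmp(s, "upper") == 0) return UPLO_UPPER;
    return UPLO_INVALID;
}

/* Scan argv for --uplo <value>; return `fallback` when the flag is absent.
 * Unknown options are skipped silently, since a real driver would handle
 * its other flags elsewhere. */
static uplo_t parse_uplo(int argc, char **argv, uplo_t fallback)
{
    static const struct option long_opts[] = {
        { "uplo", required_argument, NULL, 'u' },
        { NULL, 0, NULL, 0 }
    };
    uplo_t uplo = fallback;
    int c;

    assert(argv != NULL);
    optind = 1;  /* restart scanning from the first argument */
    opterr = 0;  /* do not print getopt's own error messages */
    while ((c = getopt_long(argc, argv, ":u:", long_opts, NULL)) != -1) {
        if (c == 'u')
            uplo = uplo_from_string(optarg);
        else if (c == ':')  /* --uplo given without a value */
            uplo = UPLO_INVALID;
    }
    return uplo;
}
```

A driver could call `parse_uplo(argc, argv, UPLO_LOWER)` to keep lower as the default when the flag is not given, and report an error when the result is `UPLO_INVALID`.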