ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

HIP: support all the same kernels as CUDA #98

Open abouteiller opened 1 year ago

abouteiller commented 1 year ago

Description

HIP devices currently support only a limited subset of the kernels that are accelerated with CUDA.

Describe the solution you'd like

Every algorithm that is CUDA accelerated should also be HIP accelerated.

Additional context

❯ ag type=CUDA src -l
src/ztrsm_LUT.jdf
src/ztrsm_LLT.jdf
src/zgemm_NN.jdf
src/ztrsm_LUN.jdf
src/zpotrf_U.jdf
src/ztrsm_LLN.jdf
src/zpoinv_L.jdf
src/zgemm_TN_summa.jdf
src/zgemm_NT.jdf
src/zgemm_NN_summa.jdf
src/zgemm_TN.jdf
src/ztrsm_RLN.jdf
src/ztrsm_RUN.jdf
src/zgemm_NN_gpu.jdf
src/zgemm_TT_summa.jdf
src/zgemm_NT_summa.jdf
src/zgemm_TT.jdf
src/zpoinv_U.jdf
src/ztrsm_RLT.jdf
src/ztrsm_RUT.jdf
src/zgetrf_nopiv.jdf
src/zpotrf_L.jdf
src/zgeqrf.jdf

~/parsec/dplasma feature/hip
❯ ag type=HIP src -l
src/zpotrf_U.jdf
src/zgemm_TN_summa.jdf
src/zgemm_NN_summa.jdf
src/zgemm_NN_gpu.jdf
src/zgemm_TT_summa.jdf
src/zgemm_NT_summa.jdf
src/zpotrf_L.jdf
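
To make the gap between the two listings explicit, here is a minimal sketch that diffs them and groups the result by kernel family. The file names are copied from the `ag` output above. The helper names (`files_matching`, `family`) and the substring-based scan are illustrative assumptions, not part of DPLASMA or of `ag` itself.

```python
from collections import defaultdict
from pathlib import Path

# Listings copied from the two `ag ... src -l` runs above.
CUDA_FILES = set("""
src/ztrsm_LUT.jdf src/ztrsm_LLT.jdf src/zgemm_NN.jdf src/ztrsm_LUN.jdf
src/zpotrf_U.jdf src/ztrsm_LLN.jdf src/zpoinv_L.jdf src/zgemm_TN_summa.jdf
src/zgemm_NT.jdf src/zgemm_NN_summa.jdf src/zgemm_TN.jdf src/ztrsm_RLN.jdf
src/ztrsm_RUN.jdf src/zgemm_NN_gpu.jdf src/zgemm_TT_summa.jdf
src/zgemm_NT_summa.jdf src/zgemm_TT.jdf src/zpoinv_U.jdf src/ztrsm_RLT.jdf
src/ztrsm_RUT.jdf src/zgetrf_nopiv.jdf src/zpotrf_L.jdf src/zgeqrf.jdf
""".split())

HIP_FILES = set("""
src/zpotrf_U.jdf src/zgemm_TN_summa.jdf src/zgemm_NN_summa.jdf
src/zgemm_NN_gpu.jdf src/zgemm_TT_summa.jdf src/zgemm_NT_summa.jdf
src/zpotrf_L.jdf
""".split())


def files_matching(root, needle):
    """Hypothetical stand-in for `ag <needle> <root> -l`, limited to .jdf files.

    Uses a plain substring test, so it only approximates ag's regex search.
    """
    return {
        p.as_posix()
        for p in Path(root).rglob("*.jdf")
        if needle in p.read_text(errors="ignore")
    }


def family(path):
    """Map 'src/ztrsm_LUT.jdf' to its kernel family, 'ztrsm'."""
    return Path(path).stem.split("_")[0]


# Kernels that have a CUDA body but no HIP body yet.
MISSING = sorted(CUDA_FILES - HIP_FILES)

BY_FAMILY = defaultdict(list)
for path in MISSING:
    BY_FAMILY[family(path)].append(path)

for name in sorted(BY_FAMILY):
    print(f"{name}: {len(BY_FAMILY[name])} file(s) without type=HIP")
    for path in BY_FAMILY[name]:
        print(f"  {path}")
```

To refresh the listing from a checkout instead of the hard-coded sets, something like `files_matching("src", "type=CUDA") - files_matching("src", "type=HIP")` should give the same kind of diff.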

abouteiller commented 1 year ago

In particular, TRSM is used in check_solution and similar routines, which can be pretty slow on large matrices without HIP acceleration.

abouteiller commented 2 months ago

TRSM: #122