eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License

Replace rocblas trmm3 in-place call with out-of-place version #966

Closed: msimberg closed this issue 10 months ago

msimberg commented 1 year ago

rocBLAS 5.0.0 introduced a new rocblas_Xtrmm_outofplace function to replace the existing two-matrix in-place version. We should replace our use of the in-place function with the three-matrix out-of-place version.

However, rocblas_Xtrmm_outofplace also seems to be deprecated and will be replaced by rocblas_Xtrmm becoming the three-matrix out-of-place version. rocblas_Xtrmm_outofplace has been removed on develop at the time of writing (after 5.5.1). I tested the rocblas_Xtrmm_outofplace function and it works, but given that it's also going to be removed we can either:

  1. Conditionally use rocblas_Xtrmm_outofplace between 5.0.0 and 5.5.X (with the assumption that 5.6.X will actually remove it).
  2. Wait for rocblas_Xtrmm to become the out-of-place version and bump the version requirement to 5.6.X or newer.
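
A minimal sketch of how the version ranges from option 1 could be written as a compile-time selection. The enum `TrmmVariant` and the function `select_trmm_variant` are hypothetical names for illustration, not DLA-Future or rocBLAS API, and the 5.6 cut-off rests on the same assumption as above, namely that 5.6.X is where rocblas_Xtrmm becomes the three-matrix version.

```cpp
// Hypothetical sketch: none of these names exist in DLA-Future or rocBLAS.
// Version numbers follow the numbering used in this issue (5.0.0, 5.5.X, 5.6.X).
enum class TrmmVariant {
  InPlace,             // two-matrix rocblas_Xtrmm (before 5.0.0)
  OutOfPlaceSuffixed,  // rocblas_Xtrmm_outofplace (5.0.0 up to 5.5.X)
  OutOfPlace           // three-matrix rocblas_Xtrmm (assumed from 5.6.0 on)
};

// Pick the trmm flavour to call for a given major/minor version.
constexpr TrmmVariant select_trmm_variant(int major, int minor) {
  return major < 5                   ? TrmmVariant::InPlace
         : (major == 5 && minor < 6) ? TrmmVariant::OutOfPlaceSuffixed
                                     : TrmmVariant::OutOfPlace;
}

// LUMI's default 5.2.3 would fall into the rocblas_Xtrmm_outofplace range.
static_assert(select_trmm_variant(5, 2) == TrmmVariant::OutOfPlaceSuffixed,
              "5.2.X should select the suffixed out-of-place variant");
```

In a real build the same ranges would more likely be preprocessor conditions around the actual calls, with the version taken from the rocBLAS headers. The function form is used here only to keep the example self-contained.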

The latter requires us to wait quite a while. For example, LUMI still uses 5.2.3 by default, so I think it's too soon to unconditionally require 5.6.X.

rocblas_Xtrmm_outofplace does not seem to solve the apparent performance drop that happened after 5.2.X, which makes me lean towards waiting until we can require 5.6.X. But I'm happy to conditionally use rocblas_Xtrmm_outofplace as well if you prefer that. What do you think?

msimberg commented 10 months ago

Fixed by #978.