rocBLAS 5.0.0 introduced a new `rocblas_Xtrmm_outofplace` function to replace the existing two-matrix in-place version. We should replace our use of the in-place function with the three-matrix out-of-place version.
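For reference, and as far as we understand the rocBLAS documentation, the two forms differ only in where the result goes. The in-place form overwrites `B` with `alpha * op(A) * B`. The out-of-place form writes the same product into a separate `C` and leaves `B` untouched. Below is a minimal host-side sketch in plain C. It is not rocBLAS code, and the function names are hypothetical. It covers only side = left, lower triangular, no transpose, non-unit diagonal, column-major storage.

```c
#include <stddef.h>

/* Hypothetical reference helper (not a rocBLAS API): out-of-place TRMM,
 * C := alpha * A * B, with A m x m lower triangular and B, C m x n,
 * all column-major. B is only read, so it is left unchanged. */
static void trmm_left_lower_outofplace(int m, int n, double alpha,
                                       const double *A, int lda,
                                       const double *B, int ldb,
                                       double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            /* Row i of a lower-triangular A is nonzero only in columns 0..i. */
            for (int k = 0; k <= i; ++k)
                sum += A[i + (size_t)k * lda] * B[k + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * sum;
        }
    }
}

/* Hypothetical reference helper: the two-matrix in-place form,
 * B := alpha * A * B, same assumptions as above. */
static void trmm_left_lower_inplace(int m, int n, double alpha,
                                    const double *A, int lda,
                                    double *B, int ldb)
{
    for (int j = 0; j < n; ++j) {
        /* Walk rows bottom-up: row i reads rows 0..i, and none of those
         * have been overwritten yet when row i is computed. */
        for (int i = m - 1; i >= 0; --i) {
            double sum = 0.0;
            for (int k = 0; k <= i; ++k)
                sum += A[i + (size_t)k * lda] * B[k + (size_t)j * ldb];
            B[i + (size_t)j * ldb] = alpha * sum;
        }
    }
}
```

With `A = [[2, 0], [1, 3]]` and `B = [1, 2]^T`, both produce `[2, 7]^T`. The out-of-place call leaves `B` as `[1, 2]^T`, whereas the in-place call overwrites it. Note that the in-place version has to respect a row ordering to avoid reading values it has already overwritten; the out-of-place version does not.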
However, `rocblas_Xtrmm_outofplace` also seems to be deprecated. It will be superseded once `rocblas_Xtrmm` itself becomes the three-matrix out-of-place version, and `rocblas_Xtrmm_outofplace` has already been removed on `develop` at the time of writing (after 5.5.1). I tested `rocblas_Xtrmm_outofplace` and it works, but given that it is also going to be removed, we can either:

1. Conditionally use `rocblas_Xtrmm_outofplace` between 5.0.0 and 5.5.X (assuming that 5.6.X will actually remove it).
2. Wait for `rocblas_Xtrmm` to become the out-of-place version and bump the version requirement to 5.6.X or newer.

The latter means waiting quite a while. For example, LUMI still uses 5.2.3 by default, so I think it's too soon to unconditionally require 5.6.X.
`rocblas_Xtrmm_outofplace` also does not seem to fix the apparent performance drop that appeared after 5.2.X, which makes me lean towards waiting until we can require 5.6.X. But I'm happy to conditionally use `rocblas_Xtrmm_outofplace` as well if you prefer that. What do you think?
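If we go with the conditional option, the version check could look roughly like the sketch below. The enum and function names are hypothetical. The 5.6 cutoff is only the assumption stated above (that 5.6.X removes `rocblas_Xtrmm_outofplace` and makes `rocblas_Xtrmm` out-of-place), not something confirmed by a release. In a real build, `major`/`minor` would presumably come from rocBLAS's version macros (we believe `ROCBLAS_VERSION_MAJOR` and `ROCBLAS_VERSION_MINOR` in `rocblas-version.h`), most likely as a compile-time `#if`.

```c
/* Hypothetical sketch: which TRMM entry point to use for a given rocBLAS
 * version. The 5.6 boundary is an ASSUMPTION, not a confirmed release fact. */
typedef enum {
    TRMM_INPLACE_LEGACY,     /* two-matrix in-place rocblas_Xtrmm (< 5.0.0) */
    TRMM_OUTOFPLACE_INTERIM, /* rocblas_Xtrmm_outofplace (5.0.0 to 5.5.X) */
    TRMM_OUTOFPLACE_NEW      /* three-matrix rocblas_Xtrmm (assumed >= 5.6.X) */
} trmm_variant;

static trmm_variant select_trmm_variant(int major, int minor)
{
    if (major < 5)
        return TRMM_INPLACE_LEGACY;
    if (major == 5 && minor < 6)
        return TRMM_OUTOFPLACE_INTERIM;
    return TRMM_OUTOFPLACE_NEW;
}
```

For example, LUMI's default 5.2.3 would map to `TRMM_OUTOFPLACE_INTERIM`. Keeping the decision in one place means only this function (or the equivalent `#if`) has to change if the cutoff turns out to be a different release.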