Using rocblas_initialize has the minor downside that rocBLAS loads all Tensile code objects up front instead of lazy-loading them, so it loads code objects it will never need, which adds a small runtime cost at startup.
Ideally, if there is a compile-time macro we can use to check the rocBLAS version, we could call rocblas_initialize only on the versions that need it.
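A rough sketch of what such a version gate could look like. It assumes the rocBLAS headers expose a ROCBLAS_VERSION_MAJOR macro and that the fix first shipped in rocBLAS major version 4 (the release I believe went out with ROCm 6.0); both should be verified before relying on this. The helper names needs_eager_rocblas_init and maybe_initialize_rocblas are hypothetical and not part of ggml or rocBLAS.

```cpp
// Sketch only. Assumptions: <rocblas/rocblas.h> defines ROCBLAS_VERSION_MAJOR,
// and rocBLAS >= 4 (assumed to correspond to ROCm 6.0) contains the fix.
#if defined(__has_include)
#  if __has_include(<rocblas/rocblas.h>)
#    include <rocblas/rocblas.h>
#  endif
#endif

// Hypothetical helper: true when the given rocBLAS major version is
// older than the (assumed) first fixed release and still needs the workaround.
constexpr bool needs_eager_rocblas_init(int rocblas_major) {
    return rocblas_major < 4;
}

// Hypothetical wrapper: calls rocblas_initialize() only on old rocBLAS versions.
// Does nothing when the version macro is unavailable (e.g. non-HIP builds).
inline void maybe_initialize_rocblas() {
#if defined(ROCBLAS_VERSION_MAJOR)
    if (needs_eager_rocblas_init(ROCBLAS_VERSION_MAJOR)) {
        rocblas_initialize(); // eagerly loads all Tensile code objects
    }
#endif
}
```

Note that a compile-time macro reflects the headers llama.cpp was built against, not the rocBLAS library loaded at runtime, so a runtime version query may be the safer check if the two can differ.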
Background Description
The workaround at https://github.com/ggerganov/llama.cpp/blob/3ad5451f3b75809e3033e4e577b9f60bcaf6676a/ggml/src/ggml-cuda/ggml-cuda.cu#L125 is no longer necessary since rocBLAS commit https://github.com/ROCm/rocBLAS/commit/bc4d8f57ec6b3b2c91c4eaa5351bcc35ced66d52, which landed in ROCm 6.0. I'm not sure how long we want to keep supporting older versions of ROCm.
Possible Refactor Approaches
Remove this workaround.