ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Refactor: remove rocblas workaround for old versions of rocblas #10549

Open IMbackK opened 1 week ago

IMbackK commented 1 week ago

Background Description

The workaround at https://github.com/ggerganov/llama.cpp/blob/3ad5451f3b75809e3033e4e577b9f60bcaf6676a/ggml/src/ggml-cuda/ggml-cuda.cu#L125 is no longer necessary since rocBLAS commit https://github.com/ROCm/rocBLAS/commit/bc4d8f57ec6b3b2c91c4eaa5351bcc35ced66d52, which landed in ROCm 6.0. I'm not sure how long we want to keep supporting old versions of ROCm.

Using rocblas_initialize has the minor downside that rocBLAS loads all Tensile code objects instead of lazy-loading them, causing it to load code objects it will never need, which adds a small extra runtime cost at startup.

Possible Refactor Approaches

Remove this workaround.

slaren commented 1 week ago

Ideally, if there is a compile-time macro that we can use to check the version of rocBLAS, we could call rocblas_initialize only in the versions that need it.