Closed Nexesenex closed 1 week ago
Useful for CPU-based inference, but also for cuBLAS lowvram inference (TG).
See: https://github.com/ggerganov/llama.cpp/pull/7606