Closed — mvsoom closed this issue 11 months ago
Is it possible to offload some of the computation (on the LLM side) to the GPU, as with llama.cpp?
Yes, it should be possible. For the language model (Vicuna), you can edit the flags in CMake to enable GPU offloading (e.g. cuBLAS):
CMakeLists.txt
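As a rough sketch of what that looks like (the exact option names depend on the llama.cpp version vendored in this project, so check the `CMakeLists.txt` above; `LLAMA_CUBLAS` is the historical flag name and the model path is hypothetical):

```shell
# Configure with cuBLAS (NVIDIA GPU) support enabled.
# The flag name may differ in newer versions (e.g. GGML_CUDA) -- verify in CMakeLists.txt.
cmake -B build -DLLAMA_CUBLAS=ON

# Build in release mode so the GPU kernels are compiled with optimizations.
cmake --build build --config Release
```

After rebuilding, the runtime typically also needs to be told how many layers to offload to the GPU (often an `--n-gpu-layers`-style option in llama.cpp-based tools); whether this project exposes such an option is an assumption to verify.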
Great! Thank you!