Maknee / minigpt4.cpp

Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
MIT License

Question: partial GPU inference? #15

Closed: mvsoom closed this issue 11 months ago

mvsoom commented 11 months ago

Is it possible to offload some of the computation (on the LLM side) to the GPU, as with llama.cpp?

Maknee commented 11 months ago

Yes, it should be possible. For the language model (Vicuna), you can edit these flags in CMake to enable GPU offloading (e.g., cuBLAS):

CMakeLists.txt
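
For reference, a configure-and-build sequence along these lines is a minimal sketch of enabling a cuBLAS path; the option name `MINIGPT4_CUBLAS` is an assumption modeled on llama.cpp's `LLAMA_CUBLAS` convention, so check the actual option in the project's CMakeLists.txt:

```sh
# Hypothetical configure/build; MINIGPT4_CUBLAS is an assumed option name --
# verify the real flag in CMakeLists.txt before using.
cmake -B build -DMINIGPT4_CUBLAS=ON   # compile in the cuBLAS (GPU) backend
cmake --build build --config Release  # build the binaries with GPU support
```

In llama.cpp itself, the number of transformer layers offloaded to the GPU is then controlled at runtime with `--n-gpu-layers`; whether minigpt4.cpp exposes an equivalent runtime option is worth checking in its CLI help.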

mvsoom commented 11 months ago

Great! Thank you!