Closed — mvsoom closed this issue 11 months ago
Is it possible to offload some of the computation (on the LLM side) to the GPU, as with llama.cpp?
Yes, it should be possible. For the language model (Vicuna), you can edit the flags in CMake to enable GPU offloading (e.g. cuBLAS):
CMakeLists.txt
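As a rough sketch of what that looks like (the exact option names depend on the llama.cpp version vendored in this project, so check the `CMakeLists.txt` above; `LLAMA_CUBLAS` is the historical flag name and the model path is hypothetical):

```shell
# Configure with cuBLAS (NVIDIA GPU) support enabled.
# The flag name may differ in newer versions (e.g. GGML_CUDA) -- verify in CMakeLists.txt.
cmake -B build -DLLAMA_CUBLAS=ON

# Build in release mode so the GPU kernels are compiled with optimizations.
cmake --build build --config Release
```

After rebuilding, the runtime typically also needs to be told how many layers to offload to the GPU (often an `--n-gpu-layers`-style option in llama.cpp-based tools); whether this project exposes such an option is an assumption to verify.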
Great! Thank you!