David-Kunz / gen.nvim

Neovim plugin to generate text using LLMs with customizable prompts
The Unlicense

llama.cpp integration? #1

Closed · JoseConseco closed this issue 11 months ago

JoseConseco commented 11 months ago

Are there any chances of llama.cpp integration? It uses quantized models, which allows running bigger models with less VRAM (https://github.com/ggerganov/llama.cpp#quantization), and you can put part of the computation on the CPU and part on the GPU if the whole model cannot be loaded into GPU memory. It supports a server mode too: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#testing-with-curl
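For reference, the server mode mentioned above can be tested with plain curl against the `/completion` endpoint, along the lines of the linked README. A minimal sketch; the model path and the `-ngl 35` layer split are placeholder values, not from this thread:

```sh
# Start the llama.cpp server, offloading 35 layers to the GPU
# and keeping the rest on the CPU (-ngl = --n-gpu-layers).
./server -m ./models/model.gguf -ngl 35 --port 8080

# Request a completion from the running server.
curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```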

JoseConseco commented 11 months ago

Ok, forget about it. From the ollama docs I can see it uses llama.cpp in the background, and it supports manually adding custom GGUF models with various parameters.
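For anyone landing here: ollama picks up a custom GGUF through a Modelfile. A rough sketch, assuming a local quantized model; the file name and parameter values are illustrative, not from this thread:

```sh
# Write a Modelfile pointing at a local GGUF with custom inference parameters.
cat > Modelfile <<'EOF'
FROM ./mymodel.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
EOF

# Register the model with ollama, then run it.
ollama create mymodel -f Modelfile
ollama run mymodel
```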