LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.34k stars, 310 forks

Option to selectively disable GPU acceleration for each model type #916

Closed: tororon1231 closed this issue 2 weeks ago

tororon1231 commented 3 weeks ago

Add a way to selectively disable GPU acceleration for each model type (even from the GUI). For example, use the GPU only for text generation and not for image generation, and vice versa. This is needed for low-end GPUs with small amounts of VRAM, so that CUDA does not easily run out of memory and crash when running multiple models.

LostRuins commented 2 weeks ago

There is an easy solution to this: simply run two instances of koboldcpp on two different ports, one with GPU acceleration selected and one without. Then, from the Web UI, you can connect to each instance individually, e.g. use instance 2 for image generation and instance 1 for text generation.
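A minimal sketch of the two-instance setup, for illustration only. It builds the two command lines and prints them without launching anything. It assumes the `--port`, `--model`, `--sdmodel`, `--usecublas` and `--gpulayers` flags behave as in recent koboldcpp builds, that an instance started without a GPU flag stays on the CPU, and that an instance can load an image model on its own. The binary path, model file names, port numbers and layer count are placeholders, and `build_command` is a hypothetical helper, not part of koboldcpp.

```python
import shlex


def build_command(port, text_model=None, image_model=None,
                  use_gpu=False, gpu_layers=0, binary="./koboldcpp"):
    """Build the argument list for one koboldcpp instance (hypothetical helper)."""
    cmd = [binary, "--port", str(port)]
    if text_model:
        cmd += ["--model", text_model]
    if image_model:
        cmd += ["--sdmodel", image_model]
    if use_gpu:
        # Assumes an NVIDIA GPU; other backends use a different flag.
        cmd.append("--usecublas")
        if gpu_layers > 0:
            cmd += ["--gpulayers", str(gpu_layers)]
    # Assumption: with no GPU flag, the instance runs on the CPU.
    return cmd


# Instance 1: text generation on the GPU (placeholder file name and layer count).
text_cmd = build_command(5001, text_model="text-model.gguf",
                         use_gpu=True, gpu_layers=20)
# Instance 2: image generation on the CPU (placeholder file name).
image_cmd = build_command(5002, image_model="image-model.safetensors")

# Print shell-ready command lines; run each one in its own terminal.
for cmd in (text_cmd, image_cmd):
    print(shlex.join(cmd))
```

With both instances running, the Web UI would point text generation at the first port and image generation at the second. Swapping the `use_gpu` argument between the two calls gives the opposite split.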

tororon1231 commented 2 weeks ago

Okay, nice workaround. Thanks!

tororon1231 commented 2 weeks ago

@LostRuins Hi. I just have one question: can I use this method in the web UI for other model types, like Whisper?

LostRuins commented 2 weeks ago

For Whisper, it currently does not allow you to use a different URL. I might add that in the future.