Closed tororon1231 closed 2 weeks ago
There is an easy workaround: simply run two instances of koboldcpp on two different ports, one with the GPU selected and one without. Then, from the Web UI, you can connect to each instance individually, e.g. use instance 2 for image generation and instance 1 for text generation.
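For reference, here is a minimal sketch of that two-instance setup. The model filenames are hypothetical placeholders, and the exact flag names (`--usecublas`, `--sdmodel`, `--port`) may vary between koboldcpp versions, so check `python koboldcpp.py --help` for your build:

```shell
# Instance 1: text generation with CUDA GPU offload, listening on port 5001.
# (flag names assumed from common koboldcpp options; verify against --help)
python koboldcpp.py --model your-text-model.gguf --usecublas --port 5001 &

# Instance 2: image generation on CPU only (no GPU flags), listening on port 5002.
python koboldcpp.py --sdmodel your-image-model.safetensors --port 5002 &
```

Then, in the Web UI, point the text generation endpoint at `http://localhost:5001` and the image generation endpoint at `http://localhost:5002`. Only instance 1 allocates VRAM, so the image model cannot push CUDA out of memory.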
Okay, nice workaround. Thanks!
@LostRuins Hi. I just have one question. Can I use this method in the web UI for other model types, like Whisper?
For Whisper, the UI currently does not allow you to set a different URL - I might add that in the future.
Add a way to selectively disable GPU acceleration for each model type (even from the GUI). For example, use the GPU only for text generation and not for image generation, or vice versa. This is needed for low-end GPUs with small amounts of VRAM, so that CUDA does not run out of memory and crash when multiple models are loaded.