New requirements: 1. Load the web page first, like Open WebUI, with the model freely selectable and switchable in the web UI. 2. After 5 minutes (or any configured idle time) without use, automatically release all GPU memory and reload the model the next time someone submits an inference request from the web. #907
New requirements:

1. The web page should load first, like Open WebUI, with the model freely selectable and switchable in the web UI.
2. After 5 minutes (or any configurable idle time) without use, all GPU memory should be released automatically, and the model reloaded the next time someone submits an inference request from the web.
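The idle-timeout behavior in requirement 2 could be sketched as a lazy-loading model manager: load on first request, reset a timer on every request, and unload once the timer expires with no activity. This is only an illustrative sketch, not an implementation from this project — the `load_fn` and `unload_fn` callables are hypothetical stand-ins for the real framework calls (e.g. loading weights to GPU, then dropping references and freeing the allocator cache).

```python
import threading
import time


class IdleModelManager:
    """Lazily loads a model on the first request and unloads it after a
    configurable idle timeout, so GPU memory is only held while in use.

    Sketch only: load_fn/unload_fn are hypothetical hooks standing in for
    real model loading and GPU memory release.
    """

    def __init__(self, load_fn, unload_fn, idle_seconds=300):
        self._load_fn = load_fn        # returns a callable model
        self._unload_fn = unload_fn    # releases the model's GPU memory
        self._idle_seconds = idle_seconds
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()
        self._timer = None

    def infer(self, request):
        with self._lock:
            if self._model is None:    # lazy (re)load on first request
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            self._schedule_unload()    # restart the idle countdown
            return self._model(request)

    def _schedule_unload(self):
        # Cancel any pending unload and start a fresh idle timer.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._idle_seconds, self._maybe_unload)
        self._timer.daemon = True
        self._timer.start()

    def _maybe_unload(self):
        with self._lock:
            # Only unload if no request arrived during the idle window.
            if (self._model is not None
                    and time.monotonic() - self._last_used >= self._idle_seconds):
                self._unload_fn(self._model)  # free GPU memory here
                self._model = None
```

A short idle window makes the behavior easy to observe: after `idle_seconds` without calls, the model is dropped, and the next `infer` call transparently reloads it.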