LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.34k stars, 310 forks

New requirements: 1. Load the web page first, as Open WebUI does, and let the model be freely selected and switched in the web UI. 2. After 5 minutes without use (or any set time), release all GPU memory automatically and load the model again the next time someone requests an inference task on the web. #907

Closed: windkwbs closed this issue 3 weeks ago

windkwbs commented 3 weeks ago

New requirements:

  1. I would like to be able to load the web page first, as Open WebUI does, and then freely select and switch models from the web UI.
  2. After 5 minutes without use (or any configurable period), all GPU memory should be released automatically, and the model should be loaded again the next time someone requests an inference task from the web UI.

LostRuins commented 3 weeks ago

Unfortunately, Koboldcpp does not currently support hot-swapping models or loading models from the web UI, and there are no immediate plans to add either.
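
Since this is not built in, the idle-release behaviour from request 2 would have to live outside Koboldcpp, for example in a small wrapper that owns the backend. The following is only a rough sketch of that general pattern (lazy load on first request, release after an idle period). `IdleUnloader`, `load_fn`, and `unload_fn` are hypothetical names invented for illustration and are not part of Koboldcpp; in practice `load_fn` might start a backend process and `unload_fn` might stop it, but that wiring is an assumption left to the reader.

```python
import threading
import time
from typing import Any, Callable, Optional


class IdleUnloader:
    """Load a resource on first use and free it after a period of inactivity.

    Hypothetical helper: load_fn and unload_fn are supplied by the caller
    and are NOT Koboldcpp APIs.
    """

    def __init__(
        self,
        load_fn: Callable[[], Any],
        unload_fn: Callable[[Any], None],
        idle_seconds: float = 300.0,  # 5 minutes, as in the request
    ) -> None:
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        self._idle_seconds = idle_seconds
        self._lock = threading.Lock()
        self._resource: Any = None
        self._loaded = False
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._loaded

    def use(self, fn: Callable[[Any], Any]) -> Any:
        """Run fn(resource), loading the resource first if it is not loaded.

        The lock is held while fn runs, so requests are served one at a time
        and an unload can never happen in the middle of a request.
        """
        with self._lock:
            if not self._loaded:
                self._resource = self._load_fn()
                self._loaded = True
            try:
                return fn(self._resource)
            finally:
                # Record the end of the request as the last activity.
                self._last_used = time.monotonic()

    def unload_if_idle(self, now: Optional[float] = None) -> bool:
        """Free the resource if it has been idle long enough.

        Returns True if an unload happened. `now` can be passed in for testing.
        """
        now = time.monotonic() if now is None else now
        with self._lock:
            if not self._loaded or now - self._last_used < self._idle_seconds:
                return False
            self._unload_fn(self._resource)
            self._resource = None
            self._loaded = False
            return True
```

A background thread or timer would call `unload_if_idle()` periodically (say, every few seconds), and every incoming inference request would go through `use(...)`, which reloads on demand. The cost of this design is that the first request after an idle period has to wait for the full model load.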