New requirements: 1. Load the web page first, like Open WebUI, with the model freely selectable and switchable in the web UI. 2. After 5 minutes (or any configured idle time) without use, automatically release all GPU memory and reload the model the next time someone submits an inference request from the web. #907
New requirements:

1. The web page should load first, like Open WebUI, with the model freely selectable and switchable in the web UI.
2. After 5 minutes (or any configurable idle time) without use, all GPU memory should be released automatically, and the model reloaded the next time someone submits an inference request from the web.
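The idle-timeout behavior in requirement 2 could be sketched as a lazy-loading model manager: load on first request, reset a timer on every request, and unload once the timer expires with no activity. This is only an illustrative sketch, not an implementation from this project — the `load_fn` and `unload_fn` callables are hypothetical stand-ins for the real framework calls (e.g. loading weights to GPU, then dropping references and freeing the allocator cache).

```python
import threading
import time


class IdleModelManager:
    """Lazily loads a model on the first request and unloads it after a
    configurable idle timeout, so GPU memory is only held while in use.

    Sketch only: load_fn/unload_fn are hypothetical hooks standing in for
    real model loading and GPU memory release.
    """

    def __init__(self, load_fn, unload_fn, idle_seconds=300):
        self._load_fn = load_fn        # returns a callable model
        self._unload_fn = unload_fn    # releases the model's GPU memory
        self._idle_seconds = idle_seconds
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()
        self._timer = None

    def infer(self, request):
        with self._lock:
            if self._model is None:    # lazy (re)load on first request
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            self._schedule_unload()    # restart the idle countdown
            return self._model(request)

    def _schedule_unload(self):
        # Cancel any pending unload and start a fresh idle timer.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._idle_seconds, self._maybe_unload)
        self._timer.daemon = True
        self._timer.start()

    def _maybe_unload(self):
        with self._lock:
            # Only unload if no request arrived during the idle window.
            if (self._model is not None
                    and time.monotonic() - self._last_used >= self._idle_seconds):
                self._unload_fn(self._model)  # free GPU memory here
                self._model = None
```

A short idle window makes the behavior easy to observe: after `idle_seconds` without calls, the model is dropped, and the next `infer` call transparently reloads it.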