Currently, the model stays loaded in VRAM indefinitely. It would be helpful if the VRAM could be freed automatically after a configurable period of inactivity. Ollama already behaves this way: by default, it unloads a model from VRAM after five minutes of inactivity.
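To make the request concrete, here is a minimal sketch of the idle-timeout idea in Python. It is not based on this project's code: the `IdleUnloader` class, its `touch` method, and the `unload` callback are all hypothetical names, and the 300-second default simply mirrors the five-minute figure mentioned above. Every request would call `touch()`, which restarts a countdown; if the countdown ever completes, the callback runs and can release the model.

```python
import threading
from typing import Callable, Optional


class IdleUnloader:
    """Hypothetical helper: runs `unload` once `idle_seconds` pass without activity."""

    def __init__(self, unload: Callable[[], None], idle_seconds: float = 300.0):
        # `unload` is an assumed callback supplied by the application; it should
        # drop its references to the model so the VRAM can actually be released.
        self._unload = unload
        self._idle_seconds = idle_seconds
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def touch(self) -> None:
        """Record activity: cancel any pending unload and restart the countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._idle_seconds, self._unload)
            self._timer.daemon = True  # do not keep the process alive just for the timer
            self._timer.start()

    def cancel(self) -> None:
        """Stop the countdown without unloading (for example, on shutdown)."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

In this sketch the server would call `touch()` at the start of each request and reload the model lazily if it has been unloaded. A real implementation would also need to make sure the callback cannot fire while a request is still being processed, which this sketch does not handle.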