Open v3DJG6GL opened 4 months ago
I'd like to point out that it implies energy savings as well.
Wouldn't it be this feature? https://github.com/mudler/LocalAI/pull/1341
Yes, that's the PR I also linked up there.
First of all, thanks for this great project!
Description
I would like to have an option to set an idle time after which the model is unloaded from RAM/VRAM.
Background:
I run several applications that use my GPU's VRAM, one of which is LocalAI. Since VRAM is limited, these applications have to share the available memory among themselves. Fortunately, LocalAI has for some time included a watchdog that unloads the model after a specified idle timeout. I'd love to see similar functionality in whisper-asr-webservice: right now it occupies a third of my VRAM even though it is only used from time to time.
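For illustration, the requested behavior could be sketched as a small idle watchdog: a background thread that unloads the model once no request has touched it for a configurable timeout, and reloads it lazily on the next request. This is only a hedged sketch, not the actual LocalAI watchdog or any existing whisper-asr-webservice API; the `load_fn`/`unload_fn` callables and the `IdleWatchdog` name are hypothetical stand-ins for the real model load/free code.

```python
import threading
import time


class IdleWatchdog:
    """Hypothetical sketch: unload a resource after a period of inactivity.

    `load_fn` and `unload_fn` are illustrative placeholders for the real
    model loading / VRAM-freeing code in whisper-asr-webservice.
    """

    def __init__(self, load_fn, unload_fn, idle_timeout):
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        self._idle_timeout = idle_timeout
        self._lock = threading.Lock()
        self._model = None
        self._last_used = 0.0
        # Daemon thread so the watchdog dies with the server process.
        threading.Thread(target=self._reaper, daemon=True).start()

    def acquire(self):
        """Return the model, (re)loading it lazily, and mark it as used."""
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            return self._model

    def _reaper(self):
        # Periodically check whether the idle timeout has elapsed.
        while True:
            time.sleep(self._idle_timeout / 4)
            with self._lock:
                idle = time.monotonic() - self._last_used
                if self._model is not None and idle >= self._idle_timeout:
                    self._unload_fn(self._model)
                    self._model = None
```

A request handler would then call `watchdog.acquire()` instead of holding a module-level model reference, so the model is transparently reloaded after an idle unload.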