ahmetoner / whisper-asr-webservice

OpenAI Whisper ASR Webservice API
https://ahmetoner.github.io/whisper-asr-webservice
MIT License

Possibility to unload/reload model from VRAM/RAM after IDLE timeout #196

Open v3DJG6GL opened 4 months ago

v3DJG6GL commented 4 months ago

First of all thanks for this great project!

Description

I would like to have an option to set an idle time after which the model is unloaded from RAM/VRAM.

Background:

I have several applications that use my GPU's VRAM, one of them being LocalAI. Since I don't have unlimited VRAM, these applications have to share the available memory. Luckily, some time ago LocalAI implemented a watchdog that unloads the model after a specified idle timeout. I'd love to have similar functionality in whisper-asr-webservice. Right now, whisper-asr-webservice occupies about a third of my VRAM even though it is only used from time to time.
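
To illustrate what I have in mind, here is a rough sketch of an idle watchdog in Python, assuming the openai-whisper backend; the class name `IdleUnloadingModel`, the timeout values, and the method names are purely illustrative and not part of the webservice's actual code:

```python
import gc
import threading
import time

import torch
import whisper  # openai-whisper backend assumed for this sketch


class IdleUnloadingModel:
    """Hypothetical wrapper: loads the model lazily and unloads it after an idle timeout."""

    def __init__(self, model_name: str = "base", idle_timeout: float = 300.0):
        self.model_name = model_name
        self.idle_timeout = idle_timeout
        self._model = None
        self._last_used = time.monotonic()
        self._lock = threading.Lock()
        # Background watchdog that periodically checks for idleness.
        threading.Thread(target=self._watchdog, daemon=True).start()

    def transcribe(self, audio_path: str) -> dict:
        with self._lock:
            if self._model is None:
                # Reload on demand; the first request after an unload pays the load time.
                self._model = whisper.load_model(self.model_name)
            self._last_used = time.monotonic()
            return self._model.transcribe(audio_path)

    def _watchdog(self) -> None:
        while True:
            time.sleep(30)
            with self._lock:
                idle = time.monotonic() - self._last_used
                if self._model is not None and idle > self.idle_timeout:
                    # Drop the reference and release cached GPU memory.
                    self._model = None
                    gc.collect()
                    if torch.cuda.is_available():
                        torch.cuda.empty_cache()
```

The obvious trade-off is that the first request after an unload has to wait for the model to load again, which would be perfectly fine for my use case.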

LuisMalhadas commented 3 months ago

I'd like to point out that this would bring energy savings as well.

thfrei commented 2 months ago

Wouldn't it be this feature? https://github.com/mudler/LocalAI/pull/1341

v3DJG6GL commented 2 months ago

> Wouldn't it be this feature? mudler/LocalAI#1341

Yes, that's the PR I linked further up.