SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

I see ram usage rises after the model loaded to GPU, how to release it? #912

Open terryops opened 4 months ago

terryops commented 4 months ago

I think the program loads the model into RAM and then copies it to GPU memory; after that, the model should run on the GPU on its own, with only a small amount of RAM needed to support the program. But what I'm seeing is that RAM usage stays about as high as GPU memory usage. Any ideas on how to release it? Thanks in advance.
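One possible explanation (an assumption, not confirmed by the faster-whisper maintainers) is that the host-side buffers used during loading are freed by the allocator but not returned to the OS, so RSS stays high even though the memory is reusable. On Linux with glibc, you can ask the allocator to hand freed pages back with `malloc_trim`. A minimal sketch, assuming Linux/glibc; the `release_host_memory` helper is hypothetical and would be called once after the model has finished loading onto the GPU:

```python
import ctypes
import gc


def release_host_memory() -> bool:
    """Run Python GC, then ask glibc to return freed heap pages to the OS.

    Returns True if malloc_trim reported that memory was released,
    False otherwise (including on non-glibc platforms, where this is a no-op).
    """
    gc.collect()
    try:
        libc = ctypes.CDLL("libc.so.6")
    except OSError:
        return False  # not a glibc system (e.g. macOS, Windows)
    # malloc_trim(0): release as much free heap memory back to the OS as possible
    return bool(libc.malloc_trim(0))


# Hypothetical usage after loading the model, e.g.:
#   model = WhisperModel("large-v3", device="cuda")
#   release_host_memory()
```

Whether this actually lowers RSS depends on how the loader allocated the buffers (mmap'd weights, pinned host memory, or ordinary heap), so treat it as one thing to try rather than a guaranteed fix.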