alesaccoia / VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
MIT License

Prioritize GPU Over CPU for Initial Loading of OpenAI Whisper Model #24

Open jinmiaoluo opened 5 months ago

jinmiaoluo commented 5 months ago

Load the OpenAI Whisper model directly onto the GPU instead of into RAM via the CPU.

Currently, our app loads the OpenAI Whisper model into RAM using the CPU only, rather than loading it onto the GPU first.
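
As a minimal sketch of the idea, the snippet below uses the openai-whisper package's `device` argument to place the model on the GPU when one is available and fall back to CPU otherwise. The model size ("medium") and audio file name are placeholders, and the actual project may use a different backend (e.g. faster-whisper), so this is only illustrative of the loading strategy being requested.

```python
import torch
import whisper  # openai-whisper package; the project backend may differ

# Prefer the GPU when one is available; fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Passing `device` tells load_model to map the checkpoint onto that device,
# so the weights go to the GPU instead of staying in CPU RAM.
model = whisper.load_model("medium", device=device)

# Hypothetical input file for illustration.
result = model.transcribe("audio.wav")
print(result["text"])
```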