I managed to reuse the code from the stream example and integrate it into a React application using Vite.js.
Keeping the basic implementation, adapted to TypeScript, I get a latency of about 1.5 to 2 seconds on average.
However, the example appears to use a fairly basic audio chunking strategy that could be improved.
Has any work already been done on this?
Could the C++ code that is compiled with Emscripten be improved?
Additional context:
At the moment, my application uses vosk-browser, which plugs into an audio streamer. I would like to switch to Whisper for its superior transcription quality, and I want to optimize my implementation as much as possible to get closer to real time with whisper.cpp.
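For illustration, one common alternative to fixed, back-to-back chunks is an overlapping sliding window, where each chunk repeats the tail of the previous one so that words cut at a boundary are seen again. The sketch below is a minimal TypeScript version under stated assumptions: `OverlappingChunker`, `windowSize`, and `overlap` are hypothetical names that are not part of whisper.cpp, and the input is assumed to be 16 kHz mono Float32 samples. It is not the strategy used by the stream example.

```typescript
// Hypothetical helper, not part of whisper.cpp: splits a stream of PCM samples
// into fixed-size windows where consecutive windows share `overlap` samples.
class OverlappingChunker {
  private buffer: Float32Array = new Float32Array(0);
  private readonly windowSize: number;
  private readonly overlap: number;

  constructor(windowSize: number, overlap: number) {
    if (overlap < 0 || overlap >= windowSize) {
      throw new Error("overlap must be in [0, windowSize)");
    }
    this.windowSize = windowSize;
    this.overlap = overlap;
  }

  // Append newly captured samples and return every complete window available.
  push(samples: Float32Array): Float32Array[] {
    // Concatenate the leftover samples with the new ones.
    const merged = new Float32Array(this.buffer.length + samples.length);
    merged.set(this.buffer, 0);
    merged.set(samples, this.buffer.length);

    // Each window starts `step` samples after the previous one.
    const step = this.windowSize - this.overlap;
    const windows: Float32Array[] = [];
    let start = 0;
    while (start + this.windowSize <= merged.length) {
      windows.push(merged.slice(start, start + this.windowSize));
      start += step;
    }

    // Keep the unconsumed tail (including the overlap) for the next call.
    this.buffer = merged.slice(start);
    return windows;
  }
}
```

As a usage example, `new OverlappingChunker(16000 * 3, 16000 / 2)` would produce 3-second windows with half a second of overlap at 16 kHz. These values are placeholders to tune, not recommendations: a shorter window lowers latency but gives the model less context per chunk.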
Hello @ggerganov