Closed · jacobtang closed this issue 3 months ago
No, that is due to the way Whisper works and how it is therefore implemented. You'd need to program this on top of it.
Ok, thanks a lot! I use the process_text callback data, and it takes about 3s to get the text. Is that normal? How can I reduce the latency? I use recorder_config and the recorder.feed_audio(audio_chunk) method in a GPU environment.
Absolutely not normal. Even with the largest model, or on CPU, transcription time should be way below 1s. I have no real idea why this happens. Can you please try another model, like small.en or medium? Also, a separate test with only the faster_whisper library could tell us whether it's the transcription or maybe something related to the VAD models.
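A standalone timing check along those lines might look like the sketch below. The model size, device settings, and audio file name are assumptions; note that faster_whisper returns a lazy segment generator, so the decode only runs while the segments are consumed and must be included in the timed section.

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def transcribe_file(model, path):
    """Transcribe one file; joining the segments forces the
    lazy generator, so the actual decode happens inside the timer."""
    segments, _info = model.transcribe(path)
    return " ".join(seg.text for seg in segments)

def main():
    # Assumptions: faster-whisper is installed, a CUDA GPU is
    # available, and "test.wav" is a short local recording.
    from faster_whisper import WhisperModel

    model = WhisperModel("small.en", device="cuda", compute_type="float16")
    text, elapsed = time_call(transcribe_file, model, "test.wav")
    print(f"{elapsed:.2f}s -> {text}")
```

Calling main() on the GPU machine gives a transcription time with no VAD or recorder logic involved, which should isolate where the 3s is going.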
Hi @KoljaB, I use the on_realtime_transcription_update text (I also tried on_realtime_transcription_stabilized) and send it to the web client, but it always shows the previously transcribed words again. Is there another method that avoids showing the repeated earlier words? Thanks! I'm trying to implement a live transcription feature similar to Zoom Meeting in our product.
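Since the realtime callbacks deliver the full text-so-far on each update, one way to avoid re-showing earlier words is to diff against what was already sent and forward only the new suffix. A minimal sketch (assuming the callback receives the cumulative text; `send_to_client` is a hypothetical websocket send function):

```python
_last_sent = ""

def new_suffix(full_text: str, last: str) -> str:
    """Return only the part of full_text that was not already sent.
    If earlier words were revised, fall back to resending everything."""
    if full_text.startswith(last):
        return full_text[len(last):]
    return full_text

def on_update(text: str) -> None:
    """Callback for on_realtime_transcription_update: forward only the delta."""
    global _last_sent
    delta = new_suffix(text, _last_sent)
    if delta:
        send_to_client(delta)  # hypothetical transport to the web client
    _last_sent = text
```

Alternatively, keep sending the full text and have the web client replace the displayed in-progress line instead of appending to it; Zoom-style live captions typically work that way, since it also handles the case where earlier words get revised.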