KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License
2.1k stars 191 forks

Record blocked while transcribing (no real async possible) #46

Open Stefan-Kosker opened 7 months ago

Stefan-Kosker commented 7 months ago

When .text(function) is called, the microphone is blocked and is not listening. Speech that happens at the same time is not captured, and it does not show up in recorder.audio_queue.qsize() either.

So a basic customer journey example.

I tried 'enable_realtime_transcription': True and False. Both have the problem. I am using recorder.text(process_text), which according to the docs is async as soon as I provide a function to .text(). But it does not appear to be that async.

Can you please solve this? The queue appears to be buggy, and with a slow GPU / CPU, data loss due to a race condition is guaranteed.

Thank you

KoljaB commented 7 months ago

Since the whole transcription process is already encapsulated within another process, I currently have no idea what I can do here. There is a small chance that, if it still blocks the main process, it is just a general performance issue on the system. If it's a real bug on CPU systems, I'm unsure what to do here and open to suggestions.

Stefan-Kosker commented 7 months ago

In the lines starting at 1025, we block the main thread and wait until the results are there. I am guessing this problem also exists on GPUs, but is less noticeable there because of the speed and maybe the use of smaller models.

In C#, I would say: create a "Task" and you should be good to go, since async would then be non-blocking. In Python, I have no idea, not gonna lie.
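For reference, the closest Python standard-library counterpart to a C# "Task" is probably a future from concurrent.futures. The sketch below is not RealtimeSTT code: blocking_transcribe and submit_transcription are hypothetical stand-ins, used only to show how a slow, blocking call can be handed to a worker thread so that the caller returns immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_transcribe(audio_chunk):
    """Hypothetical stand-in for a slow, blocking transcription call."""
    time.sleep(0.2)  # simulate slow model inference
    return f"text for {audio_chunk}"


def submit_transcription(executor, audio_chunk, on_text):
    """Start the blocking call on a worker thread and return right away."""
    future = executor.submit(blocking_transcribe, audio_chunk)
    # on_text is called with the result once the worker has finished.
    future.add_done_callback(lambda f: on_text(f.result()))
    return future


results = []
with ThreadPoolExecutor(max_workers=1) as executor:
    start = time.monotonic()
    submit_transcription(executor, "chunk-0", results.append)
    # Time the caller was held up; the transcription itself is still running.
    submit_elapsed = time.monotonic() - start
    # The caller is free to do other work here.
# Leaving the "with" block waits for the worker, so results is filled now.
```

Whether this would help here depends on where the blocking actually happens. If the audio chunks never reach the queue in the first place, moving the wait to another thread will not recover them.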

KoljaB commented 7 months ago

The main thread is blocked within the real-time transcription, and also for wake word detection and VAD. In theory this should not affect recording, since that is done in another process. Because recording runs entirely encapsulated, it should in theory be guaranteed that every chunk gets recorded and ends up in the processing queue. I can't think of a way to encapsulate this even more. Honestly, it should not even happen; maybe on a single core, I don't know.
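The expected behaviour described above can be illustrated with a small producer/consumer sketch. It is an assumption-laden toy, not the library's implementation: a thread stands in for the separate recording process, record_chunks is a hypothetical recorder, and the chunks are fake strings. The point is only that a producer that runs independently keeps filling the queue while the consumer is blocked, so the backlog should be visible through qsize() afterwards.

```python
import queue
import threading
import time


def record_chunks(audio_queue, n_chunks, interval):
    """Hypothetical recorder: pushes one fake audio chunk per interval."""
    for i in range(n_chunks):
        audio_queue.put(f"chunk-{i}")
        time.sleep(interval)


audio_queue = queue.Queue()
# A thread is used here as a stand-in for the separate recording process.
recorder_thread = threading.Thread(
    target=record_chunks, args=(audio_queue, 5, 0.01), daemon=True
)
recorder_thread.start()

time.sleep(0.3)  # stand-in for the consumer blocking on a slow transcription
recorder_thread.join()  # make sure the recorder has finished before checking

# Every chunk produced while the consumer was blocked is still queued.
backlog = audio_queue.qsize()
print(f"chunks waiting in queue: {backlog}")
```

If the real queue stays empty while the main thread is blocked, as reported in this issue, that would suggest the chunks are lost before they are queued, rather than dropped by the consumer.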