KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License
2.09k stars, 190 forks

The last audio segment is always lost during real-time transcription #113

Open MRuAyAN opened 2 months ago

MRuAyAN commented 2 months ago

I suspect this happens because my machine is slow and transcription takes a long time.

After VAD detects silence, the recorder calls stop() but never sends the audio segment still cached in frames to child__transcription_pipe.

How should this be fixed?

KoljaB commented 2 months ago

You are right, this is not how it should be.

For a quick fix, please add

self.frames.append(data)

right before the

self.stop()

call at line 1404 of audio_recorder.py.

I will fix this in the next release.
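
To illustrate the idea outside the library, here is a minimal sketch of why appending the current chunk before stopping matters. ToyRecorder, feed() and the append_before_stop switch are hypothetical names made up for this example; this is not the code in audio_recorder.py, and it assumes the chunk that triggers the silence detection would otherwise never reach the frame buffer.

```python
class ToyRecorder:
    """Hypothetical stand-in for a recorder; not RealtimeSTT's real class."""

    def __init__(self, append_before_stop):
        self.frames = []            # buffered audio chunks
        self.is_recording = True
        self.append_before_stop = append_before_stop
        self.segment = None         # what would be handed to transcription

    def stop(self):
        # Snapshot whatever has been buffered so far and end the recording.
        self.segment = list(self.frames)
        self.frames = []
        self.is_recording = False

    def feed(self, data, silence_detected):
        if silence_detected:
            if self.append_before_stop:
                self.frames.append(data)  # the suggested quick fix
            self.stop()
            return
        self.frames.append(data)


def run(append_before_stop):
    rec = ToyRecorder(append_before_stop)
    chunks = ["c1", "c2", "c3"]
    for i, chunk in enumerate(chunks):
        # Assumption: silence is detected while handling the last chunk.
        rec.feed(chunk, silence_detected=(i == len(chunks) - 1))
    return rec.segment


without_fix = run(append_before_stop=False)
with_fix = run(append_before_stop=True)
print(without_fix)  # ['c1', 'c2']
print(with_fix)     # ['c1', 'c2', 'c3']
```

In this toy model the last chunk only ends up in the segment when it is appended before stop() runs.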

MRuAyAN commented 2 months ago

Thanks for your quick reply.

But I don't think this will work: once stop() is called, is_recording becomes False, so audio_array is never added to the pipe for processing (see line 1459 in audio_recorder.py).
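
A rough sketch of the ordering problem I mean, using a toy class rather than the real recorder. ToyPipeline, forward_if_recording() and stop_with_flush() are hypothetical names, and the list sent stands in for the transcription pipe; I am assuming the forwarding step is guarded by is_recording, as it appears to be around that line.

```python
class ToyPipeline:
    """Hypothetical model of a buffer that is forwarded only while recording."""

    def __init__(self):
        self.frames = []
        self.is_recording = True
        self.sent = []  # stand-in for the transcription pipe

    def stop_flag_only(self):
        # Only clears the flag; buffered frames stay where they are.
        self.is_recording = False

    def forward_if_recording(self):
        # Forwarding is skipped once the flag is False.
        if self.is_recording and self.frames:
            self.sent.append(list(self.frames))
            self.frames.clear()

    def stop_with_flush(self):
        # Alternative ordering: hand over the buffer first, then clear the flag.
        if self.frames:
            self.sent.append(list(self.frames))
            self.frames.clear()
        self.is_recording = False


lossy = ToyPipeline()
lossy.frames.extend(["c1", "c2", "c3"])
lossy.stop_flag_only()
lossy.forward_if_recording()   # does nothing: flag is already False

safe = ToyPipeline()
safe.frames.extend(["c1", "c2", "c3"])
safe.stop_with_flush()
safe.forward_if_recording()    # nothing left to forward, nothing lost

print(lossy.sent)  # []
print(safe.sent)   # [['c1', 'c2', 'c3']]
```

If the real code behaves like the first case, appending the chunk to frames alone would not help, because the buffer is never forwarded after the flag flips.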