Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License
8.46k stars 2.4k forks

Add flush_flag to listener to flush the recorded audio immediately without waiting for the phrase to complete. #761

Open sreekanthputta opened 4 months ago

sreekanthputta commented 4 months ago


I am working on a real-time speech-to-text application and have run into an issue. When the user is done talking, the recognizer waits until pause_threshold has elapsed before it returns the audio. This gets even worse in noisy environments with dynamic_energy_threshold turned off.

My users don't want to wait as they know that they are done talking. They want to be able to hit enter and reduce the time taken to show them the transcription.

This is just one example of where this could be helpful. I'm sure this feature can be useful in many ways.

I have tried the stopper, but it takes up to a second to stop and won't flush the audio. Also, the stopper won't stop the recorder while audio is actively being recorded, i.e. while energy > energy_threshold.

Hence this change.

How to use?

self.flush_flag = [False]
self.recorder.listen_in_background(self.source, self.record_callback, phrase_time_limit=self.record_timeout, flush_flag=self.flush_flag)

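def onEnter(self):
    # Mutate the shared list in place; rebinding self.flush_flag to a new list would not be seen by the listener.
    self.flush_flag[0] = True # this flag will be reset to False once the audio is flushed.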
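
To illustrate the idea without a microphone, here is a minimal sketch of the pattern, assuming the listener polls a one-element list that the caller shares with it. `record_loop` and `demo` are hypothetical names for illustration only, not code from the library or from this change; the queue stands in for the audio source.

```python
import queue
import threading
import time


def record_loop(chunks, flush_flag, stop_event, on_phrase):
    """Hypothetical stand-in for a background listener (not library code).

    Buffers chunks from `chunks` and hands them to `on_phrase` as soon as
    flush_flag[0] is set, then resets the flag to False.
    """
    buffered = []
    while not stop_event.is_set():
        try:
            buffered.append(chunks.get(timeout=0.01))
        except queue.Empty:
            pass
        if flush_flag[0]:
            if buffered:
                on_phrase(b"".join(buffered))
                buffered = []
            flush_flag[0] = False  # reset so the caller can flush again later


def demo():
    chunks = queue.Queue()
    flush_flag = [False]  # one-element list shared with the worker thread
    stop_event = threading.Event()
    phrases = []
    worker = threading.Thread(
        target=record_loop,
        args=(chunks, flush_flag, stop_event, phrases.append),
    )
    worker.start()

    for chunk in (b"he", b"ll", b"o"):
        chunks.put(chunk)
    while not chunks.empty():  # wait until the worker has taken every chunk
        time.sleep(0.01)

    flush_flag[0] = True  # mutate in place; do not rebind to a new list
    while flush_flag[0]:  # the worker resets the flag after flushing
        time.sleep(0.01)

    stop_event.set()
    worker.join()
    return phrases
```

Calling `demo()` returns the buffered chunks as a single phrase without waiting for any pause timeout. A `threading.Event` would work just as well as the one-element list; the list is used here only to mirror the usage above.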

Please feel free to modify the logic to make it cleaner and more robust. TIA.

ftnext commented 3 months ago

Thanks. Is this the same feature request as #757?

sreekanthputta commented 3 months ago

Not really.

#757 is about streaming buffers as they are recorded.

My change is about letting the speaker stop the recording immediately after they are done speaking, either by clicking the transcribe button in my UI or by releasing the mic button they have held since they started speaking.