Closed: hksfang closed this 1 year ago
Thanks for the PR, @hksfang. Quick question: comparing with and without the change, how much latency has it reduced?
The result depends on the speaker's speaking pattern. For longer speech with more 'gaps' in between, this PR reduces latency considerably; for shorter speech, or speech with little to no gaps, it doesn't do much.
I found it difficult to benchmark this change, but I can provide an example. Using this audio file as an example: before this PR, transcription after the end of speech took 5.5 s; after this PR, it took only about 2 s. That cuts the transcription latency by more than half.
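To make the dependence on gaps concrete, here is a toy model of the latency after the end of speech. It is only a sketch under stated assumptions: transcription time is taken to be proportional to audio length via a made-up real-time factor `rtf`, segments are transcribed one at a time, and the function name and numbers are illustrative, not measurements from this PR.

```python
def post_speech_latency(segments, gaps, rtf, interim):
    """Seconds of transcription work left after the speaker stops.

    segments: durations (s) of consecutive speech segments
    gaps:     silences (s) between them, len(gaps) == len(segments) - 1
    rtf:      assumed real-time factor (transcription seconds per audio second)
    interim:  True  -> transcribe each segment as soon as its gap starts
              False -> transcribe everything after the end of speech
    """
    assert len(gaps) == len(segments) - 1
    if not interim:
        # All audio is still untranscribed when speech ends.
        return rtf * sum(segments)

    clock = 0.0    # wall-clock position in the audio stream
    free_at = 0.0  # time at which the (single) transcriber is idle again
    for i, seg in enumerate(segments):
        clock += seg                   # this segment has just ended
        start = max(clock, free_at)    # wait if the transcriber is busy
        free_at = start + rtf * seg
        if i < len(gaps):
            clock += gaps[i]           # silence before the next segment
    # clock is now the end of speech; the rest is visible latency.
    return free_at - clock


# Speech with gaps: only the last segment is left to transcribe.
print(post_speech_latency([4, 4, 4], [1, 1], rtf=0.5, interim=False))
print(post_speech_latency([4, 4, 4], [1, 1], rtf=0.5, interim=True))
# Speech without gaps: interim transcription has nothing to work ahead on.
print(post_speech_latency([12], [], rtf=0.5, interim=True))
```

In this model, speech with gaps only pays for the final segment after the speaker stops, while a single uninterrupted segment gains nothing, which matches the behaviour described above.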
Wow, this is super awesome. Thanks for making the change. LGTM.
Use Whisper transcription on interim speech chunks so that transcription happens while the user is still speaking. Test video: https://drive.google.com/file/d/19I73GiIcz3Rkj6zb2KuTD93GLXFFd3y8/view?usp=sharing
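For readers who want the general idea, the following is a minimal sketch of transcribing interim chunks at gaps. It is not the code in this PR: `transcribe_while_speaking`, the energy-based silence check, and the `transcribe` callback (standing in for a Whisper call) are all hypothetical names and choices made for illustration.

```python
from typing import Callable, Iterable, Iterator, List


def transcribe_while_speaking(
    frames: Iterable[List[float]],
    transcribe: Callable[[List[float]], str],  # placeholder for a Whisper call
    silence_threshold: float = 0.01,           # assumed mean-square energy cutoff
    min_gap_frames: int = 3,                   # silent frames that count as a gap
) -> Iterator[str]:
    """Yield a transcript for each speech chunk as soon as a gap follows it."""
    chunk: List[float] = []
    silent_run = 0
    has_speech = False
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame) if frame else 0.0
        if energy >= silence_threshold:
            # Voiced frame: keep accumulating the current chunk.
            chunk.extend(frame)
            has_speech = True
            silent_run = 0
        else:
            silent_run += 1
            if has_speech and silent_run >= min_gap_frames:
                # Gap detected: transcribe the interim chunk right away
                # instead of waiting for the end of speech.
                yield transcribe(chunk)
                chunk = []
                has_speech = False
    if has_speech:
        # End of speech: only the final chunk is left to transcribe.
        yield transcribe(chunk)


# Usage with a fake transcriber that just reports the chunk size.
loud, quiet = [0.5] * 4, [0.0] * 4
stream = [loud, loud, quiet, quiet, quiet, loud, quiet]
results = list(transcribe_while_speaking(stream, lambda c: f"{len(c)} samples"))
print(results)
```

A real implementation would likely run the transcription call in the background so that audio capture is not blocked; the generator form here only shows where the interim calls happen.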