Improve Speech Segmentation: Adjusting Parameters to Recognize Shorter Segments

bond005 / pisets

A Python library and service for automatic speech recognition and transcription in Russian and English

Apache License 2.0

Issue #7 (open), opened by Koldim2001 1 week ago

Koldim2001 commented 1 week ago

Could you please advise on how to improve the segmentation of speech regions? Many segments of 15 to 20 seconds are being skipped, apparently because the algorithm treats them as silence. Are there any parameters in the code that I can adjust to increase the number of recognized segments? I would be very grateful for your advice.

Koldim2001 commented 1 week ago

Example:

3
00:00:25,300 --> 00:00:34,760
Text ....

4
00:00:35,480 --> 00:00:38,080
Text ....

# Here you can see a roughly 20-second gap, even though the speaker did not change the tone or volume of their voice.

5
00:00:57,800 --> 00:01:12,540
Text ....
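
Gaps like the one between entries 4 and 5 can be located automatically. The following is a minimal sketch that uses only the Python standard library and does not rely on anything in pisets; the function names `parse_timings` and `find_gaps` and the 10-second default are illustrative assumptions.

```python
import re

# Matches an SRT timing line such as "00:00:35,480 --> 00:00:38,080".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to integer milliseconds."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_timings(srt_text):
    """Return a list of (start_ms, end_ms) for every timing line in the text."""
    timings = []
    for match in TIMING.finditer(srt_text):
        fields = [int(value) for value in match.groups()]
        timings.append((_to_ms(*fields[:4]), _to_ms(*fields[4:])))
    return timings


def find_gaps(timings, min_gap_ms=10_000):
    """Return (prev_end_ms, next_start_ms, gap_ms) for gaps of at least min_gap_ms."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(timings, timings[1:]):
        gap = next_start - prev_end
        if gap >= min_gap_ms:
            gaps.append((prev_end, next_start, gap))
    return gaps


if __name__ == "__main__":
    sample = (
        "3\n00:00:25,300 --> 00:00:34,760\nText\n\n"
        "4\n00:00:35,480 --> 00:00:38,080\nText\n\n"
        "5\n00:00:57,800 --> 00:01:12,540\nText\n"
    )
    for prev_end, next_start, gap in find_gaps(parse_timings(sample)):
        print(f"gap of {gap / 1000:.2f} s between {prev_end} ms and {next_start} ms")
```

Running it on the three entries above reports a single gap of 19.72 s between 00:00:38,080 and 00:00:57,800, which matches the skipped region.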

Koldim2001 commented 1 week ago

I tried disabling the classification stage completely, and it helped. However, I couldn't find where the confidence threshold is set, and disabling the stage entirely seems too drastic to me.
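
To make the request concrete, here is a rough sketch of the kind of adjustable threshold meant here. It is not the pisets API: the `Segment` class, the `speech_score` field, and the `keep_speech` function are all hypothetical, and the sketch assumes the classification stage yields a per-segment confidence in [0, 1] that the segment contains speech.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float         # segment start, in seconds
    end: float           # segment end, in seconds
    speech_score: float  # hypothetical classifier confidence that this is speech


def keep_speech(segments: List[Segment], min_speech_score: float = 0.5) -> List[Segment]:
    """Keep segments whose speech confidence reaches the threshold.

    Lowering min_speech_score keeps more segments; a value of 0.0 keeps
    everything, which is equivalent to disabling the stage.
    """
    return [s for s in segments if s.speech_score >= min_speech_score]


if __name__ == "__main__":
    # Scores below are made-up illustration values, not real classifier output.
    segments = [
        Segment(25.30, 34.76, 0.90),
        Segment(38.08, 57.80, 0.30),
        Segment(57.80, 72.54, 0.85),
    ]
    print(len(keep_speech(segments)))                        # default threshold
    print(len(keep_speech(segments, min_speech_score=0.2)))  # more permissive
```

With a knob like this, the threshold could be lowered gradually instead of switching the stage off entirely.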