Open Koldim2001 opened 1 week ago
Example:
3
00:00:25,300 --> 00:00:34,760
Text ....
4
00:00:35,480 --> 00:00:38,080
Text ....
# Here there is a gap of almost 20 seconds (00:00:38,080 to 00:00:57,800), even though the speaker's tone and volume did not change.
5
00:00:57,800 --> 00:01:12,540
Text ....
I tried disabling the classification stage completely, and it helped. However, I couldn't find where its confidence threshold can be set, and disabling the stage entirely seems too drastic to me.
Could you please advise on how to improve the segmentation of speech regions? Many segments of 15 to 20 seconds are being skipped, apparently because the algorithm classifies them as silence. Are there any parameters in the code that I can adjust to increase the number of recognized segments? I would be very grateful for your advice.
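
To make the skipped regions easier to quantify, here is a minimal sketch that lists the gaps between consecutive cues in an SRT file. It only parses the subtitle output and does not touch the segmentation code itself. The names `find_gaps` and `min_gap_s` are made up for this illustration, and the 10-second cutoff is an arbitrary assumption.

```python
import re

# Matches an SRT timing line such as "00:00:35,480 --> 00:00:38,080".
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(h, m, s, ms):
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def find_gaps(srt_text, min_gap_s=10.0):
    """Return (prev_end_ms, next_start_ms, gap_seconds) for every gap
    between consecutive cues that lasts at least min_gap_s seconds.

    min_gap_s is a hypothetical cutoff chosen for illustration only.
    """
    cues = []
    for match in CUE_RE.finditer(srt_text):
        fields = match.groups()
        cues.append((_to_ms(*fields[:4]), _to_ms(*fields[4:])))

    gaps = []
    for (_, prev_end), (next_start, _) in zip(cues, cues[1:]):
        gap_ms = next_start - prev_end
        if gap_ms >= min_gap_s * 1000:
            gaps.append((prev_end, next_start, gap_ms / 1000))
    return gaps


# The three cues from the example above.
sample = """3
00:00:25,300 --> 00:00:34,760
Text ....
4
00:00:35,480 --> 00:00:38,080
Text ....
5
00:00:57,800 --> 00:01:12,540
Text ....
"""

print(find_gaps(sample))  # [(38080, 57800, 19.72)]
```

On the example above this reports a single gap of 19.72 seconds between cues 4 and 5. The 0.72-second pause between cues 3 and 4 falls below the cutoff and is ignored.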