andriken closed this 1 week ago
I don't see exactly what the problem is in what you're showing. How can you tell that it's not working?
Maybe I'm using the wrong parameter for my purpose, I'm sorry. But there is 1 second of silence within this segment, so how come the segment still covers the silent part too? I tried large-v2 as well, same thing.
Silence is removed and the speech segments are concatenated together; the timestamps are then restored to the original timeline (as it was before silence removal). You should not notice anything except better transcription quality, but the segments are not split at silence.
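To illustrate why a segment can still span a silent stretch, here is a minimal sketch of the timestamp restoration idea. It is not the library's actual code: `restore_time` and the `chunks` list are hypothetical names, and the speech spans are made-up values. It only shows how a time on the concatenated (speech-only) audio could be mapped back to the original audio.

```python
from bisect import bisect_right


def restore_time(t, chunks):
    """Map time t (seconds) on the concatenated speech-only audio
    back to the original timeline.

    chunks: sorted list of (start, end) speech spans in the original audio.
    Hypothetical helper for illustration only.
    """
    # Offset of each speech chunk on the concatenated timeline.
    offsets = []
    total = 0.0
    for start, end in chunks:
        offsets.append(total)
        total += end - start

    # Find the chunk that contains t on the concatenated timeline.
    i = max(bisect_right(offsets, t) - 1, 0)
    start, end = chunks[i]
    return min(start + (t - offsets[i]), end)


# Assumed speech spans: 1 second of silence between 3.0 s and 4.0 s.
chunks = [(1.0, 3.0), (4.0, 6.0)]

# A segment covering 0.0 -> 4.0 on the concatenated audio...
seg_start = restore_time(0.0, chunks)  # 1.0
seg_end = restore_time(4.0, chunks)    # 6.0
# ...maps to 1.0 -> 6.0 in the original, so it includes the silent second.
```

In this sketch the model sees one continuous 4-second utterance, so it has no reason to end the segment at the gap; only the start and end times are shifted back.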
But what if I don't want it to concatenate, and want it to use the silence timing instead? I just want it to split at silence. Is there a way? I want the transcription timing accurate enough for dubbing.
The easiest solution is to use word_timestamps=True and then align the words as you like. Other than that, you'll have to customize the code to behave the way you want.
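A rough sketch of what "align the words as you like" could look like when the goal is to split at silence. The `Word` class and `split_at_silence` function are hypothetical helpers, not part of the library; the sketch assumes each word carries `start`, `end`, and `word` fields (with a leading space in `word`), and the 0.4 s gap threshold is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Stand-in for the word objects returned with word_timestamps=True.
    start: float
    end: float
    word: str


def split_at_silence(words, min_gap=0.4):
    """Group words into segments, starting a new segment whenever the
    pause between two consecutive words is at least min_gap seconds."""
    groups = []
    current = []
    for w in words:
        if current and w.start - current[-1].end >= min_gap:
            groups.append(current)
            current = []
        current.append(w)
    if current:
        groups.append(current)
    # Return (start, end, text) for each group.
    return [
        (g[0].start, g[-1].end, "".join(x.word for x in g).strip())
        for g in groups
    ]


# Made-up word timings with a 1-second pause after "that,".
words = [
    Word(0.5, 1.0, " After"),
    Word(1.0, 1.5, " that,"),
    Word(2.5, 3.0, " I"),
    Word(3.0, 3.5, " talked"),
]
for start, end, text in split_at_silence(words):
    print(f"[{start:.2f}s -> {end:.2f}s] {text}")

# With faster-whisper the words would come from something like
# (the file name is a placeholder):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v2")
#   segments, _ = model.transcribe("audio.wav", word_timestamps=True, vad_filter=True)
#   words = [w for seg in segments for w in seg.words]
```

Splitting on the gap between word timestamps keeps the transcription pass unchanged and only regroups its output, so the threshold can be tuned without re-running the model.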
See my simple code below. But then why does the segment have more than about 1 second of silence inside it? Even if I set "min_silence_duration_ms" to 400 or less, it's still the same, no effect.
[0.72s -> 6.28s] After that, I talked a lot with my mother about the past three years.