Closed aleksandr-smechov closed 1 year ago
https://github.com/guillaumekln/faster-whisper/pull/123/files This seems to work as well.
EDIT: Have noticed issues with this solution, in that timestamps sometimes "go backwards". The original example above works well across three tests.
End timestamps seem to be off by 100-300ms+ at times. This could possible be due to the current "hacky" segmentation algorithm here:
https://github.com/guillaumekln/faster-whisper/blob/5c17de17713f65929c7c33add3a9735ff75a945c/faster_whisper/transcribe.py#L734
One solution could be to monkey-patch faster-whisper to
A)
use VAD timestamps to get the median and maximum values, andB)
add punctuation "durations" to the previous word. Something along the lines of: