Open nikans opened 6 months ago
Hello. I've just updated the library from version 0.1.0 to 1.0.1 and noticed that the timings are incorrect, as if it were transcribing a longer audio file.
For example, 23 seconds of subtitles:
{"segments": [{"id": 1, "end": 12.16, "start": 4.500000000000001, "words": [{"end": 5.38, "start": 4.500000000000001}, {"end": 5.74, "start": 5.38}, {"end": 6.18, "start": 5.74}, {"end": 6.5, "start": 6.18}, {"end": 6.78, "start": 6.5}, {"end": 7.2, "start": 6.78}, {"end": 8.18, "start": 7.2}, {"end": 9.12, "start": 8.94}, {"end": 9.52, "start": 9.12}, {"end": 9.94, "start": 9.52}, {"end": 10.42, "start": 9.94}, {"end": 10.72, "start": 10.42}, {"end": 11.14, "start": 10.72}, {"end": 11.48, "start": 11.14}, {"end": 12.16, "start": 11.48}]}, {"id": 2, "end": 16.44, "start": 12.9, "words": [{"end": 13.08, "start": 12.9}, {"end": 13.36, "start": 13.08}, {"end": 13.74, "start": 13.36}, {"end": 14.62, "start": 13.74}, {"end": 15.28, "start": 14.62}, {"end": 15.94, "start": 15.28}, {"end": 16.44, "start": 15.94}]}, {"id": 3, "end": 22.58, "start": 19.38, "words": [{"end": 19.68, "start": 19.38}, {"end": 19.94, "start": 19.68}, {"end": 20.46, "start": 19.94}, {"end": 21.06, "start": 20.46}, {"end": 21.58, "start": 21.06}, {"end": 22.14, "start": 21.58}, {"end": 22.58, "start": 22.14}]}]}
for an 18-second audio file:
20240412040528-962.wav.zip
I've tried with the VAD filter both off and on. In any case, I don't understand how exactly VAD should affect this. I also tried the distil and the regular fw models (all medium), with the same result.
What could have gone wrong? Thanks.
The multiplier seems to be ~0.72: when the reported timestamps are multiplied by this value, they point to the right position in the audio.
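As a temporary workaround, the correction described above could be applied to the output in post-processing. This is only a minimal sketch, assuming the result has the same `segments` / `words` / `start` / `end` layout as the JSON in this issue; `rescale_timestamps` is a hypothetical helper, not part of the library, and the factor is the empirical ~0.72 observed for this one file, not a known constant.

```python
import copy


def rescale_timestamps(result, factor):
    """Return a copy of `result` with every start/end time multiplied by `factor`.

    `result` is assumed to look like the JSON posted in this issue:
    {"segments": [{"start": ..., "end": ..., "words": [{"start": ..., "end": ...}]}]}
    """
    scaled = copy.deepcopy(result)  # leave the caller's data untouched
    for segment in scaled["segments"]:
        segment["start"] = round(segment["start"] * factor, 2)
        segment["end"] = round(segment["end"] * factor, 2)
        # Word-level timings are optional, so fall back to an empty list.
        for word in segment.get("words", []):
            word["start"] = round(word["start"] * factor, 2)
            word["end"] = round(word["end"] * factor, 2)
    return scaled


if __name__ == "__main__":
    # Last segment from the example above, trimmed to a single word.
    sample = {"segments": [{"id": 3, "start": 19.38, "end": 22.58,
                            "words": [{"start": 22.14, "end": 22.58}]}]}
    print(rescale_timestamps(sample, 0.72))  # 0.72 is the observed, approximate factor
```

This only hides the symptom; it does not explain why the timings are stretched in the first place.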