Can the accuracy of the timestamp be improved?

ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++

MIT License


Issue #255

Status: Open. czkoko opened this issue 1 year ago

czkoko commented 1 year ago

Whisper's timestamps are not very accurate. Here is a comparison of the segment timings produced by Microsoft Cognitive Services Speech and by whisper:

1
00:00:00,120 --> 00:00:01,379 (Microsoft)
[00:00:00.000 --> 00:00:02.000] (whisper)

2
00:00:02,120 --> 00:00:06,320 (Microsoft)
[00:00:02.000 --> 00:00:07.500] (whisper)
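To put numbers on the gap, here is a small Python sketch that parses both timestamp formats from the comparison above and computes how far each whisper boundary is from the Microsoft one. The helper names `to_seconds` and `boundary_offsets` are hypothetical, and treating the Microsoft timings as the reference is an assumption, not ground truth.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (start, end) as timestamp strings


def to_seconds(ts: str) -> float:
    """Parse 'HH:MM:SS,mmm' (SRT style) or 'HH:MM:SS.mmm' (whisper style)."""
    hms, ms = ts.replace(",", ".").split(".")
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000.0


def boundary_offsets(reference: List[Segment],
                     hypothesis: List[Segment]) -> List[Tuple[float, float]]:
    """Return (start_offset, end_offset) in seconds for each segment pair.

    Negative means the hypothesis boundary is earlier than the reference.
    """
    offsets = []
    for (ref_start, ref_end), (hyp_start, hyp_end) in zip(reference, hypothesis):
        offsets.append((
            round(to_seconds(hyp_start) - to_seconds(ref_start), 3),
            round(to_seconds(hyp_end) - to_seconds(ref_end), 3),
        ))
    return offsets


# Timings copied from the comparison; Microsoft is assumed to be the reference.
microsoft = [("00:00:00,120", "00:00:01,379"), ("00:00:02,120", "00:00:06,320")]
whisper = [("00:00:00.000", "00:00:02.000"), ("00:00:02.000", "00:00:07.500")]

offsets = boundary_offsets(microsoft, whisper)
for i, (d_start, d_end) in enumerate(offsets, start=1):
    print(f"segment {i}: start {d_start:+.3f} s, end {d_end:+.3f} s")
# segment 1: start -0.120 s, end +0.621 s
# segment 2: start -0.120 s, end +1.180 s
```

In both segments whisper starts about 0.12 s early and ends 0.6 to 1.2 s late, so each whisper segment fully contains the corresponding Microsoft segment.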

misutoneko commented 1 year ago

Yes, this would be much appreciated, though I'm not sure how much can be done without retraining the model(s). I assume you are using the large model? I've found the smaller models to be less accurate.

By the way, for the original Whisper there's the stable-ts fork; maybe it can provide some inspiration. See here: https://github.com/openai/whisper/discussions/435

ggerganov commented 1 year ago

The timestamp precision is a limitation of the model. You would need some sort of pre- or post-processing to improve the timestamps, but at the moment it is not clear what the best approach is.
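As one illustration of what post-processing could look like (this is not what whisper.cpp or stable-ts actually does), a segment can be tightened to the span of audio whose short-time energy exceeds a threshold. The sketch assumes mono float samples in [-1, 1]; the helper `refine_segment`, the 10 ms frame size and the 0.02 RMS threshold are all hypothetical choices that would need tuning on real recordings.

```python
import math
from typing import List, Tuple


def refine_segment(samples: List[float], sample_rate: int,
                   start: float, end: float,
                   frame_ms: float = 10.0,
                   threshold: float = 0.02) -> Tuple[float, float]:
    """Shrink [start, end] to the first/last frame whose RMS >= threshold.

    Hypothetical energy-based trimming; returns the original bounds if no
    frame in the segment is loud enough.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    s0 = max(0, int(start * sample_rate))
    s1 = min(len(samples), int(end * sample_rate))

    voiced = []  # start indices of frames considered "speech"
    for i in range(s0, s1, frame_len):
        frame = samples[i:min(i + frame_len, s1)]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            voiced.append(i)

    if not voiced:
        return start, end
    new_start = voiced[0] / sample_rate
    new_end = min(voiced[-1] + frame_len, s1) / sample_rate
    return new_start, new_end


# Synthetic example at 1 kHz: 0.5 s silence, 1 s of a 50 Hz tone, 1.5 s silence.
sr = 1000
samples = ([0.0] * 500
           + [0.5 * math.sin(2 * math.pi * 50 * n / sr) for n in range(1000)]
           + [0.0] * 1500)
print(refine_segment(samples, sr, 0.0, 2.0))  # (0.5, 1.5)
```

The low sample rate only keeps the example small; real Whisper input is 16 kHz audio. Because this approach can only shrink a segment, it matches the pattern in the comparison above, where both whisper segments start earlier and end later than Microsoft's. It cannot move a boundary outward, and a fixed threshold will struggle with background noise or music.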

pneyrinck commented 1 year ago

Apparently, some work has already been done on improving the timestamps: https://github.com/jianfch/stable-ts