Open czkoko opened 1 year ago
Yes, this would be much appreciated, though I'm not sure how much can be done without retraining the model(s). I suppose you are using the large model? I've found the smaller models to be less accurate.
Btw, for the original Whisper there's the stable-ts fork; maybe that can provide some inspiration. See here: https://github.com/openai/whisper/discussions/435
The timestamp precision is a limitation of the model. You would need some sort of pre- or post-processing to improve the timestamps, but at the moment it is not clear what the best approach is.
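To make the post-processing idea concrete, here is a minimal sketch of one possible approach: detect speech regions from frame energy, then snap each segment boundary to the nearest region edge if it is close enough. This is only an illustration and is not what stable-ts or any Whisper implementation actually does. The function names `detect_speech` and `snap_segments`, the 20 ms frame size, the RMS threshold, and the `max_shift` tolerance are all assumptions made up for this example.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def detect_speech(samples: Sequence[float], sample_rate: int,
                  frame_ms: int = 20, threshold: float = 0.02) -> List[Interval]:
    """Hypothetical helper: intervals whose per-frame RMS is >= threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    regions: List[Interval] = []
    region_start = None
    n_frames = math.ceil(len(samples) / frame_len)
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = i * frame_len / sample_rate  # start time of this frame
        if rms >= threshold and region_start is None:
            region_start = t  # speech begins
        elif rms < threshold and region_start is not None:
            regions.append((region_start, t))  # speech ends
            region_start = None
    if region_start is not None:
        # Audio ended while still inside a speech region.
        regions.append((region_start, len(samples) / sample_rate))
    return regions


def snap_segments(segments: Sequence[Interval], regions: Sequence[Interval],
                  max_shift: float = 0.5) -> List[Interval]:
    """Hypothetical helper: move each boundary to the nearest speech edge,
    but only if the move is at most `max_shift` seconds."""
    starts = [r[0] for r in regions]
    ends = [r[1] for r in regions]

    def nearest(t: float, candidates: List[float]) -> float:
        best = min(candidates, key=lambda c: abs(c - t), default=t)
        return best if abs(best - t) <= max_shift else t

    snapped: List[Interval] = []
    for start, end in segments:
        new_start = nearest(start, starts)
        new_end = nearest(end, ends)
        if new_end <= new_start:
            # Snapping would collapse the segment, so keep the original times.
            new_start, new_end = start, end
        snapped.append((new_start, new_end))
    return snapped
```

The segments would come from the transcription output and the samples from the decoded audio (mono floats in the range -1 to 1 are assumed here). A plain energy threshold is fragile with background noise, so a proper voice activity detector would probably be needed in practice.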
Apparently, work has already been done to improve the timestamps: https://github.com/jianfch/stable-ts
Whisper's timestamps are not very accurate. The following is a comparison between Microsoft Cognitive Services Speech and Whisper.