skittlesvampir closed this 11 months ago
@skittlesvampir Bug fixed: when the accurate text was also in SRT format, both sets of timestamps ended up in the output 🙃
Oh my god, now it works!! Thank you so much.
Just two small details:
Screenshot 1
Screenshot 2
I think it would be very hard to do a good job of guessing timestamps by interpolation. The in-between text could be spoken partly fast and partly slow, and may include stretches without any spoken text.
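To illustrate why, here is a minimal sketch of the naive approach: spreading the in-between lines across the gap in proportion to their character count. This is not what the tool does; the function name `interpolate_timestamps` and its arguments are made up for this example.

```python
def interpolate_timestamps(lines, start, end):
    """Spread `lines` over [start, end] seconds, proportionally to character count.

    Hypothetical helper for illustration only. It assumes a constant
    speaking rate and no silence inside the gap, which is exactly the
    assumption that rarely holds in real audio.
    """
    total_chars = sum(len(line) for line in lines)
    if total_chars == 0:
        return []
    result = []
    cursor = start
    for line in lines:
        # Each line gets a share of the gap matching its share of the text.
        duration = (end - start) * len(line) / total_chars
        result.append((cursor, cursor + duration, line))
        cursor += duration
    return result


# Three unmatched lines between two known anchors at 10 s and 18 s.
for begin, finish, text in interpolate_timestamps(["aaaa", "bb", "cc"], 10.0, 18.0):
    print(f"{begin:.1f} -> {finish:.1f}: {text}")
```

If the speaker pauses for a few seconds in the middle of the gap, every guessed timestamp after the pause is off by that amount, and nothing in the text alone reveals it.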
For point 2: the real solution is to improve the Whisper recognition. This can be achieved with WhisperHallu: https://github.com/EtienneAb3d/WhisperHallu
For both points 1 and 2: I'm currently working on a solution using word-level timestamps and some complementary pre-/post-processing around WhisperHallu. I don't plan to release it fully open-source. We can discuss it if you have a budget.
I will check WhisperHallu out, it seems cool.
Unfortunately, I don't have a budget, I'm just synchronizing my own shows so I can understand them better.
Anyways, I think the errors are acceptable, so thank you for your work! I wish your business much success in the future!
Problem description: https://github.com/openai/whisper/discussions/1770#discussioncomment-7526482
I've uploaded the data at: https://ben.ist-toll.xyz/k/whisper-test-files/