abhirooptalasila / AutoSub

A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui
MIT License
586 stars 102 forks

Include OpenAI Whisper model #70

Open xBurnsed opened 2 years ago

xBurnsed commented 2 years ago

OpenAI just released probably the best model that there is for speech recognition right now.

It would be great to incorporate this into this project!

More info: https://openai.com/blog/whisper/

qgustavor commented 1 year ago

I've been using Whisper to subtitle, and translate to English, videos for which I could not find any subtitles. The only issue I saw with it is that, while OpenAI's implementation generates subtitles, their timestamps are sometimes not great. But WhisperX and whisper-timestamped improve on that.

To be fair, I got here because it showed up in GitHub's "Explore repositories" and I thought "What does it do better than Whisper?", then I saw that it's just an older project. At this point I don't think it's a case of incorporating it in AutoSub anymore: those projects (including OpenAI's implementation) already generate subtitles and seem to use a lot of tricks to improve performance. They are a lot faster, in fact: faster-whisper's speed is 54 s for 13 min of audio, about 4 seconds per minute of audio, against AutoSub's 34 seconds per minute (40 minutes for 70 minutes of audio). Maybe AutoSub would be faster if the same hardware were used, but the readme makes it seem quite a bit slower.
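
For anyone who wants to check the arithmetic, here is a small Python sketch of the comparison. It only uses the figures quoted above (54 s for 13 min, and 40 min for 70 min); the function and variable names are made up for illustration, and since the two numbers were presumably measured on different hardware, the ratio is a rough indication, not a benchmark.

```python
def seconds_per_audio_minute(processing_seconds: float, audio_minutes: float) -> float:
    """Processing time spent per minute of input audio."""
    return processing_seconds / audio_minutes


# Figures quoted in the comment above; hardware is assumed to differ.
faster_whisper = seconds_per_audio_minute(54, 13)       # 54 s for 13 min of audio
autosub = seconds_per_audio_minute(40 * 60, 70)         # 40 min for 70 min of audio

print(f"faster-whisper: {faster_whisper:.1f} s per minute of audio")
print(f"AutoSub:        {autosub:.1f} s per minute of audio")
print(f"ratio:          {autosub / faster_whisper:.1f}x")
```

With these numbers AutoSub comes out roughly eight times slower per minute of audio.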