OkGoDoIt opened 1 year ago

I'm curious whether anyone has had a chance to compare the model architecture of Meta's new Massively Multilingual Speech (MMS) models to see if this project could use them. They are claiming a massive reduction in word error rate, as well as support for over 1,000 languages. It's unclear how performant MMS is likely to be in practice, but I'm sure someone here will look into it. I'd love to hear any thoughts or notes from anyone better equipped to understand how these two projects might align.
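On the architecture question: MMS is a wav2vec 2.0 encoder fine-tuned with CTC and per-language adapters, which is quite different from Whisper's encoder-decoder transformer, so it presumably wouldn't load into whisper.cpp as-is. For anyone who just wants to try it, here's a minimal sketch using the public `facebook/mms-1b-all` checkpoint via Hugging Face transformers; `audio.wav` and the `"eng"` language code are placeholder assumptions, not anything from this thread:

```python
# Minimal sketch: transcribing audio with MMS via Hugging Face transformers.
# "audio.wav" and the "eng" ISO 639-3 code below are placeholder assumptions.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, AutoProcessor

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS uses per-language adapters; select the target language before inference.
processor.tokenizer.set_target_lang("eng")
model.load_adapter("eng")

# Load the audio, downmix to mono, and resample to the 16 kHz the model expects.
waveform, sr = torchaudio.load("audio.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# CTC decoding: greedy argmax over the vocabulary, then collapse repeats/blanks.
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```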
Agreed, Whisper is OK at clear English spoken directly to it, but it's pretty bad at video subtitling. Any background noise throws it off: the timings drift badly, which is the biggest issue, and the words start to come out wrong too.

Hoping MMS is truly better; I'd have to test it though.
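If anyone wants to reproduce the timing drift, a quick sketch with the openai-whisper Python package dumps each segment's timestamps so you can eyeball them against the video (the file name and model size are placeholders):

```python
# Minimal sketch: inspect Whisper's segment timestamps on a noisy clip.
# "noisy_clip.wav" is a placeholder; the library resamples input via ffmpeg.
import whisper

model = whisper.load_model("medium")
result = model.transcribe("noisy_clip.wav")

# Print start/end times per segment to spot where the timings drift.
for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
```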
This will allow for very interesting novel uses of sound-based interfaces, e.g. for low-resource, edge-case languages.
Note that the license of the model is CC-BY-NC, so it does not allow commercial use.
Our lord and savior @ggerganov will come to the rescue soon! 🙏
I mean, just look at the numbers: Whisper was trained on 680k hours of labeled data, while Meta's MMS used only 45k hours. Whisper works well because it was trained on significantly more data, and that's why its error rate is so low.
I think it only benchmarks better on languages with less data. For higher-resource languages it's actually much worse; see the table in the appendix of the paper.
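For anyone wanting to compare the two models on their own data rather than trusting the appendix tables, here's a small sketch of computing word error rate with the jiwer package; the transcript strings are made-up placeholders:

```python
# Minimal sketch: word error rate with the jiwer package (pip install jiwer).
# The reference and hypothesis strings are made-up placeholders.
import jiwer

reference   = "the quick brown fox jumps over the lazy dog"
whisper_out = "the quick brown fox jumps over a lazy dog"
mms_out     = "the quick brown fox jump over the lazy dog"

print("Whisper WER:", jiwer.wer(reference, whisper_out))
print("MMS WER:    ", jiwer.wer(reference, mms_out))
```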