Closed Manamama closed 1 year ago
Another suggestion: see, for example, this sample: https://www.youtube.com/watch?v=JEHyXDZTDK4 It starts in Polish and then switches to the original Russian, with manually curated, hard-coded Polish subtitles. With the language left at what Whisper detects by default (Polish), poor Whisper valiantly tries to transcribe the subsequent Russian as Polish, and actually makes a couple of correct guesses. The best results of course come from the "translate" option, which is meant for such cases (see the sample below), but I wonder how to make it switch between languages mid-stream. My initial (and naive) proposal:
[05:16.000 --> 05:27.000] And it sounds terrible to us, that if you know yourself and you know someone else's army, then you will meet a hundred times and you will win a hundred times.
[05:27.000 --> 05:35.000] If you know yourself and you do not know him, then once you win, you will win them.
[05:35.000 --> 05:40.000] If you do not know yourself and you do not know them, then you will meet a hundred times and you will lose a hundred times.
[05:40.000 --> 05:45.000] So that the third scenario does not happen, I have a question, do we know them?
[05:45.000 --> 05:48.000] Do we study them?
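One way to picture the mid-stream switching idea is to run language detection once per fixed-length window and merge neighbouring windows that agree, so each resulting span could then be transcribed with its own language setting. The sketch below is only an illustration under stated assumptions: `language_spans`, the `detect` callback and the 30-second window length are hypothetical names and choices made here, not part of Whisper or of this repository. In practice `detect` might wrap something like the `detect_language` call from the openai-whisper README, but that wiring is left out so the example stays self-contained.

```python
from itertools import groupby
from typing import Callable, List, Tuple

# Assumption: audio is examined in fixed 30-second windows.
WINDOW_SECONDS = 30.0

# A span is (start_seconds, end_seconds, language_code).
Span = Tuple[float, float, str]


def language_spans(
    num_windows: int,
    detect: Callable[[int], str],
    window_seconds: float = WINDOW_SECONDS,
) -> List[Span]:
    """Merge consecutive windows that share a detected language.

    `detect` is a hypothetical callback that returns a language code
    (e.g. "pl" or "ru") for the window with the given index.
    """
    labels = [detect(i) for i in range(num_windows)]
    spans: List[Span] = []
    start = 0
    for lang, run in groupby(labels):
        length = len(list(run))
        spans.append(
            (start * window_seconds, (start + length) * window_seconds, lang)
        )
        start += length
    return spans


# Stand-in detector for a clip that opens in Polish and switches to Russian.
fake_labels = ["pl", "pl", "ru", "ru", "ru"]
spans = language_spans(len(fake_labels), lambda i: fake_labels[i])
for begin, end, lang in spans:
    print(f"{begin:7.1f}s to {end:7.1f}s: {lang}")
```

Each span could then be passed to the transcriber with its language fixed. A single misdetected window would split a span in two, so some smoothing over neighbouring windows would probably be needed in a real setup.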
Hey, thanks for your input! Creating a proper web application would require way more work than I'm willing to put into this project, but the idea is very interesting. I also agree that an option to enforce more refined language detection would be useful, but this particular repository only focuses on the high-level implementation of the Colab interface. It would be best if you opened a discussion about your suggestion on the actual repository of the Whisper project.
Thank you for creating this and especially the extensive documentation. (I had started to cobble something similar, which works, but then gave up on it given a couple of more elegant competing options, including yours.)
Just a tip: maybe create it as a Google Drive app and post it on their Marketplace for even easier integration for end users? (I wonder if they would accept it, as it is quite "Colab intense"...)