Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

The translation repeats words from the same line multiple times. #146

Open rheydheck opened 1 year ago

rheydheck commented 1 year ago

annoyingshit

emcodem commented 1 year ago

@rheydheck there are many issues about this open, please search before you post a new issue.

Besides -mc 0, you can try this, for example: https://github.com/Const-me/Whisper/issues/26#issuecomment-1524961608

It would be good if you closed your issue and continued the discussion in one of the other open threads about it. The reason is that we don't want to spread important information over multiple places.

rheydheck commented 1 year ago

I tried the Windows command-line method, but it was too much work, and there's no support for translating files to "SRT" and "WebVTT". Do you have a guide on how to apply the "-mc 0" command-line option in the Desktop version?

emcodem commented 1 year ago

You cannot set this in Desktop currently, and Const-Me is for sure very aware of that. However, the ability to run a command line also allows automation, which a desktop app does not, so after investing some initial work, your life becomes much easier. I added some words about how to use it with a batch file here: https://github.com/Const-me/Whisper/issues/91#issuecomment-1508728708

Last but not least, main.exe can output vtt and srt, just provide -osrt or -ovtt in the command (you cannot specify a path where the file goes though, it's written in the same place where the audio file is).
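The batch-file idea can also be sketched as a small Python wrapper around main.exe. This is only a sketch under assumptions: the function names are made up for illustration, and the -m (model) and -f (input file) flags and the -o<format> naming are assumed to follow the whisper.cpp-style options, so check them against the output of main.exe --help before relying on it.

```python
from pathlib import Path
import subprocess


def build_command(exe, model, audio, max_context=0, fmt="srt"):
    """Build the argument list for one run of main.exe (hypothetical helper).

    Assumed flags: -m model, -f input file, -mc max context, -o<fmt> output format.
    """
    if fmt not in ("txt", "srt", "vtt"):
        raise ValueError(f"unsupported output format: {fmt}")
    return [str(exe), "-m", str(model), "-f", str(audio),
            "-mc", str(max_context), f"-o{fmt}"]


def transcribe_folder(exe, model, folder, pattern="*.wav", **options):
    """Run main.exe once per matching audio file in the folder.

    The subtitle file is expected to appear next to each audio file,
    as described in the comment above.
    """
    for audio in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_command(exe, model, audio, **options), check=True)


# Show the command for one file without running anything.
print(build_command("main.exe", "ggml-medium.bin", "talk.wav"))
```

Calling transcribe_folder("main.exe", "ggml-medium.bin", "recordings") would then process every .wav file in that folder with -mc 0 and write one .srt per file; the model and folder names here are placeholders.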

astandarduser commented 1 year ago

-mc 0 works perfectly; it would be amazing to have this option in the GUI. I am curious: does '-mc 0' change the processing speed or the quality of the translation?

emcodem commented 1 year ago

mc means "max context"; the context is also called the prompt. You can read more about possible effects of prompting, for example, here: https://platform.openai.com/docs/guides/speech-to-text/prompting

One point where prompts are helpful with the current Const-Me implementation is continuing a sentence that was started at the end of a 30-second segment and is continued in the next segment.
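As a rough illustration of that carry-over, here is a sketch of how the prompt for the next segment could be derived from the text decoded so far. It is not the actual Const-Me code: next_prompt is a hypothetical helper, words stand in for tokens, and treating None as "no limit" is an assumption made for this example.

```python
def next_prompt(previous_tokens, max_context):
    """Return the tokens handed to the next 30-second segment as its prompt.

    max_context == 0 -> nothing is carried over (what -mc 0 asks for)
    max_context is None -> everything is carried over (assumption of this sketch)
    otherwise -> only the last max_context tokens are kept
    """
    if max_context is None:
        return list(previous_tokens)
    if max_context <= 0:
        return []
    return list(previous_tokens[-max_context:])


# The first segment ends in the middle of a sentence.
segment_1 = ["and", "then", "we", "went", "to", "the"]

print(next_prompt(segment_1, 3))  # last three words help continue the sentence
print(next_prompt(segment_1, 0))  # empty prompt: the next segment starts blind
```

With a non-zero limit the next segment sees the unfinished sentence; with 0 it does not, which is the trade-off behind -mc 0.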

Regarding speed, I did not encounter any noticeable difference, but I also did not specifically benchmark it, because going without a prompt is not an option for me (I need the highest possible quality).

The screenshot above is a good example of how prompting can not only help but, when used wrong, can also be harmful: we see the first 2 "The" as red (low probability), then 2 "The" as yellow. Each time "The" is output, it is also pushed into the context. The 5th time, we already see it as green, because we have it 4 times in the context and the AI thinks "oh wow, this guy really loves this word 'The', I am 100% confident this is what he wants to hear from me". As a consequence, from this moment on, "The" turns into the safest bet for the AI...
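The feedback loop can be mimicked with a toy model. This is purely illustrative and has nothing to do with Whisper's real decoder: the confidence function and its numbers (20% base, +20 points per repetition) are invented so that the fifth "The" reaches 100%, as in the description above.

```python
def confidence(word, context, base=20, boost=20):
    """Toy confidence in percent: grows with each copy of the word in the context."""
    return min(100, base + boost * context.count(word))


def simulate(word, steps, max_context):
    """Emit the same word repeatedly and record the toy confidence each time."""
    context, history = [], []
    for _ in range(steps):
        history.append(confidence(word, context))
        context.append(word)  # every output is pushed back into the context
        context = context[-max_context:] if max_context > 0 else []
    return history


print(simulate("The", 5, max_context=16))  # [20, 40, 60, 80, 100]
print(simulate("The", 5, max_context=0))   # [20, 20, 20, 20, 20]
```

With a context, each repetition makes the next one look more likely; with max_context=0 the loop never gets started, which matches why -mc 0 stops the repeated words.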

Highlander1536 commented 1 year ago

@rheydheck there are many issues about this open, please search before you post a new issue.

Maybe then it should get fixed 🙃

usermyname12 commented 1 year ago

This is the biggest issue by far for me. Everything else seems to work fine. Come on, there are so many people having this issue. It should be the number 1 priority for the UI. People are far more likely to not use the program altogether than to use a cmd version with additional parameters. If I wanted a cmd version, I would not have searched for an alternative (this) in the first place.

emcodem commented 1 year ago

Not sure what's needed to motivate @Const-me to alter the existing desktop example program ;-) Providing a checkbox for enabling mc 0 is a trivial task, but I don't use desktop programs at all (only automated processing), so there is no reason for me to mess with the Desktop example program.

Highlander1536 commented 1 year ago

Someone just needs to fork it & do it themselves at this point

emcodem commented 1 year ago

Forking is not needed at all; the correct way is to just use the API or script around the main command-line example. I guess that, just as the guys from SubtitleEdit implemented a small GUI that can use all the main inference projects (whisper original, faster-whisper/CTranslate2, whisper-cpp, whisper-const-me), there should be a lot of other projects and commercial products doing that already. But as I do not need GUI programs at all, I have no overview of such projects/products. Are you sure there are no projects that take the GUI part seriously (like SubtitleEdit partly does)?

Let's ask the subtitleedit guys for their opinion: https://github.com/SubtitleEdit/subtitleedit/issues/7190