ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Feature request - transcription + translation in another language #1219

Open swswsws583 opened 1 year ago

swswsws583 commented 1 year ago

I'm hoping to make some bilingual subtitles for my videos, it would be great if you can add this feature, or hopefully real-time transcription + translation in the future. Thanks for all the great work 😄

jcalderita commented 1 year ago

Hello.

For example, to go from English to Spanish I use "-l es", and it works for me.

swswsws583 commented 1 year ago

> Hello.
>
> For example, to go from English to Spanish I use "-l es", and it works for me.

Hi, thanks for your reply. I'm not trying to transcribe a non-English audio file and translate it into English; I want to transcribe and translate from any language into another language, locally.

swswsws583 commented 1 year ago

> I'm hoping to make some bilingual subtitles for my videos, it would be great if you can add this feature, or hopefully real-time transcription + translation in the future. Thanks for all the great work 😄

> I'm pretty sure whisper.cpp supports other languages.

Hi, I am not sure what you meant, but I guess you can see my response to guranu.

swswsws583 commented 1 year ago

> I'm hoping to make some bilingual subtitles for my videos, it would be great if you can add this feature, or hopefully real-time transcription + translation in the future. Thanks for all the great work 😄

> I'm pretty sure whisper.cpp supports other languages.

> Hi, I am not sure what you meant, but I guess you can see my response to guranu.

> Oh, I misread that. Well, it won't happen, because the original Whisper by OpenAI can't translate into different languages, but it can transcribe in different languages.

Since the -tr argument provides translation into English, I wonder if whisper.cpp can offer other language outputs.

bobqianic commented 1 year ago

OpenAI's Whisper currently only handles Any-to-English translations. If you're interested in Any-to-Any translations, you might want to check out Meta's latest Seamless-M4T.
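
Given that limitation, one workaround for bilingual subtitles where English is the second language is to run two passes over the same audio: one that transcribes in the spoken language and one that uses -tr to translate into English. Below is a minimal Python sketch that only builds the two command lines. The `./main` binary path, the model path, and the helper names are assumptions for illustration; the flags (-m, -f, -l, -tr, -osrt, -of) are the ones listed in the whisper.cpp CLI help.

```python
import os
from typing import List


def whisper_cpp_command(audio_path: str, model_path: str, language: str,
                        output_stem: str, translate: bool = False,
                        binary: str = "./main") -> List[str]:
    """Build the argv list for one whisper.cpp run that writes an SRT file.

    `binary` is an assumed path to the whisper.cpp CLI executable.
    """
    argv = [binary, "-m", model_path, "-f", audio_path,
            "-l", language, "-osrt", "-of", output_stem]
    if translate:
        # -tr asks Whisper to translate the source language into English.
        argv.append("-tr")
    return argv


def bilingual_commands(audio_path: str, model_path: str,
                       language: str) -> List[List[str]]:
    """Return two commands: native-language transcript, then English translation."""
    stem = os.path.splitext(audio_path)[0]
    return [
        whisper_cpp_command(audio_path, model_path, language, f"{stem}.{language}"),
        whisper_cpp_command(audio_path, model_path, language, f"{stem}.en",
                            translate=True),
    ]


if __name__ == "__main__":
    # Hypothetical file names; a multilingual model is needed for non-English audio.
    for argv in bilingual_commands("talk.wav", "models/ggml-base.bin", "es"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run`. The two SRT files are segmented independently, so their cue timings may not line up exactly and would still need to be merged afterwards.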

swswsws583 commented 1 year ago

Thanks for sharing this!

luquitared commented 8 months ago

I actually think this is possible with Whisper, but it's unclear how it would impact performance.

This repo shows that you can force whisper to decode to a specific language: https://github.com/Vaibhavs10/translate-with-whisper

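# Assumes these are defined beforehand (not shown in this comment):
# `whisper_asr` is a Hugging Face transformers ASR pipeline, `common_voice_en`
# is an iterable dataset of English audio samples, and `list_of_languages`
# holds the target language names.
sample = next(iter(common_voice_en))["audio"]["array"]

for lang in list_of_languages:
    # Force the decoder prompt to the target language while keeping the
    # "transcribe" task, so the English audio is decoded as `lang`.
    whisper_asr.model.config.forced_decoder_ids = (
        whisper_asr.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")
    )
    print(whisper_asr(sample)["text"])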

Would be fun to do a test on this with whisper.cpp
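
For anyone who wants to try this, here is a rough sketch of what forcing the language amounts to. Whisper's decoder is conditioned on a short prefix of special tokens: start of transcript, a language token, a task token, and optionally a no-timestamps token. Forcing a language while keeping the transcribe task just swaps the language token. The helper name below is hypothetical and it works on token strings rather than token IDs; it is not part of the whisper.cpp or transformers API.

```python
from typing import List

VALID_TASKS = ("transcribe", "translate")


def decoder_prompt(language_code: str, task: str = "transcribe",
                   timestamps: bool = False) -> List[str]:
    """Return the special-token prefix that conditions Whisper's decoder.

    `language_code` is a Whisper language code such as "en" or "es".
    Hypothetical helper for illustration only.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language_code}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    # Normal use: English audio, English language token.
    print(decoder_prompt("en"))
    # The trick from the linked repo: same audio and task, different language token.
    print(decoder_prompt("es"))
```

Whether the model produces a usable translation when the language token does not match the audio is exactly the open question here; the sketch only shows which part of the prompt changes.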