McCloudS / subgen

Autogenerate subtitles using OpenAI Whisper Model via Jellyfin, Plex, Emby, Tautulli, or Bazarr
MIT License
532 stars, 48 forks

Feature to just translate #22

Closed Dima-Kal closed 10 months ago

Dima-Kal commented 10 months ago

Hi, is it possible to add an option to just translate? A lot of media comes with proper English subtitles, or good ones are easy to find, so there is no need to transcribe, only to use AI to translate into another language.

McCloudS commented 10 months ago

See the Docker environment variable TRANSCRIBE_OR_TRANSLATE. It takes either 'transcribe' or 'translate' (see Docker Variables). Transcribe will transcribe the audio in the same language as the input. Translate will transcribe and translate into English.

So you can go Japanese > English or Japanese > Japanese, but not Japanese > German.

The Whisper model isn't trained to translate from one non-English language to another.
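
To make the rule above concrete, here is a minimal sketch in Python. `pick_task` is a hypothetical helper written for this illustration, not something subgen ships, and the two-letter language codes are an assumption for the example.

```python
from typing import Optional


def pick_task(source_lang: str, target_lang: str) -> Optional[str]:
    """Return the TRANSCRIBE_OR_TRANSLATE value for a language pair.

    Returns None when the pair is not covered by either mode.
    """
    source, target = source_lang.lower(), target_lang.lower()
    if source == target:
        return "transcribe"  # e.g. Japanese > Japanese
    if target == "en":
        return "translate"   # e.g. Japanese > English
    return None              # e.g. Japanese > German: not supported


print(pick_task("ja", "en"))
print(pick_task("ja", "de"))
```

Any pair that returns `None` here would need a separate text-translation step outside of Whisper.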

Dima-Kal commented 10 months ago

The Whisper model is, AFAIK, for transcribing, and as you've mentioned, the "translate" value doesn't exactly do that: "Translate will transcribe and translate into English". What I've suggested is:

  1. I already have good English subtitles
  2. Take the good English subtitles .srt file
  3. Translate it to whatever foreign language needed

McCloudS commented 10 months ago

What you’re asking isn’t supported by Whisper, so this won’t happen unless Whisper gets a large update supporting it.

Dima-Kal commented 10 months ago

I know this isn't supported by Whisper. I think there is a different module for that, but I haven't tried it myself: https://github.com/nidhaloff/deep-translator
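
A rough sketch of the idea, assuming a standard .srt layout (cue number, timing line, dialogue lines, blank separator). `translate_srt` is a hypothetical helper and not part of subgen or deep-translator; it takes the translation function as an argument so the file handling can be checked without any network call.

```python
import re
from typing import Callable

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the dialogue lines of an SRT document.

    Cue numbers, timing lines, and blank separators are copied unchanged.
    """
    out = []
    expect_index = True  # True at the start of each cue block
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append(line)       # blank line ends the current cue
            expect_index = True
        elif expect_index and stripped.isdigit():
            out.append(line)       # cue number
            expect_index = False
        elif TIMING.match(stripped):
            out.append(line)       # timing line
            expect_index = False
        else:
            out.append(translate(line))  # dialogue text
    return "\n".join(out)


# Stand-in translator for demonstration; a real one would call a translation service.
sample = "1\n00:00:01,000 --> 00:00:02,500\nHello there"
print(translate_srt(sample, lambda text: f"[{text}]"))
```

With deep-translator installed, the `translate` argument could be something like `GoogleTranslator(source="en", target="de").translate`, going by that project's README. That pairing is untested here, and translating line by line loses context that spans several subtitle lines.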

McCloudS commented 10 months ago

Out of scope for this project. I also wouldn't be able to test and validate that it did any of the translations correctly. I would also imagine that translating an audio file from (for example) German to English, then transcribing it, then translating it into Japanese would yield poor results.

Dima-Kal commented 10 months ago

Unless I'm missing something in the subtitle world, why would you go down this path? English is the universal language, and translating from English to any other language should be the only path.

McCloudS commented 10 months ago

I am a native English speaker. I went down this path literally as described in the project documentation at https://github.com/McCloudS/subgen#why. This project utilizes existing libraries such as Whisper, faster-whisper, and stable-ts, none of which can do what you ask. I recommend you seek your solution elsewhere. Based on the documentation at https://github.com/nidhaloff/deep-translator, it wouldn't be terribly difficult for someone to fork my main branch and integrate it.

Dima-Kal commented 10 months ago

Oh, gotcha now. But if you are using Bazarr, why not use Bazarr's Whisper integration?

McCloudS commented 10 months ago

It literally didn't exist when I created this. When I asked in January of 2023 whether they intended to integrate it, they said no, so I created this in February.
