baxtree / subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
https://hub.docker.com/r/baxtree/subaligner
MIT License

[Question] Synchronize Translation #75

Closed (ubanning closed this issue 1 year ago)

ubanning commented 1 year ago

Hello, I've been trying for days to find a way to automatically synchronize a translated subtitle. The original subtitle is in English and I want to translate it into Portuguese. Executing the command

$ subaligner -m script -v teste.mp4 -s joao.txt -o subtitle_aligned.srt -t eng,por

I got the following error:

ERROR: Cannot find the MT model for source language "eng" and destination language "por"

So I believe that, unfortunately, Portuguese is not available. Is there currently no model for Portuguese, and if so, would it be possible to add one?

If you have any other strategies for synchronizing a translated subtitle, I would be extremely grateful if you could share them with me. The big problem is that the translator needs context to translate correctly, so I have to feed it clean text only (no line breaks and so on). At the same time, I need some reference to the subtitle number or timestamp so that the result can be synchronized afterwards. Unfortunately, with every strategy I have tried, the translator removes my reference markers at some point. Some of the strategies I used:

1. Hello, 2. how are you?
[1] Hello, [2] how are you?
Hello, :: how are you?
Hello, // how are you?

Thanks.

baxtree commented 1 year ago

Hi @ubanning, your belief was spot on. The underlying OPUS models don’t have a base release named opus-mt-en-pt as they do for other language pairs. After browsing their HF repo, I discovered that they released a big model called opus-mt-tc-big-en-pt last year. I have added support for all the big models to subaligner, so please check out the latest changes from the master branch before trying your command again.

ubanning commented 1 year ago

Hi @baxtree, thanks for the answer. It works now, but I have another question: I already have a translation in plain text format and I would like to sync it. How could I do this? I don't want to use the tool to translate, only to sync the translation I already have. Can you help me? Thanks :)

baxtree commented 1 year ago

Hi @ubanning, currently subaligner doesn't know how to split plain text into subtitle cues, so users need to perform pre-sync segmentation themselves by separating the cue texts with newlines, like this:

# joao.txt
cue_1_text

cue_2_text
...

cue_n_text

Then you can run the syncing without translation:

$ subaligner -m script -v teste.mp4 -s joao.txt -o subtitle_aligned.srt

PS: If you are interested in implementing auto-splitting, PRs are always welcome :)
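The manual pre-sync segmentation described above could be scripted roughly as follows. This is only a minimal sketch and not part of subaligner: the helper names split_into_cues and to_script and the max_chars limit are hypothetical, and the sentence splitting is a naive regular expression that assumes sentences end with ".", "!" or "?". It also assumes that cue texts are separated by a blank line, as in the joao.txt layout shown earlier.

```python
import re


def split_into_cues(text: str, max_chars: int = 80) -> list[str]:
    """Naively split plain text into cue-sized chunks (hypothetical helper)."""
    # Collapse all whitespace so line breaks in the input do not matter.
    normalized = " ".join(text.split())
    if not normalized:
        return []
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    cues = []
    for sentence in sentences:
        # Wrap long sentences at word boundaries; a cue only exceeds
        # max_chars when a single word is itself longer than the limit.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                cues.append(current)
                current = word
            else:
                current = candidate
        if current:
            cues.append(current)
    return cues


def to_script(cues: list[str]) -> str:
    """Join cue texts with blank lines (assumed layout of the script file)."""
    return "\n\n".join(cues) + "\n"
```

Writing the result of to_script(split_into_cues(plain_text)) to joao.txt would give a file that can be passed to the script-mode command above. Cue boundaries chosen this way are arbitrary, so the output may still need manual adjustment before syncing.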

baxtree commented 1 year ago

Big models are now available in release 0.3.0. Closing this now and thanks for opening the issue.