Montvydas / translatesubs

It's a tool to translate subtitles into any language that is supported by Google Translate :)
Apache License 2.0
73 stars, 13 forks

Cannot translate subtitles that contain romaji words #12

Open · Kartatz opened this issue 6 months ago

Kartatz commented 6 months ago

I have been using this tool for a long time, and it works pretty well most of the time. However, I have been struggling with an issue where it fails to translate subtitles containing romaji words:

$ translatesubs './0.vtt' --to_lang 'portuguese' '1.vtt'
Translating to "portuguese".
Trying separator " $$$ "...
original length=406, translated length=369
Trying separator " ### "...
original length=406, translated length=354
Trying separator " ∞ "...
original length=406, translated length=358
Trying separator "@@@"...
original length=406, translated length=363
Trying separator " ™ "...
original length=406, translated length=352
Trying separator " @@@ "...
original length=406, translated length=359
Trying separator "$$$"...
original length=406, translated length=383
Trying separator "€€€"...
original length=406, translated length=257
Trying separator "££"...
original length=406, translated length=323
Trying separator " ## "...
original length=406, translated length=353
Trying separator "@@"...
original length=406, translated length=373
Trying separator "$$"...
original length=406, translated length=374
It seems like all tries to translate got corrupted. Try to manually set the separator using --separator argument to be DIFFERENT from: " $$$ ", " ### ", " ∞ ", "@@@", " ™ ", " @@@ ", "$$$", "€€€", "££", " ## ", "@@", "$$". Check --help menu for more information.

I tried setting --separator to various random symbols, but none of them seem to work.
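Reading the log, the tool appears to join all subtitle lines with a separator, translate the joined text in one request, split the result on the same separator, and accept it only if the number of pieces matches ("original length" vs "translated length"). The sketch below illustrates that check under this assumption. It is not translatesubs' actual code: the function names are hypothetical, and `translate` is a stand-in for whatever translation backend is used.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def translate_with_separator(
    lines: List[str], separator: str, translate: Callable[[str], str]
) -> Optional[List[str]]:
    """Join lines with `separator`, translate once, and split back.

    Returns None if the piece count changed, i.e. the translator dropped,
    merged or duplicated a separator.
    """
    translated = translate(separator.join(lines))
    pieces = translated.split(separator)
    return pieces if len(pieces) == len(lines) else None


def translate_lines(
    lines: List[str],
    separators: Iterable[str],
    translate: Callable[[str], str],
) -> Optional[Tuple[str, List[str]]]:
    """Try each separator in turn; return the first one that survives translation."""
    for sep in separators:
        # Skip separators that already occur in the text: splitting would be ambiguous.
        if any(sep.strip() in line for line in lines):
            continue
        pieces = translate_with_separator(lines, sep, translate)
        if pieces is not None:
            return sep, pieces
    return None  # every candidate was corrupted
```

Under this reading, any text that makes the translator drop, merge or duplicate a separator makes that candidate fail; if the romaji lines trigger this for every candidate, all tries fail, which would match the log above.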

The subtitles contain a lot of lines like this:

00:00:00.550 --> 00:00:04.240
Tokihanatsu yo

00:00:07.170 --> 00:00:12.480
Unmei no <i>rhapsody</i>

00:00:12.480 --> 00:00:18.130
<i>You are my destiny</i>

00:00:19.440 --> 00:00:23.060
Yosou mo dekinai

00:00:23.060 --> 00:00:28.020
Kurai <i>cry</i> yami no naka de samayotteru
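To reproduce the problem on a smaller scale, for example to find which cues break the separator check, the cue text can be extracted first. This is a minimal standard-library sketch; `read_cues` is a hypothetical helper, and it assumes simple cues (a timing line containing "-->" followed by text, with blocks separated by blank lines). It skips the WEBVTT header and any block without a timing line, and strips inline tags such as <i> by default.

```python
import re
from typing import List, Tuple

TAG = re.compile(r"<[^>]+>")  # inline markup such as <i>...</i>


def read_cues(vtt_text: str, strip_tags: bool = True) -> List[Tuple[str, str]]:
    """Return (timing, text) pairs for each cue in a simple WebVTT document."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        # The timing line is the one containing "-->"; blocks without one
        # (the "WEBVTT" header, NOTE blocks) are skipped.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue
        text = "\n".join(lines[idx + 1:])
        if strip_tags:
            text = TAG.sub("", text)
        cues.append((lines[idx].strip(), text))
    return cues
```

Translating the returned texts in smaller batches (or one at a time) is one way to narrow down which lines corrupt the separators.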

Editing the subtitle file and removing all the romaji words fixes the issue:

$ translatesubs './0.vtt' --to_lang 'portuguese' '1.vtt'
Translating to "portuguese".
Trying separator " $$$ "...
original length=386, translated length=385
Trying separator " ### "...
original length=386, translated length=385
Trying separator " ∞ "...
Finished!

0.vtt

nacho00112 commented 6 months ago

Same problem.

nacho00112 commented 6 months ago

Fixed it. You need to try your luck with the separators; this one worked for me with --separator: ' ²²²²²² '. I was translating from English to Spanish.
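Trying separators by hand can be scripted. This is a minimal sketch, assuming translatesubs is on PATH, accepts the argument order and the --to_lang/--separator flags used in this thread, and prints "Finished!" on success as in the log above. The exit code on failure is not shown in the thread, so it is not relied on. Both function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, Optional


def run_translatesubs(src: str, dst: str, lang: str, separator: str) -> bool:
    """Run the translatesubs CLI once with a given separator.

    Success is detected by the "Finished!" message seen in this thread
    (an assumption), checking both stdout and stderr.
    """
    result = subprocess.run(
        ["translatesubs", src, "--to_lang", lang, dst, "--separator", separator],
        capture_output=True,
        text=True,
    )
    return "Finished!" in result.stdout + result.stderr


def first_working_separator(
    candidates: Iterable[str], attempt: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate separator for which attempt() succeeds, else None."""
    for sep in candidates:
        if attempt(sep):
            return sep
    return None
```

For example: first_working_separator([" ²²²²²² ", " ¶¶¶ ", " §§§ "], lambda s: run_translatesubs("./0.vtt", "1.vtt", "portuguese", s)). Only ' ²²²²²² ' comes from this thread; the other candidates are arbitrary.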