SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

[Feature suggestion] Auto-translate window: single file support with Google translate #6033

Closed cyzs233 closed 2 years ago

cyzs233 commented 2 years ago

Currently the Google Translate API is limited for free users (~1000 lines per day?). It would be nice if SubtitleEdit added some support for Google Translate's document translation option, to get around the API limits.

It's doable manually but takes a lot of steps. Hopefully we could support it directly in this translate window. 🤞

Documentation: Translate documents

niksedk commented 2 years ago

OK, tried to add some support for this - via Auto-translate via copy-paste - https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.6/SubtitleEditBeta.zip

How does this work?

darnn commented 2 years ago

For the record, at least when I tried this a few months ago, the translation of the uploaded document was different (and worse) than the one you would get just pasting the text into the website version a few lines at a time. Some lines were left in English altogether, for instance.

cyzs233 commented 2 years ago

> OK, tried to add some support for this - via Auto-translate via copy-paste - https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.6/SubtitleEditBeta.zip
>
> How does this work?

Yeah, it works. But to be honest, this copy-paste translation window is kind of cumbersome to use, and copy-pasting translations back and forth through a browser is generally not recommended. That's why we need to export the original subtitle to a single file. First, you have to set the "block size" to the maximum to reduce the process to a single copy-paste. Second, you have to remove any "line separator", since it messes up the formatting of the translation result returned by Google. (The period gets translated into other punctuation.)
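To make the two settings discussed here concrete, below is a rough Python sketch of how a "max block size" and a "line separator" could interact in a copy-paste workflow. It is not SubtitleEdit's code: the function names `make_blocks` and `split_translation` are made up for illustration, and it assumes each subtitle has already been flattened to a single line of text.

```python
def make_blocks(lines, max_block_size, separator="\n"):
    """Group subtitle lines into blocks of at most max_block_size characters.

    Hypothetical helper: a single line longer than the limit still gets
    a block of its own rather than being cut in half.
    """
    blocks, current, size = [], [], 0
    for line in lines:
        # Cost of adding this line, including the separator before it.
        extra = len(line) + (len(separator) if current else 0)
        if current and size + extra > max_block_size:
            blocks.append(separator.join(current))
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        blocks.append(separator.join(current))
    return blocks


def split_translation(pasted, expected_count, separator="\n"):
    """Split a pasted translation back into lines and check the count.

    A mismatch usually means the translator altered the separator,
    so fail loudly instead of silently misaligning subtitles.
    """
    parts = pasted.split(separator)
    if len(parts) != expected_count:
        raise ValueError(
            f"expected {expected_count} lines, got {len(parts)}"
        )
    return [part.strip() for part in parts]
```

With a large enough `max_block_size` everything lands in one block, which is the "single copy-paste" case. The count check in `split_translation` is what catches a separator that came back as different punctuation.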

Maybe we could add pdf support directly in the translate window I mentioned? That's all we want in the end, and it is supported by all translation services. You get instant translation results on Google, which is significantly faster than API requests. IMO, the current workflow (edit parameters ---> export to .txt ---> convert to .pdf/.docx ---> copy translation & paste into SubtitleEdit) needs a refactor. I don't think most users will get a feel for how to use this copy-paste feature.

@darnn Yes indeed, but it seems more like an "unknown word" (or names) related problem.

niksedk commented 2 years ago

OK, how should lines be separated?

Is ".rtf" supported? C# cannot write pdf/docx unless using some third party libraries... I just tested a pdf lib which added about 150 MB to SE... I found a docx library that was smaller.

It's harder to maintain code that uses external libs.
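For what it's worth, a .docx file is a ZIP package of XML parts, so a bare-bones one (plain paragraphs, no styling) can in principle be written and read with just ZIP and XML support. The sketch below shows the idea in Python using only the standard library. It is an illustration, not what SE does: `write_docx` and `read_docx` are hypothetical names, and whether Google Translate accepts such a minimal file has not been tested here.

```python
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def write_docx(target, lines):
    """Write one plain paragraph per subtitle line (hypothetical helper)."""
    paragraphs = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(line)}</w:t></w:r></w:p>'
        for line in lines
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{paragraphs}</w:body></w:document>'
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)


def read_docx(source):
    """Return the text of every paragraph, in document order."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]
```

Both functions accept a file path or a file-like object. Note that `read_docx` also picks up paragraphs inside table cells, so a translated file whose layout was changed by the translation service may come back with a different line count.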

cyzs233 commented 2 years ago

> OK, how should lines be separated?

A simple new line will do in this case.

> Is ".rtf" supported? C# cannot write pdf/docx unless using some third party libraries... I just tested a pdf lib which added about 150 MB to SE... I found a docx library that was smaller.
>
> It's harder to maintain code that uses external libs.

.rtf is not supported by Google Translate as far as I can tell.

Adding .docx support to SubtitleEdit would be great. Maybe half of the current text editing issues can be closed if you offer them an option to edit in Word instead. Some other cool things you can do with .docx are:

Of course, development difficulty is the first concern. Someone has to actually implement it 😉

niksedk commented 2 years ago

It would only be .docx file reading/writing.

cyzs233 commented 2 years ago

> It would only be .docx file reading/writing.

Hmm, that doesn't sound like much motivation to implement .docx support at all. Guess we'll stick with plain text then.

Now that we have discussed the restrictions of this copy-paste feature, maybe we could deprecate it and move its save & load functionality to the Auto-translate window (if it has no other use than simple copy-paste)? Web browsers tend to add formatting to the translation result in their web UI. In such cases, users shouldn't just assume the copy-pasted translation magically works in SubtitleEdit; they have to paste and review the result locally first. (The line separator gets translated into other punctuation, for example.)

niksedk commented 2 years ago

The "Translate via Copy paste" is extremely useful... and the line separator is customizable.

I've tried to add docx import/export in the "Translate menu"... let me know if it's useful or not: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.6/SubtitleEditBeta.zip

niksedk commented 2 years ago

@cyzs233: did you have a chance to try the above beta?

cyzs233 commented 2 years ago

@niksedk I'm having trouble uploading this .docx to Google Translate. It says "file corrupted" (maybe a .docx version compatibility problem? 🤔).

Demo

![fail to upload on google translate](https://user-images.githubusercontent.com/25927091/175134804-5aa17bb5-1458-4df3-9dba-273e1c3d6b74.png)

It works with Baidu online translate (upload), but the table structure got reformatted. For now, you can use Word's built-in translator instead; it maintains the original table structure.

Reformatted result

![table structure lost in translation process](https://user-images.githubusercontent.com/25927091/175134873-a2eed4e4-d533-49e2-8b28-798112f739c4.png)

And then when I tried to open it using "Step 2 - import", an error popped up.

import error

![error when importing](https://user-images.githubusercontent.com/25927091/175150779-8fb8915f-4349-44a3-964f-b79ee91572ab.png)

Attachment: translation (returned by baidu).docx

Actually, this import error also occurs even if the .docx is the very file previously produced by "Step 1 - Export".

Attachment: exported docx.docx

source .srt attached below: It Is Good to Live.1956.srt.zip
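In case it helps narrow down a "file corrupted" message or an import error like the ones above, here is a small Python sketch that checks whether a .docx is at least a readable ZIP package with the parts a minimal Word document needs. The function name `diagnose_docx` and the list of required parts are assumptions made for this illustration; it does not reproduce whatever checks Google Translate or SE actually run.

```python
import zipfile
import xml.etree.ElementTree as ET

# Parts a minimal WordprocessingML package is expected to contain.
REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def diagnose_docx(source):
    """Return a list of human-readable problems; an empty list means none found."""
    if not zipfile.is_zipfile(source):
        return ["not a ZIP archive"]
    problems = []
    with zipfile.ZipFile(source) as package:
        # testzip() returns the name of the first entry with a bad CRC, or None.
        bad_entry = package.testzip()
        if bad_entry is not None:
            problems.append(f"corrupt entry: {bad_entry}")
        names = set(package.namelist())
        for part in REQUIRED_PARTS:
            if part not in names:
                problems.append(f"missing part: {part}")
        if "word/document.xml" in names and bad_entry != "word/document.xml":
            try:
                ET.fromstring(package.read("word/document.xml"))
            except (ET.ParseError, zipfile.BadZipFile) as exc:
                problems.append(f"word/document.xml is not readable XML: {exc}")
    return problems
```

Running it on both the exported file and the translated file would at least show whether the two differ at the package level, or whether the problem sits deeper in the document XML.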

Update: why is this "Max block size feature" useful and some other experiments. #4910