HandleRichText=true and removed spaces in translation, + strange conditions

TokcDK commented 3 years ago

I uploaded an example screenshot (2021-05-14_185416) to illustrate the problem. When HandleRichText is false, the issue does not occur.

Here are also examples of how the text was translated with true and false (slightly off-topic, and more about translation quality in different situations):

true (DeepL): 神は申された。この世界は<color%3D#BB0000><color%3D#0066AA>『災厄』</color></color>による滅亡の危機に瀕している、と。="And God said, 'This world...<color%3D#BB0000><color%3D#0066AA>"is on the brink of annihilation</color></color>The world is on the brink of annihilation.

false (DeepL; it looks like DeepL does not break rich text tags): 神は申された。この世界は<color%3D#BB0000><color%3D#0066AA>『災厄』</color></color>による滅亡の危機に瀕している、と。=God has said that this world is on the verge of destruction by <color%3D#BB0000><color%3D#0066AA>"disaster"</color>.

false (Google; it breaks the tags by inserting spaces, but they can be restored): 神は申された。この世界は<color%3D#BB0000><color%3D#0066AA>『災厄』</color></color>による滅亡の危機に瀕している、と。=God was applying. This world is in danger of destruction by <color %3D # bb0000> <color %3D # 0066aa> "Disasters" </ Color> </ Color>.

I tested this on Reapers Order (RJ309886) because it has a lot of rich text right from the start. The result suggests that HandleRichText=false is the better choice for this game, and it might be good to add tag restoration when HandleRichText=false is used with an endpoint like Google.

I'm also not entirely sure how the option works. From issue #99 and the translation results, I understood that HandleRichText does not do anything as complex as removing all tags before translation and pasting them back in the right places. I could not find any more information about the option, but it looks like it splits the rich text at the tags, translates each part separately, and then puts the tags back and merges the translated parts into the resulting text.
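The split-translate-merge behaviour described above can be sketched roughly as follows. This is a Python illustration under assumptions, not the plugin's C# RichTextParser: the tag regex is simplified, `split_rich_text` and `translate_rich_text` are hypothetical names, and `translate` stands in for whatever endpoint is configured.

```python
import re
from typing import Callable, List, Tuple

# Simplified pattern for Unity-style rich text tags such as <color=#BB0000> or </color>.
# Assumption: the real parser is more careful about what counts as a tag.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def split_rich_text(text: str) -> List[Tuple[str, str]]:
    """Split text into ordered ("tag", value) and ("text", value) tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


def translate_rich_text(text: str, translate: Callable[[str], str]) -> str:
    """Translate every text token on its own and re-join them with the untouched tags."""
    return "".join(
        value if kind == "tag" else translate(value)
        for kind, value in split_rich_text(text)
    )


if __name__ == "__main__":
    # str.upper stands in for a real translation endpoint.
    print(translate_rich_text("this world is <color=#BB0000>doomed</color>.", str.upper))
    # prints: THIS WORLD IS <color=#BB0000>DOOMED</color>.
```

Because each text token goes out as a separate request, the translator never sees the whole sentence, which is one reason the per-part results can read worse than a single translation.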

While testing the game mentioned above, I fairly easily found where to add tag-removal and tag-restoration functions in my fork, but I could not find the place where the rich text is merged back together, which is where the spaces would need to be fixed. While searching the source for the place where translated text is merged back with the rich text tags, I found a couple of strange conditions in core/texttranslationcache.cs that compare a value with itself: untranslatedResult.Arguments.Count == untranslatedResult.Arguments.Count

gravydevsupreme commented 3 years ago

So, HandleRichText works by translating all parts individually and stringing them back together, as you describe. Spaces are not handled the way you expect because the source text, being Japanese, does not contain any spaces either. If there were spaces, they would be preserved as well.
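To illustrate why no spaces appear, here is a Python sketch that translates each segment between tags and re-attaches only the whitespace that was already present in that source segment. This is an assumption about the behaviour described above, not the plugin's actual code; `translate_segment` and `translate_rich_text_keep_spaces` are hypothetical names and the tag regex is simplified.

```python
import re
from typing import Callable

# re.split with a capturing group keeps the tags: they land at the odd indices.
# Assumption: simplified tag pattern, not the plugin's parser.
TAG_SPLIT = re.compile(r"(<[^<>]+>)")


def translate_segment(segment: str, translate: Callable[[str], str]) -> str:
    """Translate one segment, re-attaching only the whitespace the source already had."""
    core = segment.strip()
    if not core:
        return segment  # empty or whitespace-only: nothing to translate
    leading = segment[: len(segment) - len(segment.lstrip())]
    trailing = segment[len(segment.rstrip()):]
    return leading + translate(core).strip() + trailing


def translate_rich_text_keep_spaces(text: str, translate: Callable[[str], str]) -> str:
    """Translate the text between tags; tags themselves pass through unchanged."""
    pieces = TAG_SPLIT.split(text)
    return "".join(
        piece if i % 2 else translate_segment(piece, translate)
        for i, piece in enumerate(pieces)
    )


if __name__ == "__main__":
    # A toy dictionary stands in for the translation endpoint.
    toy = {"この世界は": "This world is", "『災厄』": '"disaster"', "による滅亡": "destruction by"}
    print(translate_rich_text_keep_spaces("この世界は<b>『災厄』</b>による滅亡", toy.__getitem__))
    # prints: This world is<b>"disaster"</b>destruction by
    print(translate_rich_text_keep_spaces("This world is <b>doomed</b> now", str.upper))
    # prints: THIS WORLD IS <b>DOOMED</b> NOW
```

With English input the spaces around the tags survive; with Japanese input there is no whitespace to carry over, so the translated parts end up glued directly to the tags.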

It's certainly interesting that DeepL can handle rich text (at least in this example). The whole HandleRichText feature was created because Google Translate would always mess up the tags. The reason there seems to be a sort of "double" translation when HandleRichText is true with DeepL is that it tries to use contextual information about the text before and after each token when translating. This sometimes causes the artifact you see there. If you have ever used the Windows DeepL translator application, you may have noticed that it suffers from exactly the same problem, because it adds the same contextual information for each line.

The if-statement you mention is surely a bug. It is not a very important one, because in general I would expect the condition to evaluate to true either way. I will fix it, though.

About removing all tags and restoring them: I am not quite sure how you want to achieve this, because you would have to guess where the tags must be placed, which may be entirely different from the original text due to differences in language structure. It would, however, not be difficult to modify the RichTextParser to simply get rid of all rich text altogether and send out a single translation instead. This would give a better translation, but all markup would obviously disappear with it.
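The alternative mentioned here, dropping all markup and sending one request, could look roughly like this Python sketch. The tag regex is a simplification (it would also strip a literal `<...>` that is not markup), and `strip_rich_text` / `translate_without_markup` are hypothetical names, not part of RichTextParser.

```python
import re
from typing import Callable

# Simplified: anything between < and > is treated as markup.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def strip_rich_text(text: str) -> str:
    """Remove every tag so the whole sentence can be sent as one request."""
    return TAG_PATTERN.sub("", text)


def translate_without_markup(text: str, translate: Callable[[str], str]) -> str:
    # One request for the full sentence gives the translator full context,
    # at the cost of losing all colours/formatting in the result.
    return translate(strip_rich_text(text))


if __name__ == "__main__":
    source = "この世界は<color=#BB0000><color=#0066AA>『災厄』</color></color>による滅亡の危機に瀕している、と。"
    print(strip_rich_text(source))
    # prints: この世界は『災厄』による滅亡の危機に瀕している、と。
```

The trade-off is exactly the one stated above: the translator sees the whole sentence, but the colour tags never come back.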

TokcDK commented 3 years ago

I'm not even trying to do that. From issue #99 I understood that it is very hard to remove rich text tags and paste them back in the proper places, and that the best approach is to let the translation service handle them during translation. Before I noticed that DeepL seems to understand rich text tags, I had simply added removal of all tags before translation and, for testing, tag restoration afterwards in a forked copy (link in the first message).

gravydevsupreme commented 3 years ago

Oh, now I see what you are trying to do. You are trying to "unmangle" translations with markup after they're returned.

Let me ask you: How well does that work?

TokcDK commented 3 years ago

Here is everything that was added. I added a line to FixTranslatedText that calls a method which finds all tag matches, removes each tag, and pastes it back with the spaces removed. I also added a check so that the restoration does not run when the translated text contains no tags. It was done quickly and I have not tested it thoroughly, because I noticed that DeepL can handle rich text tags and does not need them restored, but for Google the restoration generally works.

bbepis / XUnity.AutoTranslator, issue #188