SethRobinson / UGT

Universal Game Translator - Uses Google's Cloud Vision to read and speak dialog from any image/game in any language
https://www.codedojo.com/?p=2426
Other

Vertical Japanese doesn't translate well #20

Open: Meerkov opened this issue 3 years ago

Meerkov commented 3 years ago

Though uncommon, some games write Japanese vertically (top-to-bottom, with columns running right-to-left). I found that the system struggled to identify a block of text that was only one character wide.

Similarly, given a block of text 3 characters wide by 10 characters tall, it would translate it as if it were 10 lines of 3 characters each, resulting in a garbled mess.

Furthermore, the translation would then try to fit this garbled English into a vertical box that it doesn't really fit in...

I'm not sure how this should be fixed, as it's likely a failure on the Cloud side... but maybe there is a way to throw in a hack whenever a block of Japanese text is much taller than it is wide? Probably between the OCR step and the translate step?
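A minimal sketch of the kind of check that could sit between OCR and translation. It assumes a hypothetical `TextBlock` type holding the recognized text and its bounding-box size in pixels (UGT's real data structures are not shown in this thread), and an arbitrary, hypothetical 2:1 height-to-width threshold. It flags a block for special handling only if it contains Japanese script and is clearly taller than it is wide.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical OCR result: recognized text plus bounding-box size in pixels."""
    text: str
    width: int
    height: int


def contains_japanese(text: str) -> bool:
    """True if any character is Hiragana, Katakana, or a CJK Unified Ideograph."""
    return any(
        "\u3040" <= ch <= "\u30ff"  # Hiragana + Katakana blocks
        or "\u4e00" <= ch <= "\u9fff"  # CJK Unified Ideographs
        for ch in text
    )


def looks_vertical(block: TextBlock, ratio: float = 2.0) -> bool:
    """Heuristic: the block is at least `ratio` times taller than it is wide."""
    if block.width <= 0:
        return block.height > 0
    return block.height / block.width >= ratio


def should_treat_as_vertical(block: TextBlock) -> bool:
    """Only apply the vertical-text hack to tall blocks of Japanese text."""
    return contains_japanese(block.text) and looks_vertical(block)
```

A 3-wide by 10-tall block of Japanese passes the check, while a wide block of Latin text does not. The threshold is a guess and would need tuning against real game screenshots.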

SethRobinson commented 3 years ago

Yeah, Google's stuff can't handle this yet.

It should be possible to manually piece the characters together and send the result for translation (we do have the on-screen position of each character, not just of each "word" or whatever), but I currently don't have plans to add this.
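Piecing characters together from their positions could look roughly like the sketch below. It assumes a hypothetical `Symbol` type carrying each character's center point and approximate glyph size (not UGT's actual structure). It groups characters into columns by horizontal position, orders the columns right-to-left, and reads each column top-to-bottom before joining everything into one string for the translator.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """Hypothetical per-character OCR result (center point and size in pixels)."""
    char: str
    x: float  # horizontal center
    y: float  # vertical center (screen y grows downward)
    size: float  # approximate glyph width


def join_vertical(symbols: list[Symbol]) -> str:
    """Rebuild vertical Japanese: columns right-to-left, characters top-to-bottom."""
    columns: list[list[Symbol]] = []
    # Visit symbols from rightmost to leftmost so columns come out in reading order.
    for sym in sorted(symbols, key=lambda s: -s.x):
        # Add to the current column if this center is within half a glyph of it.
        if columns and abs(columns[-1][0].x - sym.x) <= sym.size / 2:
            columns[-1].append(sym)
        else:
            columns.append([sym])
    # Within each column, read from top to bottom.
    return "".join(
        s.char for col in columns for s in sorted(col, key=lambda c: c.y)
    )


if __name__ == "__main__":
    glyphs = [
        Symbol("ち", 70, 10, 20), Symbol("に", 100, 50, 20),
        Symbol("こ", 100, 10, 20), Symbol("は", 70, 30, 20),
        Symbol("ん", 100, 30, 20),
    ]
    print(join_vertical(glyphs))
```

Clustering on half a glyph width is a simple choice that tolerates small OCR jitter; columns with uneven spacing or ruby (furigana) text would likely need a smarter grouping step.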

SethRobinson commented 3 years ago

Note, this has changed! Google does properly handle vertical text now: when testing with the examples on https://w3c.github.io/i18n-drafts/articles/vertical-text/index.en, the OCR does fine.

And while it's possible to click and hear the correct Japanese (or the English translation) spoken aloud, UGT formats the text horizontally, so it's difficult to read.

I'll have to give the formatting some thought, but it's good to know the Google side can handle it now.

Meerkov commented 3 years ago

I found out that you can also set the model to "builtin/latest", which gives access to the newest features. According to a blog post I saw, vertical text detection has apparently been available in that model for 2 years. It might be worth trying that setting to see whether it improves the quality of the detection.
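For reference, the model is chosen per feature in a Cloud Vision `images:annotate` request. The sketch below only builds the JSON request body and does not send it. The function name is hypothetical, and UGT's own request code isn't shown in this thread. `Feature.model` (with "builtin/stable" as the default and "builtin/latest" as an option) and `imageContext.languageHints` are fields of the public REST API, but they are worth double-checking against the current Vision documentation.

```python
import base64
import json

# Public REST endpoint for Cloud Vision; a real call still needs an API key or OAuth token.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes: bytes, model: str = "builtin/latest") -> dict:
    """Build an images:annotate request body asking for OCR with a specific model."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    # DOCUMENT_TEXT_DETECTION is the dense-text OCR feature;
                    # "model" picks builtin/stable (default if unset) or builtin/latest.
                    {"type": "DOCUMENT_TEXT_DETECTION", "model": model}
                ],
                # Optional hint that the screenshot contains Japanese.
                "imageContext": {"languageHints": ["ja"]},
            }
        ]
    }


if __name__ == "__main__":
    body = build_annotate_request(b"fake image bytes")
    print(json.dumps(body, indent=2))
```

Comparing the output of the same screenshot with `model="builtin/stable"` and `model="builtin/latest"` would be a simple way to check whether the newer model actually helps with vertical text.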