FilipePS / Traduzir-paginas-web

Translate your page in real time using Google or Yandex
https://addons.mozilla.org/pt-BR/firefox/addon/traduzir-paginas-web/
Mozilla Public License 2.0

Problems with html tags in Japanese. #851

Open ye110wd opened 4 months ago

ye110wd commented 4 months ago

An example of problems with html tags

The last line of the main text on https://ncode.syosetu.com/n9806fw/99/ has translation problems.

「まあ、奴らを社会から退場させる方法は考えてある。<ruby>人<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>間<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>で<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>は<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>な<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>い<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>連<rp>(</rp><rt>・</rt><rp>)</rp></ruby><ruby>中<rp>(</rp><rt>・</rt><rp>)</rp></ruby>に名誉など必要ないからな」

It is translated as "Well, I've thought of a way to get them out of society. people between in teeth Na stomach even middle I don't need honor."

But it should be "Well, I've thought of a way to get them out of society. People who aren't human don't need honor."

It seems to have something to do with the <ruby></ruby> tags or what is between them. I've read "teeth Na stomach" so many times my brain is broken. It happens with other words too.

Using Translate Selected Text from context menu translates correctly.

nanderer commented 3 months ago

Hi,

I can't reproduce your issue; I can find it in neither the original nor the translated version. See the attached HTML files.

Or am I looking in the wrong place? Could you add some screenshots showing the issue if it still occurs?

Regards,

nanderer

Attachments: TWP issue #851.zip

lumynou5 commented 1 week ago

Reproduced in Firefox 131.0.3 with TWP 10.0.1.1: screenshot

If you select the text and translate it, it works well, but not when you translate the whole page. I think the reason is that the text gets split at the <ruby> tags. For example, "人間" (people; human) is incorrectly interpreted as "人" (person) and "間" (between; gap).
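To make the suspected split concrete, here is a minimal sketch in Python (not TWP's actual code, which is JavaScript). It assumes that whole-page translation handles each text node separately, which is only the hypothesis above. `TextNodeCollector` and `text_nodes` are hypothetical helpers written for this illustration; they skip the `<rt>`/`<rp>` annotations and list the remaining text nodes.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects text nodes in document order, skipping <rt>/<rp> content."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._skip_depth = 0  # > 0 while inside <rt> or <rp>

    def handle_starttag(self, tag, attrs):
        if tag in ("rt", "rp"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("rt", "rp") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data:
            self.nodes.append(data)


def text_nodes(html):
    """Return the base text nodes of an HTML fragment (hypothetical helper)."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


# Shortened fragment of the sentence from the report.
sample = (
    "考えてある。"
    "<ruby>人<rp>(</rp><rt>・</rt><rp>)</rp></ruby>"
    "<ruby>間<rp>(</rp><rt>・</rt><rp>)</rp></ruby>"
    "に名誉"
)

# What a per-node translation would see: "人" and "間" arrive as separate pieces.
per_node = text_nodes(sample)

# What a selection-style translation would see: one contiguous string.
merged = "".join(per_node)
```

Under that assumption, `per_node` holds "人" and "間" as separate entries, while `merged` contains "人間" as one word, which matches the difference between whole-page translation and Translate Selected Text.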

In fact, other tags are treated in the same way. For instance, "A of <em>B</em>" should become "<em>B</em>のA", but the <em> makes it split into "A of" and "B", and the final presentation is "のA<em>B</em>" (the pieces are combined in the original order). I was gonna file an issue about this and actually found a possible solution, but <ruby> is quite different from this kind of tag, so I'm not sure whether these two related issues are similar or duplicates, or whether they can be solved in the same way.
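The ordering problem with <em> can be sketched as follows. This is only an illustration in Python with a toy lookup table standing in for the translation service; it is not TWP's code and not necessarily the solution mentioned above. The placeholder markers (`<x0>`), `toy_translate`, and both helper functions are assumptions made for this example.

```python
# Toy stand-in for a translation service (hypothetical, English -> Japanese).
TOY_TRANSLATIONS = {
    "A of ": "のA",
    "B": "B",
    "A of <x0>B</x0>": "<x0>B</x0>のA",
}


def toy_translate(text):
    return TOY_TRANSLATIONS[text]


def translate_per_segment(segments):
    """Translate each segment alone and join the results in the original order."""
    out = []
    for text, tag in segments:
        translated = toy_translate(text)
        out.append(f"<{tag}>{translated}</{tag}>" if tag else translated)
    return "".join(out)


def translate_with_placeholders(segments):
    """Send the whole sentence once, with inline tags replaced by neutral markers."""
    tags = []
    parts = []
    for text, tag in segments:
        if tag:
            index = len(tags)
            tags.append(tag)
            parts.append(f"<x{index}>{text}</x{index}>")
        else:
            parts.append(text)
    translated = toy_translate("".join(parts))
    # Restore the real tags wherever the translator moved the markers.
    for index, tag in enumerate(tags):
        translated = translated.replace(f"<x{index}>", f"<{tag}>")
        translated = translated.replace(f"</x{index}>", f"</{tag}>")
    return translated


# "A of <em>B</em>" as (text, enclosing inline tag) pairs.
segments = [("A of ", None), ("B", "em")]

split_result = translate_per_segment(segments)
joined_result = translate_with_placeholders(segments)
```

With the toy table, `split_result` reproduces the wrong order described above, while `joined_result` lets the translator move the marked span. Whether a real translation service preserves such markers reliably is not something this sketch shows.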