MicrosoftTranslator / DocumentTranslator-Legacy

Microsoft Document Translator (Archive) - Replaced by the MicrosoftTranslator/DocumentTranslation project in this repository.

Word document translation issues #91

Open micche78 opened 5 years ago

micche78 commented 5 years ago

myname.docx myname.it.docx

Translating the partially formatted phrase "My name is Michael" results in two issues:

  1. the phrase is split into three parts because "is" is in bold, so the meaning is lost
  2. "is" is translated to the definition of the verb instead of being translated literally

chriswendt1 commented 5 years ago

Indeed, the system does not deal well with in-sentence markup. This is an artefact of OpenXML, which does not keep a sentence contiguous when it contains inline markup. Options are:

1) Replace the OpenXML logic with HTML. That is ugly, because it would require a good OpenXML<>HTML converter. Office is one, but then Office would be required for all Office documents. OR

2) Go a bit deeper in simplifying OpenXML. That would come at the expense of losing the inline markup.

In short: no golden solution in sight.
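
As a rough illustration of option 2, here is a minimal Python sketch that merges all runs of a WordprocessingML paragraph into a single unformatted run, so the sentence stays contiguous at the cost of the inline markup. This is not Document Translator's code. The sample paragraph and the function names `run_texts` and `flatten_runs` are made up for illustration, and the sketch ignores hyperlinks, fields, and other non-run content.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical paragraph: "My name is Michael" with "is" in bold,
# which is stored as three separate runs.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">My name </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>is</w:t></w:r>
  <w:r><w:t xml:space="preserve"> Michael</w:t></w:r>
</w:p>
"""


def run_texts(paragraph):
    """Return the text of each direct run; each entry is one fragment."""
    return [
        "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        for run in paragraph.findall(f"{{{W}}}r")
    ]


def flatten_runs(paragraph):
    """Replace all direct runs with one plain run (inline markup is lost)."""
    runs = paragraph.findall(f"{{{W}}}r")
    if not runs:
        return paragraph
    full_text = "".join(run_texts(paragraph))
    for run in runs:
        paragraph.remove(run)
    merged = ET.SubElement(paragraph, f"{{{W}}}r")
    text_el = ET.SubElement(merged, f"{{{W}}}t")
    # Keep leading/trailing spaces when Word reads the text back.
    text_el.set(f"{{{XML_NS}}}space", "preserve")
    text_el.text = full_text
    return paragraph


paragraph = ET.fromstring(PARAGRAPH_XML.strip())
print(run_texts(paragraph))  # ['My name ', 'is', ' Michael']
flatten_runs(paragraph)
print(run_texts(paragraph))  # ['My name is Michael']
```

After flattening, the whole sentence is one segment, so it can be translated as a unit. The bold on "is" is gone and cannot be restored from the output.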

micche78 commented 5 years ago

Thanks Chris. Losing the inline markup is acceptable. Is there an example of how to achieve that?

georgkirchner commented 4 years ago

Just sent Chris PPTX files with inline formatting such as italics and boldface...

When parsing files with the OKAPI framework, inline tags no longer impact translations; i.e., the application sends the complete sentence - not fragments - to MSFT Translator.
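
The general idea of sending the complete sentence while keeping the inline tags as markup could look roughly like the Python sketch below. It is not OKAPI's implementation and not Document Translator's code. The sample paragraph and the names `is_on` and `paragraph_to_tagged_sentence` are hypothetical, and it assumes the translation service accepts HTML-style inline tags. Mapping the translated tags back onto runs is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: "My name is Michael" with "is" in bold.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">My name </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>is</w:t></w:r>
  <w:r><w:t xml:space="preserve"> Michael</w:t></w:r>
</w:p>
"""


def is_on(run_props, name):
    """True if a toggle property (e.g. w:b, w:i) is present and not switched off."""
    if run_props is None:
        return False
    prop = run_props.find(f"{{{W}}}{name}")
    if prop is None:
        return False
    return prop.get(f"{{{W}}}val", "true") not in ("false", "0", "off")


def paragraph_to_tagged_sentence(paragraph):
    """Join all runs into one string, wrapping bold/italic runs in <b>/<i>."""
    parts = []
    for run in paragraph.findall(f"{{{W}}}r"):
        raw = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        text = escape(raw, quote=False)  # protect literal <, >, & in the text
        run_props = run.find(f"{{{W}}}rPr")
        if is_on(run_props, "i"):
            text = f"<i>{text}</i>"
        if is_on(run_props, "b"):
            text = f"<b>{text}</b>"
        parts.append(text)
    return "".join(parts)


paragraph = ET.fromstring(PARAGRAPH_XML.strip())
print(paragraph_to_tagged_sentence(paragraph))  # My name <b>is</b> Michael
```

The translator then sees "My name is Michael" as one sentence, with the bold carried along as a tag instead of splitting the text into fragments.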

That said, parsing in Document Translator v2.6 is much better than in v2.1.1. Here is an example translating text with inline tags from German to English, using Document Translator v2.6:

No inline tags in source:
Source: Geben Sie Ihren Text in das Quelltextfeld ein und klicken Sie auf die Schaltfläche Übersetzen.
Target: Enter your text in the source box and click the Translate button.

Inline tags in source:
Source: Geben Sie Ihren Text in das Quelltextfeld ein und klicken Sie auf die Schaltfläche Übersetzen.
Target: Give your Text into the source code field and click button on the Translate button.

Parsed with OKAPI, the target is as follows:

Enter your text in the source box and click the Translate button.