MicrosoftTranslator / DocumentTranslator-Legacy

Microsoft Document Translator (Archive) - Replaced by the MicrosoftTranslator/DocumentTranslation project in this repository.

Errors While Translating Large PDFs #84

Closed eirmon closed 5 years ago

eirmon commented 5 years ago

Hi

I have been having issues when running large PDF documents through the Document Translator. I am using the Document Translator MSI, which leverages the Translator Hub. The language package being used is a default English-German language package (not a custom package). The documents in question are very large PDFs of between 50 and 100 pages. Smaller PDFs do not seem to encounter many issues, but most of these large documents encounter multiple issues. Examples of items being missed are section headers, footnotes, glossaries, and table entries, as well as sporadic paragraphs.

It is important to say that around 95 percent of the content of these documents does translate, and it is not every table entry, footnote, etc. that is missed; it is just a few of them, with the vast majority translating without an issue. This is confusing, however, because you would expect it to either miss a whole group of, say, 5 footnotes or translate all of them, yet here 4 out of 5 can translate with only 1 being missed. Any advice or direction would be greatly appreciated, as this is proving to be a major obstacle at the moment.

Examples of Missed Content
• Section Headers
• Table Entries
• Footnotes
• Glossaries
• Paragraphs

Links to Example Problem Documents:

https://prospectus-express.broadridge.com/Get_Template.asp?clientid=pipod&fundid=74437E750&level=4&doctype=pros

https://prospectus-express.broadridge.com/Get_Template.asp?clientid=pipod&fundid=74441T108&level=4&doctype=pros

https://prospectus-express.broadridge.com/Get_Template.asp?clientid=pipod&fundid=74440V104&level=4&doctype=pros

Shane

chriswendt1 commented 5 years ago

Hi Shane, thanks a lot for the report, and for providing the sample document.

eirmon commented 5 years ago

Hi Chris

From some further testing we noticed the following which might be of some use to you in determining why some sentences etc. are being missed:

eirmon commented 5 years ago

Hi Chris. I appreciate you are probably very busy, but can you offer any update on this issue, or maybe a timeline for when we can expect some progress? Also, could you please let me know whenever you release a new version of the Document Translator MSI that addresses this bug?

chriswendt1 commented 5 years ago

I think almost all is fixed in version 2.3.0. Please give it a try. A few problems remain with ALL CAPS titles. This is a service issue and should go away in August without needing a new Document Translator.

chriswendt1 commented 5 years ago

Please reopen if you still see larger portions untranslated.