computerline1z / okapi

Automatically exported from code.google.com/p/okapi

DOCX/OpenXML: equations (oMathPara, oMath) extracted as text #334

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

Tikal extracts equations as inline text. This is a real problem when 
translating DOCX files with Tikal.

Scenario:
1) Extract inline text from the DOCX file (tikal.sh -lm)
2) Translate the inline text (using Moses)
3) Create the translated document
Because the equation tags are treated as inline tags and can be reordered, the 
best-case scenario is that the equation is distorted, but often an invalid 
document is created. There is little to no way of telling from the inline text 
that the entire segment is not translatable, and there is no valuable 
information to be extracted from it. 
The 'oMath' tag should be treated as a non-translatable object.
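
To illustrate the requested behaviour, here is a minimal Python sketch. It is not Okapi's filter code and does not use any Okapi API. It assumes a single WordprocessingML paragraph is available as a string. The function name `extract_segments`, the `PROTECTED` set, and the `<x id="N"/>` placeholder format are all hypothetical choices made for this example. Each protected subtree (by default `m:oMath` and `m:oMathPara`) is replaced by one opaque placeholder, so the translator never sees the tags inside the equation.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs for WordprocessingML and Office Math.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# Hypothetical default: elements whose whole subtree is non-translatable.
PROTECTED = frozenset({f"{{{M}}}oMath", f"{{{M}}}oMathPara"})


def extract_segments(paragraph_xml, protected=PROTECTED):
    """Return (text, opaque): the translatable text with one placeholder
    per protected subtree, and the list of protected elements in order."""
    root = ET.fromstring(paragraph_xml)
    parts = []
    opaque = []

    def walk(elem):
        if elem.tag in protected:
            # Keep the subtree aside and emit a single placeholder for it.
            opaque.append(elem)
            parts.append(f'<x id="{len(opaque)}"/>')
            return
        if elem.tag == f"{{{W}}}t" and elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)

    walk(root)
    return "".join(parts), opaque


# Illustrative input: a paragraph with one inline equation.
SAMPLE = (
    f'<w:p xmlns:w="{W}" xmlns:m="{M}">'
    '<w:r><w:t xml:space="preserve">Energy is </w:t></w:r>'
    '<m:oMath><m:r><m:t>E=mc</m:t></m:r></m:oMath>'
    '<w:r><w:t xml:space="preserve"> in physics.</w:t></w:r>'
    '</w:p>'
)

if __name__ == "__main__":
    text, opaque = extract_segments(SAMPLE)
    print(text)
    print(len(opaque), "protected element(s)")
```

Because the set of protected tags is a parameter, the same approach would cover other elements that should not reach the translator. Restoring the stored subtrees at the placeholder positions after translation is left out of this sketch.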

I am using version 0.21 on Ubuntu.

Original issue reported on code.google.com by karlis.g...@gmail.com on 8 May 2013 at 7:27


GoogleCodeExporter commented 9 years ago
This probably needs an option. I have encountered people who definitely do 
want to translate the textual parts of equations, however perilous it is.

Original comment by tingley on 8 May 2013 at 4:52

GoogleCodeExporter commented 9 years ago
To add to this: we have encountered problems with some other tags whose need 
for translation is questionable. For me, the best solution would be to be able 
to pass, as parameters, the tags that should be processed as non-translatable 
objects.

One other example that I can give right off the bat is that it extracts the 
author of the document. That might have some uses, but in both translation 
and term extraction it is unnecessary and a potential problem.

Original comment by karlis.g...@gmail.com on 9 May 2013 at 6:33

GoogleCodeExporter commented 9 years ago

Original comment by tingley on 7 Mar 2015 at 6:23