Closed sreeroopnaidu closed 3 years ago
Docx2Python v2 now recognizes <m:t> elements, so will capture some information from equations. For equations in Linear format, Docx2Python v2 will export valid LaTeX.
Equations in "Professional" format will not return anything useful. A simple integral from 0 to 1 would return "01x". It might be straightforward to write a parser to replace
_
^
Someone might come behind me and do it, but there's little return in going down that road (which is pretty much impossible to test comprehensively), as Word will easily convert all equations in a document to "Linear" format. These now export nicely from Docx2Python v2. That same integral in Linear format will export as:
'\\int_{0}^{1}x'
Here's a peek at the xml for a summation in Professional format. The information is there if anyone wants to extend this module with a parser. I suggest not, because of the previously mentioned testing issues.
<m:nary>
  <m:naryPr>
    <m:chr m:val="∑"/>
    <m:limLoc m:val="subSup"/>
    <m:ctrlPr>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
        <w:i/>
      </w:rPr>
    </m:ctrlPr>
  </m:naryPr>
  <m:sub>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>0</m:t>
    </m:r>
  </m:sub>
  <m:sup>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>1</m:t>
    </m:r>
  </m:sup>
  <m:e>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>x</m:t>
    </m:r>
  </m:e>
</m:nary>
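For anyone who does want to try, here is a rough sketch of what a parser for this one element could look like. It is not part of docx2python; the function names and the operator table are made up for illustration, and it only handles a single <m:nary> with plain-text limits. The namespace declarations are added so the snippet parses on its own, and the fallback to the integral sign when <m:chr> is missing reflects my understanding of the OOXML default.

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"m": M_NS}

# Hypothetical lookup table: only the two operators mentioned in this thread.
NARY_COMMANDS = {"∑": r"\sum", "∫": r"\int"}


def _text(element):
    """Join the text of every <m:t> below element; a missing element gives ''."""
    if element is None:
        return ""
    return "".join((t.text or "").strip() for t in element.iter(f"{{{M_NS}}}t"))


def nary_to_latex(nary):
    """Convert one <m:nary> element to a LaTeX string (illustrative only)."""
    chr_el = nary.find("m:naryPr/m:chr", NS)
    # Assumption: an omitted <m:chr> means the default operator, the integral.
    symbol = chr_el.get(f"{{{M_NS}}}val") if chr_el is not None else "∫"
    command = NARY_COMMANDS.get(symbol, symbol)
    sub = _text(nary.find("m:sub", NS))
    sup = _text(nary.find("m:sup", NS))
    body = _text(nary.find("m:e", NS))
    return f"{command}_{{{sub}}}^{{{sup}}}{body}"


# A trimmed copy of the summation above, with namespaces declared.
SAMPLE = f"""
<m:nary xmlns:m="{M_NS}" xmlns:w="{W_NS}">
  <m:naryPr><m:chr m:val="∑"/><m:limLoc m:val="subSup"/></m:naryPr>
  <m:sub><m:r><m:t>0</m:t></m:r></m:sub>
  <m:sup><m:r><m:t>1</m:t></m:r></m:sup>
  <m:e><m:r><m:t>x</m:t></m:r></m:e>
</m:nary>
""".strip()

print(nary_to_latex(ET.fromstring(SAMPLE)))
```

Fractions, radicals, matrices and nested structures would each need their own handling, which is where the testing burden mentioned above comes from.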
Thank you, sreeroopnaidu.
@ShayHill Is it possible to add delimiters between the exported Latex so as to identify those as equations? Something as done in this library: https://github.com/hrushikeshrv/docxlatex#usage
What delimiter do you suggest?
What delimiter do you suggest?
We can use a delimiter similar to the ones used for images, footnotes, etc. It works really well with regex.
----latex e = mc^2----
or
----equation x = {-b \pm \sqrt{b^2-4ac} \over 2a}----
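To illustrate the regex point, here is a small sketch of pulling equations back out of exported text if a delimiter like the ones above were adopted. The delimiter is only the proposal from this comment, not something docx2python is confirmed to emit, and the pattern and function name are made up for the example.

```python
import re

# Hypothetical pattern for the proposed "----latex ...----" and
# "----equation ...----" delimiters; the group captures the equation body.
EQUATION_RE = re.compile(r"----(?:latex|equation) (.*?)----")


def extract_equations(text):
    """Return the body of every delimited equation found in text."""
    return EQUATION_RE.findall(text)


sample = (
    "Energy: ----latex e = mc^2---- and roots: "
    r"----equation x = {-b \pm \sqrt{b^2-4ac} \over 2a}----"
)

for equation in extract_equations(sample):
    print(equation)
```

One caveat with this style of delimiter: the non-greedy match stops at the first run of four hyphens, so an equation that itself contains "----" would be cut short.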
I am going to upload v2.0 to PyPI by the end of November. It will include a delimiter for equations.
Still deciding between what you suggest and .
Thank you very much. I will look into this.
usr3 wrote on Saturday, November 6, 2021 (Re: [ShayHill/docx2python] docx2python cann't read the mathematical equation (#15)):
Just to report, the latex being returned contains an extra \ for every backslash which breaks the equation. For instance, B=\left[\begin{matrix}-1&0\0&-1\\end{matrix}\right] becomes B=\left[\begin{matrix}-1&0\\0&-1\\\end{matrix}\right]
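One possible explanation, not confirmed anywhere in this thread: if the returned string is being inspected through its Python repr (in a REPL, or by printing a list that contains it), every backslash is displayed doubled even though the string itself holds only one. A quick way to check, using the integral example from earlier as a stand-in value:

```python
# Stand-in for a string returned by the library (assumed value for illustration).
latex = r"\int_{0}^{1}x"

# repr() is what a REPL or a printed list shows: backslashes appear escaped.
print(repr(latex))

# print() shows the actual characters in the string.
print(latex)

# Counting backslashes settles how many the string really contains.
print(latex.count("\\"))
```

If the count matches the number of backslashes expected in the LaTeX, the doubling is only a display artifact; if it is twice as large, the extra backslashes really are in the exported text.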
@ShayHill Have you decided on a delimiter for equations? I can send a PR with the delimiter you suggest.
I like
A PR would be great.
@ShayHill Not sure if it's the right way, but sent PR #28 which uses the parent insert_text_as_new_run
should also work.
it will show empty array []