skalee opened 4 years ago
LaTeXML can probably be turned into a gem with native extensions, but this requires some work.
You can do this, and if it works, it will be useful in Metanorma as well.
Metanorma installs LaTeXML separately through package managers: CPAN in the Docker image, and the Snap or Chocolatey package in other situations.
@skalee for LaTeX math, ONLY LaTeXML is deterministically accurate and correct (i.e. it always arrives at the correct structure), even though it is slower than the others. It is also necessary to use the same processor as Metanorma, because the terminology site software is part of our standardization suite.
Okay, these are strong arguments. I'll experiment with LaTeXML then.
Regarding bridging LaTeXML as a native extension: initially I thought that LaTeXML was written in C, but now I see it's in Perl. This makes everything more difficult. There are few, if any, resources on the topic. We're literally entering uncharted waters and I doubt we'll succeed, especially since I don't know Perl at all. Nevertheless, I'll be happy to try. (update: this is very old, but looks promising: ruby-perl)
However, we can still call LaTeXML from a subshell, and we can avoid repeated calls by caching the results. This should improve performance greatly, especially if we use a disk cache to persist the results between builds. At the moment I'm pretty convinced we'll end up with subshell calls.
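A minimal sketch of such a disk cache in Ruby, assuming one file per formula keyed by the SHA-256 digest of the formula. The class name MathCache and the file layout are hypothetical; the actual converter (for instance a latexmlmath subshell call) is passed in as a block, so it only runs on a cache miss.

```ruby
require "digest"
require "fileutils"

# Hypothetical on-disk cache for formula conversions. Each result is stored in
# its own file named after the SHA-256 digest of the formula, so the cache
# survives between site builds. The converter is supplied as a block and is
# only called when the formula has not been converted before.
class MathCache
  def initialize(dir, &converter)
    raise ArgumentError, "a converter block is required" unless converter

    @dir = dir
    @converter = converter
    FileUtils.mkdir_p(@dir)
  end

  # Returns the cached MathML for +formula+, converting and storing it on a miss.
  def fetch(formula)
    path = File.join(@dir, "#{Digest::SHA256.hexdigest(formula)}.mml")
    return File.read(path, encoding: "UTF-8") if File.exist?(path)

    mathml = @converter.call(formula)
    File.write(path, mathml)
    mathml
  end
end
```

In a site build the block would shell out to latexmlmath. If converter options can change between builds, they should probably become part of the cache key too; otherwise stale MathML would be served.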
Having said that, I still don't know what to do with undefined macros such as \backepsilon. The following formula is taken directly from concept 259, "isomorphism":
latexmlmath '[A,B \textit{ isomorphic}] \Leftrightarrow [\exists f : A \rightarrow B, g : B \rightarrow A \backepsilon f \circ g = Id_A, g \circ f = Id_B]'
On my computer, it ends up with one error (Error:undefined:\backepsilon The token T_CS[\backepsilon] is not defined) and one warning (Warning:not_parsed:UNKNOWN.ATOM.CLOSE>METARELOP MathParser failed to match rule 'Anything'). The produced MathML is as follows (note the merror element):
<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="[A,B\textit{ isomorphic}]\Leftrightarrow[\exists f:A\rightarrow B,g:B% \rightarrow A\backepsilon f\circ g=Id_{A},g\circ f=Id_{B}]" display="block">
<mrow>
<mrow>
<mo stretchy="false">[</mo>
<mi>A</mi>
<mo>,</mo>
<mi>B</mi>
<mtext mathvariant="italic"> isomorphic</mtext>
<mo stretchy="false">]</mo>
</mrow>
<mo>⇔</mo>
<mrow>
<mo stretchy="false">[</mo>
<mo>∃</mo>
<mi>f</mi>
<mo>:</mo>
<mi>A</mi>
<mo>→</mo>
<mi>B</mi>
<mo>,</mo>
<mi>g</mi>
<mo>:</mo>
<mi>B</mi>
<mo>→</mo>
<mi>A</mi>
<merror class="ltx_ERROR undefined undefined">
<mtext>\backepsilon</mtext>
</merror>
<mi>f</mi>
<mo>∘</mo>
<mi>g</mi>
<mo>=</mo>
<mi>I</mi>
<msub>
<mi>d</mi>
<mi>A</mi>
</msub>
<mo>,</mo>
<mi>g</mi>
<mo>∘</mo>
<mi>f</mi>
<mo>=</mo>
<mi>I</mi>
<msub>
<mi>d</mi>
<mi>B</mi>
</msub>
<mo stretchy="false">]</mo>
</mrow>
</mrow>
</math>
You can copy and paste it into the MathJax demo.
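Since a failed conversion still produces MathML, a build could check the output for merror elements before accepting (or caching) it. A rough Ruby sketch; the helper name is hypothetical, and scanning with a regular expression instead of an XML parser is a simplification that assumes output shaped like the example above (XML entities inside mtext are not unescaped).

```ruby
# Hypothetical helper: returns the LaTeX source reported in every <merror>
# element of latexmlmath output, e.g. ["\\backepsilon"] for the example above.
# An empty array means no conversion errors were embedded in the MathML.
def mathml_errors(mathml)
  mathml.scan(%r{<merror\b[^>]*>\s*<mtext[^>]*>(.*?)</mtext>\s*</merror>}m).flatten
end
```

A build step could then refuse to cache, or at least log, any formula for which mathml_errors returns a non-empty list.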
@ronaldtse I'm still having trouble with LaTeXML. Does anyone know how to fix the error produced by the following command (Error:undefined:\backepsilon)?
latexmlmath '[A,B \textit{ isomorphic}] \Leftrightarrow [\exists f : A \rightarrow B, g : B \rightarrow A \backepsilon f \circ g = Id_A, g \circ f = Id_B]'
@skalee Please check how latexmlmath is used in the metanorma gem. \backepsilon is recognized there.
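One plausible fix, not verified against the metanorma code: in LaTeX, \backepsilon is provided by the amssymb package, and latexmlmath has a --preload option for loading a package that the formula does not request itself. The sketch below assumes that preloading amssymb is enough, and that latexmlmath signals failure through a non-zero exit status; both helper names are hypothetical.

```ruby
require "open3"

# Builds the latexmlmath argument list, preloading LaTeX packages the formula
# does not load itself. Preloading amssymb is an assumption: it is the LaTeX
# package that defines \backepsilon.
def latexmlmath_argv(formula, preloads: ["amssymb"])
  ["latexmlmath", *preloads.map { |pkg| "--preload=#{pkg}" }, formula]
end

# Runs latexmlmath in a subshell and returns the MathML it prints on stdout.
def latex_to_mathml(formula, preloads: ["amssymb"])
  out, err, status = Open3.capture3(*latexmlmath_argv(formula, preloads: preloads))
  raise "latexmlmath failed: #{err}" unless status.success?
  out
end
```

With this, latex_to_mathml('A \backepsilon f') runs latexmlmath --preload=amssymb 'A \backepsilon f'.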
Some concepts may contain mathematical symbols and formulas in their designations, descriptions, or notes. Formulas can be expressed in LaTeX math, AsciiMath, or MathML. Preferably, concepts follow the AsciiDoc STEM syntax, using the stem, asciimath, and latexmath macros.
Available converters
There are some programs which come in handy:
AsciiMath gem
A handy gem which converts AsciiMath to MathML. Asciidoctor relies on it (as an optional dependency) when processing stem macros. It does the job pretty well, but it does not convert LaTeX math strings, and there is no corresponding gem for LaTeX math.
LaTeXML
A toolset for processing LaTeX documents. Most importantly, it contains the latexmlmath program, which converts LaTeX math formulas to MathML. Sadly, this program fails to recognize some symbols, e.g. \backepsilon. Perhaps this can be fixed with proper configuration.
Example:
latexmlmath '\sqrt{b^2-4ac}'
Pandoc
Pandoc is capable of converting LaTeX math to MathML, though the formula must be wrapped in a Markdown document. We can craft a minimal Markdown document and then extract the MathML formula from the generated HTML.
Example:
echo '$$\sqrt{b^2-4ac}$$' | pandoc --mathml -f markdown -t html
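The same pipeline from Ruby, as a sketch: wrap the formula in display math, run pandoc with the flags from the example, and cut the first math element out of the generated HTML. The helper names are hypothetical, and formulas that themselves contain $ would need escaping that this does not attempt.

```ruby
require "open3"

# Wraps a LaTeX formula in a minimal Markdown document (display math).
def pandoc_markdown(formula)
  "$$#{formula}$$\n"
end

# Returns the first <math>...</math> element in +html+, or nil if there is none.
def extract_mathml(html)
  html[%r{<math\b.*?</math>}m]
end

# Markdown in, HTML with MathML out, then extract the formula.
def pandoc_latex_to_mathml(formula)
  html, err, status = Open3.capture3("pandoc", "--mathml", "-f", "markdown", "-t", "html",
                                     stdin_data: pandoc_markdown(formula))
  raise "pandoc failed: #{err}" unless status.success?
  extract_mathml(html)
end
```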
MathJax
MathJax converts both AsciiMath and LaTeX math to MathML. It is designed primarily to run in a browser, but works in Node.js too. The problem is that it is poorly documented, and API docs are non-existent. There are some usage examples in https://github.com/mathjax/MathJax-demos-node which present working solutions. The following two snippets use programs from that repository:
Example (LaTeX math -> MathML):
node -r esm component/tex2mml \\sqrt{b^2-4ac}
Example (AsciiMath -> MathML):
node -r esm component/am2mml 'sqrt(b^2-4ac)'
Performance considerations
Executing a program for each formula on the site may slow down site generation. LaTeXML, Pandoc and MathJax have been benchmarked with hyperfine:
Integration considerations
We can call any of these programs from Ruby by creating a subshell. However, it will be very time-consuming for MathJax, and especially for LaTeXML.
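To put a number on "time-consuming" from the Ruby side (process start-up included), one could time repeated subshell calls. A sketch using only the standard library; the helper name and the default of five runs are arbitrary choices, and hyperfine remains the better tool for careful measurements.

```ruby
require "open3"

# Returns the average wall-clock seconds of one subshell call of +argv+.
# Without caching, every formula on the site pays roughly this cost.
def average_call_seconds(argv, runs: 5)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times do
    _out, err, status = Open3.capture3(*argv)
    raise "#{argv.first} failed: #{err}" unless status.success?
  end
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / runs
end
```

For example, average_call_seconds(["latexmlmath", '\sqrt{b^2-4ac}']) multiplied by the number of formulas gives a rough estimate of an uncached build.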
Final considerations
We would love to integrate LaTeXML, as we have played a part in its development; however, this seems to be the most difficult of all the options above. We would need to turn it into a gem and resolve the issues with unrecognized symbols. Perhaps in the longer run… unless we already have a gem?