Much as smartphones auto-correct spelling, arXiv documents commonly need the separation between text and mathematics auto-corrected. TeX cleanly separates "text mode" from "math mode", and LaTeXML preserves that separation after conversion as HTML vs. MathML.
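To illustrate the kind of mix-up in question, here is a made-up example (not taken from the ticket or from Purify.pm). In math mode TeX ignores spaces and sets each letter as a separate italic variable, so prose typed inside the dollar signs ends up as mathematics after conversion:

```latex
% Authored: the whole clause sits in math mode, so "where" and "is real"
% are treated as products of single-letter variables.
$x > 0 where x is real$

% Intended: only the formulas are math, the words are text.
$x > 0$ where $x$ is real
```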
There is an existing implementation from a semester project of mine at: https://github.com/KWARC/LLaMaPUn/blob/master/lib/LLaMaPUn/Preprocessor/Purify.pm
The original ticket, with a detailed description of progress and the various phenomena covered, is on the old Trac: https://trac.kwarc.info/lamapun/ticket/1
I should get this ported to C, in order to improve the quality of the input data for our linguistic experiments.