[port] Modality Purification in C

dginev commented 10 years ago

Similar to the way smartphones auto-correct spelling, there is a common need in arXiv to auto-correct the separation between text and mathematics. TeX clearly separates "text mode" from "math mode", and LaTeXML preserves that distinction after conversion as a separation between HTML and MathML.
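To make the idea concrete, here is a minimal sketch in C of what a purification check could look like. It is a toy heuristic written for illustration only; it is not the logic of Purify.pm, and the names `modality_t`, `guess_modality` and `needs_purification` are hypothetical. It assumes tokens arrive as plain ASCII strings together with the mode they were marked up in.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for a token's modality; not taken from Purify.pm. */
typedef enum { MODALITY_TEXT, MODALITY_MATH } modality_t;

/* Toy heuristic (an assumption, not the real rules): guess the modality
 * of a token from its characters alone. */
modality_t guess_modality(const char *token) {
  assert(token != NULL);
  size_t letters = 0, others = 0;
  for (const char *p = token; *p != '\0'; p++) {
    if (isalpha((unsigned char)*p)) {
      letters++;
    } else {
      others++;
    }
  }
  /* Any non-letter character (digit, operator, punctuation) is treated
   * as a sign of mathematics in this sketch. */
  if (others > 0) {
    return MODALITY_MATH;
  }
  /* A lone letter other than "a", "A" or "I" is more likely a variable. */
  if (letters == 1) {
    return (strchr("aAI", token[0]) != NULL) ? MODALITY_TEXT : MODALITY_MATH;
  }
  /* Multi-letter alphabetic tokens (and the empty token) count as text. */
  return MODALITY_TEXT;
}

/* Returns 1 when the mode the token was marked up in disagrees with the
 * guess, i.e. when the token is a candidate for purification. */
int needs_purification(const char *token, modality_t marked_as) {
  return guess_modality(token) != marked_as;
}
```

With these assumptions, a word such as `where` that was typeset in math mode would be flagged by `needs_purification("where", MODALITY_MATH)`, while a variable `x` in math mode would be left alone. A real implementation would need context and many more cases, for example operator names such as `sin` and punctuation attached to words.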

There is an existing implementation from a semester project of mine at: https://github.com/KWARC/LLaMaPUn/blob/master/lib/LLaMaPUn/Preprocessor/Purify.pm

The original ticket, with a detailed description of progress and the various phenomena covered, is on the old Trac: https://trac.kwarc.info/lamapun/ticket/1

I should get this ported to C, in order to improve the quality of the input data for our linguistic experiments.

dginev commented 10 years ago

P.S. Auto-correcting spelling is a closely related feature that no one has looked into yet.

dginev commented 10 years ago

Also to be added: https://trac.kwarc.info/lamapun/ticket/41
