When we report data to Crossref, we need to properly encode mathematics in titles, abstracts, and references. Crossref appears to accept both TeX and MathML in abstracts and titles, but their examples use MathML. If we encode math as LaTeX, we have to be careful to put it into an XML CDATA section, because it may contain characters that are problematic in XML, namely <, >, and &. There is a Python converter, latex2mathml, from LaTeX to MathML, but running it on an entire title seems to produce junk: it appears to treat its whole input as math, so convert('This is text') marks up the plain words as if they were mathematics.
One option is to detect the parts of the title and abstract that are in math mode and convert only those to MathML. Alternatively, we can keep the math parts as LaTeX and wrap them in CDATA sections. I looked at what others do, and I found some bad behavior:
Springer encodes mathematics in titles inside $$ ... $$ instead of $ ... $. See this example
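The detect-and-convert option could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: split_math and render_title are hypothetical helper names, the regular expression assumes math is delimited by $ ... $ or $$ ... $$ (the latter because of the Springer behavior noted above) and ignores \( ... \) and nested cases, and the converter is passed in as a parameter rather than hard-wired.

```python
import re
from xml.sax.saxutils import escape

# Try $$...$$ first, then $...$; a backslash-escaped \$ is not a delimiter.
MATH_RE = re.compile(
    r'(?<!\\)\$\$(.+?)(?<!\\)\$\$|(?<!\\)\$(.+?)(?<!\\)\$',
    re.DOTALL,
)


def split_math(text):
    """Return (is_math, fragment) pairs in document order (hypothetical helper)."""
    parts = []
    pos = 0
    for m in MATH_RE.finditer(text):
        if m.start() > pos:
            parts.append((False, text[pos:m.start()]))
        body = m.group(1) if m.group(1) is not None else m.group(2)
        parts.append((True, body))
        pos = m.end()
    if pos < len(text):
        parts.append((False, text[pos:]))
    return parts


def render_title(text, convert):
    """XML-escape plain text and hand only the math fragments to `convert`."""
    return ''.join(
        convert(frag) if is_math else escape(frag)
        for is_math, frag in split_math(text)
    )
```

With latex2mathml installed, latex2mathml.converter.convert could be passed as the convert argument, so that only the math fragments reach it and the plain text is merely XML-escaped.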
The safest approach is perhaps to encode the math sections as LaTeX in CDATA. Unfortunately, Python's ElementTree does not directly support CDATA output. See this.
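One possible workaround, sketched here only as an assumption about how we might do it, is to build the element with the standard library's xml.dom.minidom, which does have createCDATASection. The function names and the use of a bare title element are illustrative, not taken from the Crossref schema. The sketch also splits any "]]>" in the LaTeX across two adjacent sections, since that sequence would otherwise end the CDATA section early.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document


def append_cdata(doc, parent, data):
    """Append `data` as CDATA, splitting any ']]>' across two sections."""
    pieces = data.split(']]>')
    for i, piece in enumerate(pieces):
        if i > 0:
            piece = '>' + piece        # second half of a split ']]>'
        if i < len(pieces) - 1:
            piece = piece + ']]'       # first half of a split ']]>'
        parent.appendChild(doc.createCDATASection(piece))


def title_with_cdata(plain_prefix, latex):
    """Build <title> with escaped plain text followed by LaTeX in CDATA."""
    doc = Document()
    title = doc.createElement('title')  # element name is illustrative
    title.appendChild(doc.createTextNode(plain_prefix))
    append_cdata(doc, title, latex)
    return title.toxml()


def read_back(xml):
    """ElementTree can still parse CDATA, even though it cannot emit it."""
    return ''.join(ET.fromstring(xml).itertext())
```

For example, title_with_cdata('Bounds for ', '$x < y$') keeps the < unescaped inside the CDATA section, and read_back on the result recovers the original string, which is one cheap round-trip check we could include in testing.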
Whatever we do, this will require extensive testing.