caltechlibrary / irdmtools

A Go and Python package for working with InvenioRDM repositories.
https://caltechlibrary.github.io/irdmtools

doi2rdm: figure out how to transform mml markup #36

Open tmorrell opened 1 year ago

tmorrell commented 1 year ago

Some records, like https://doi.org/10.1103/physrevb.85.144303, use MathML (mml) markup in their descriptions and titles. We need to figure out whether this can be displayed in InvenioRDM, or find a clever way to clean or transform it into UTF-8.

rsdoiel commented 11 months ago

I think we can find a MathML to AsciiMath converter; that would probably do the trick.

https://en.wikipedia.org/wiki/AsciiMath

There appears to be a Python package for working with AsciiMath: https://pypi.org/project/py-asciimath/

If not, MathJax (JavaScript/TypeScript) can probably do the job: https://www.mathjax.org

This could be done in the Python based fixup code.

tmorrell commented 10 months ago

Using py-asciimath to convert to LaTeX might be an initial solution.

We might need to strip out the XML namespace declarations and prefixes first; see https://stackoverflow.com/questions/25952401/declaring-xml-namespaces-for-mathml
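A rough sketch of what stripping the namespace bits could look like, using only the standard library. The function name `strip_mml_namespace` and the sample title are made up for illustration (the sample is not taken from a real record), and this assumes the prefix is always `mml:`. Note the regex removes every `xmlns` attribute in the string, not only those on MathML tags.

```python
import re

# Matches the "mml:" prefix at the start of an opening or closing tag name.
_PREFIX = re.compile(r"(</?)mml:")
# Matches an xmlns declaration such as xmlns:mml="..." or xmlns="...".
_XMLNS = re.compile(r"\s+xmlns(?::\w+)?=\"[^\"]*\"")


def strip_mml_namespace(text: str) -> str:
    """Drop xmlns declarations and the mml: tag prefix (hypothetical helper)."""
    return _PREFIX.sub(r"\1", _XMLNS.sub("", text))


# Hypothetical title, for illustration only.
sample = (
    'Spin waves in <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<mml:msub><mml:mi>Fe</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math>"
)
print(strip_mml_namespace(sample))
# prints: Spin waves in <math><msub><mi>Fe</mi><mn>2</mn></msub></math>
```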

The resulting mml should, theoretically, render in the browser. But we'd need to figure out how to get RDM to stop transforming the markup in the title into plain characters.

Ideally I'd like to just convert to UTF-8, but I haven't found a good program, since mml is so much more complicated than superscripts and subscripts.
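For the simple cases, something like the following might be a starting point. It is only a sketch: it handles `msub` and `msup` with Unicode subscript/superscript characters, falls back to `_` / `^` when a character has no Unicode counterpart, and just concatenates the text of everything else (so fractions, roots, etc. would come out flattened). The names `flatten` and `title_to_utf8` and the sample title are hypothetical, and it assumes the title is well-formed XML once wrapped in a root element.

```python
import xml.etree.ElementTree as ET

# Characters that have Unicode superscript / subscript forms.
SUP = str.maketrans("0123456789+-=()n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")
SUB = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def local(tag: str) -> str:
    """Return the tag name without any '{namespace}' part."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem: ET.Element) -> str:
    """Flatten one MathML element to plain text (simple cases only)."""
    tag = local(elem.tag)
    kids = list(elem)
    if tag in ("msub", "msup") and len(kids) == 2:
        base, script = flatten(kids[0]), flatten(kids[1])
        table = SUB if tag == "msub" else SUP
        if all(ord(ch) in table for ch in script):
            return base + script.translate(table)
        # No Unicode form for some character: use a plain-text marker.
        return base + ("_" if tag == "msub" else "^") + script
    parts = [(elem.text or "").strip()]
    for kid in kids:
        parts.append(flatten(kid))
        parts.append((kid.tail or "").strip())
    return "".join(parts)


def title_to_utf8(title: str) -> str:
    """Replace <math> elements in a title with flattened text."""
    root = ET.fromstring(f"<root>{title}</root>")
    out = [root.text or ""]
    for child in root:
        if local(child.tag) == "math":
            out.append(flatten(child))
        else:
            out.append("".join(child.itertext()))
        out.append(child.tail or "")
    return "".join(out)


# Hypothetical title, for illustration only.
sample = (
    'Spin waves in <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<mml:msub><mml:mi>Fe</mml:mi><mml:mn>2</mml:mn></mml:msub>"
    "<mml:msub><mml:mi>O</mml:mi><mml:mn>3</mml:mn></mml:msub>"
    "</mml:math> films"
)
print(title_to_utf8(sample))  # prints: Spin waves in Fe₂O₃ films
```

Because ElementTree resolves the `mml:` prefix itself, this works on the namespaced markup directly as long as the `xmlns:mml` declaration is present in the title.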

tmorrell commented 10 months ago

We might be able to use WOS (Web of Science) as a source for titles instead.