brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Adopt Scholarly HTML, or similar standard? #1083

Open bfirsh opened 5 years ago

bfirsh commented 5 years ago

As part of the arXiv project, I have been researching standard dialects of HTML to see if there are ecosystems we can slot into. Scholarly HTML seems the most promising, but it hasn't had any recent activity and doesn't seem to have gained any traction.

This is closely related to #896, but I put it in a separate thread to discuss whether another standard could be adopted, rather than just documenting what already exists.

The neat thing about the arXiv project is that eventually (hopefully) we'll be publishing lots and lots of papers in some HTML format, so whatever we choose will gain a lot of traction as a standard. One possible route is to do nothing, making the LaTeXML output a de facto standard of sorts. 😄

@dginev - I stumbled across some interest from you in the mailing list. Have you any thoughts about this?

dginev commented 5 years ago

A range of thoughts... Indeed I would love to generally have a "scholarly HTML standard", or at the very least claim latexml's HTML dialect is "one such standard", by virtue of documenting and justifying its choices. Makes it possible for a lot of services to interoperate.

As to Scholarly HTML itself, both I and (I think) Bruce are in their discussion group, and have been reading quietly as the occasional (annual?) discussion takes place. The effort is somewhat unmanned at the moment or, from a different perspective, "stable" for what they already have.

My reason for #896 is to be able to see fully what choices latexml made, so that we can quantify how far it is from the Scholarly HTML standard, what changes would be needed, and what their trade-offs are.

As to arXiv, probably good to mention we already have (and have had for years) most of it available in latexml's HTML5+MathML flavor, and arXiv content has indeed informed a lot of the schema work and choices in latexml. May even be accurate to think that latexml's HTML is to a degree representative of arXiv content, more so than Scholarly HTML would be.

I think some bits are easy:

And some are harder:

Will stop this comment here, and see where the discussion heads. I do think documenting the current state is a prerequisite for actual work in this direction though, at least in my mind.