Open xworld21 opened 3 years ago
@dginev I looked at the code, and... I might have implemented the thing I was looking for, in about 20 lines of code. I leveraged the fact that latexml already detects whether the input is a BibTeX file (even if passed as literal:), so adding XML detection was trivial. It's quite a game changer for me if it works correctly: I can tweak the EPUB output without reparsing everything!
My tentative implementation is in this branch, including handling of parsing errors. I have done my best to match the current behaviour, so in principle it behaves well in the client/server scenario. I think the only missing piece is validating the XML document; I thought that would happen in postprocessing, but it doesn't. Also, I don't know if --xmlinput is the right name for it.
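The detection idea described above can be illustrated with a small sketch. This is not LaTeXML's actual code (LaTeXML is written in Perl); the function name `guess_input_kind` and the exact rules are assumptions made for illustration only. It treats a `literal:` source as inline content and inspects its first characters, and otherwise falls back to the file extension.

```python
def guess_input_kind(source: str) -> str:
    """Return 'xml', 'bibtex' or 'tex' for a file name or a literal: string.

    Hypothetical heuristic, not LaTeXML's real detection logic.
    """
    literal_prefix = "literal:"
    if source.startswith(literal_prefix):
        # Inline content: look at the text itself, ignoring leading whitespace.
        text = source[len(literal_prefix):].lstrip()
        if text.startswith("<"):
            # Covers both an XML declaration (<?xml ...) and a bare root element.
            return "xml"
        if text.startswith("@"):
            # BibTeX entries start with @article{...}, @book{...}, etc.
            return "bibtex"
        return "tex"
    # File name: decide by extension, case-insensitively.
    lowered = source.lower()
    if lowered.endswith(".xml"):
        return "xml"
    if lowered.endswith(".bib"):
        return "bibtex"
    return "tex"
```

A real implementation would likely need to be more careful, for example about TeX sources that happen to begin with `<` or `@`.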
Nice that you got a working prototype so quickly, @xworld21!
A bit of backstory: this capability is one of the two missing pieces before latexmlc is considered worthy of replacing the latexml+latexmlpost combo as the default recommendation, at least as far as we've discussed this with Bruce.
The second capability is being able to output both the final requested --format and the latexml schema XML, which you usually see when running latexml proper.
The vision is that you may start with a very challenging TeX manuscript and convert it to a desired format X, while also saving the internal XML on the way to X. Then, assuming you liked the X you saw, you could quickly reconvert the internal XML to a different format Y. The classic example was to tailor our runs converting all of arXiv, so that I can produce my main HTML5 output and then quickly reuse the XMLs to also do ePub, TEI, JATS, what have you.
So indeed, important missing feature, and it is half of the story to get latexmlc upgraded to an official executable.
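The two-stage workflow described above (convert TeX to the internal XML once, then reuse that XML for each output format) could be planned roughly as follows. This is only a sketch: `plan_conversions` is a hypothetical helper, and `--xmlinput` is the tentative flag name proposed in this thread, assumed here to be a plain switch. It builds the command lines without running anything.

```python
from pathlib import Path


def plan_conversions(tex_file, formats):
    """Build command lines: one TeX-to-XML run, then one run per output format.

    `formats` is a list of (format_name, file_extension) pairs.
    """
    stem = Path(tex_file).stem
    xml_file = f"{stem}.xml"
    # Stage 1: the expensive TeX parse, saved as LaTeXML's internal XML.
    commands = [["latexml", f"--dest={xml_file}", tex_file]]
    # Stage 2: cheap reconversions of the saved XML.
    # --xmlinput is the tentative name from this thread, not a released option.
    for fmt, ext in formats:
        commands.append(
            ["latexmlc", "--xmlinput", f"--format={fmt}",
             f"--dest={stem}.{ext}", xml_file]
        )
    return commands


if __name__ == "__main__":
    for cmd in plan_conversions("paper.tex", [("html5", "html"), ("epub", "epub")]):
        print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run` once the flag name and behaviour are settled.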
The second capability is being able to output both the final requested --format, as well as the latexml schema XML which you usually see when running latexml proper.
I see. Indeed, caching the XML result seems like a sensible thing to do, instead of forcing the user to run multiple calls.
So indeed, important missing feature, and it is half of the story to get latexmlc upgraded to an official executable.
As far as I can tell, there is only one way to implement this half of the story within the current latexmlc, so I'll send a PR with my patch for you to review. It should be equivalent to latexmlpost now, including validation.
I have not tried to make sense of zip archives. In principle, if you pass --xmlinput and a zip file, you may expect latexmlc to search for an XML file instead of TeX, but that means passing the --xmlinput flag to unpack_source, changing the heuristic... that's a much bigger change! So my implementation is just as broken as running --bibtex on a zip file.
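To make the zip question concrete, here is a rough sketch of what "search for an XML file instead of TeX" might look like. It is not how unpack_source works; `pick_main_entry` and its rule (prefer top-level entries, then the alphabetically first match) are assumptions chosen only to illustrate switching the heuristic on an input-type flag.

```python
import io
import zipfile


def pick_main_entry(archive, want_xml=False):
    """Pick a candidate main file from a zip archive.

    `archive` is a path or a file-like object. Hypothetical heuristic:
    match on extension, prefer top-level entries, take the first by name.
    """
    suffix = ".xml" if want_xml else ".tex"
    with zipfile.ZipFile(archive) as zf:
        candidates = sorted(
            name for name in zf.namelist() if name.lower().endswith(suffix)
        )
    # Entries without a directory separator sit at the archive root.
    top_level = [name for name in candidates if "/" not in name]
    pool = top_level or candidates
    return pool[0] if pool else None


if __name__ == "__main__":
    # Build a small archive in memory to try the heuristic.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("paper.tex", "\\documentclass{article}")
        zf.writestr("paper.xml", "<document/>")
        zf.writestr("figs/extra.xml", "<x/>")
    print(pick_main_entry(buf, want_xml=True))
    print(pick_main_entry(buf))
```

The point of the sketch is that the selection rule has to know which kind of input is expected, which is why the flag would need to reach the unpacking step.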
Maybe there is a way and I have not figured it out... is it possible to let latexmlc take an XML file as input? In other words, to make it work as a replacement of latexmlpost?