brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

koma-script support #915

Open asmaier opened 6 years ago

asmaier commented 6 years ago

The package https://ctan.org/pkg/koma-script is widely used for publications with a European layout (see https://tex.stackexchange.com/questions/7742/what-are-the-strengths-and-weaknesses-of-koma-script-and-memoir). It would be nice to see at least some basic support for the classes https://ctan.org/pkg/scrartcl, https://ctan.org/pkg/scrreprt and https://ctan.org/pkg/scrbook in LaTeXML.

brucemiller commented 6 years ago

Yep, both of these groups would be very interesting to cover. But they're both complex; there's a lot of ignorable styling stuff, but also a lot of semantically relevant markup that would need bindings. A good set of sample documents would help.

dginev commented 5 years ago

[0.8.4 note] Pushing milestone back on non-urgent issues.

Kreijstal commented 3 years ago

What are bindings? How can this be helped?

tkw1536 commented 3 years ago

> What are bindings?

When using a LaTeX package or class, LaTeXML needs to know which macros carry semantic information. These typically require special treatment so that the information is preserved in the resulting document. Other macros are "only" stylistic and can be treated the way native LaTeX would treat them. For example, \href should be turned into <a href="...">, whereas \% should simply become a percent sign rather than a separate tag.

A binding is a file ending in .ltxml that defines this information for a specific package. Internally, bindings are just Perl code.

> how can this be helped?

Someone needs to go over the koma-script package (and related packages) and figure out which macros are semantic and which ones are not. Then an appropriate binding can be written.

This can be done purely based on the documentation. But as above (emphasis mine):

> Yep, both of these groups would be very interesting to cover. But they're both complex; there's a lot of ignorable styling stuff, but also a lot of semantically relevant markup that would need bindings. A good set of sample documents would help.
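
One way to start on that survey is to count which control sequences actually occur in a set of sample documents, so that frequently used KOMA-Script macros get looked at first. The following is a minimal Python sketch and not part of LaTeXML. The function names and the sample text are made up for illustration, and the regular expression is only a rough approximation of TeX tokenization (it ignores catcode changes and verbatim material).

```python
import re
from collections import Counter

# Rough approximation of a TeX control sequence: a backslash followed by
# letters (optionally with @ or a trailing *), or by a single non-letter.
CONTROL_SEQUENCE = re.compile(r"\\([A-Za-z@]+\*?|[^A-Za-z@\s])")


def strip_comments(tex: str) -> str:
    """Drop everything after an unescaped % on each line (approximate)."""
    return "\n".join(re.sub(r"(?<!\\)%.*", "", line) for line in tex.splitlines())


def count_macros(tex: str) -> Counter:
    """Count control sequences in a TeX source string."""
    names = CONTROL_SEQUENCE.findall(strip_comments(tex))
    return Counter("\\" + name for name in names)


# Hypothetical sample document, for illustration only.
sample = r"""
\documentclass{scrartcl}
\title{Demo}
\subtitle{A KOMA feature} % \ignored
\begin{document}
\maketitle
\minisec{Note}
50\% done. \minisec{Again}
\end{document}
"""

counts = count_macros(sample)
for name, n in counts.most_common(3):
    print(name, n)
```

Running the same count over many real documents and comparing it against the macros listed in the KOMA-Script documentation would give a first, usage-based ordering of what to look at.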

dginev commented 3 years ago

Another honest note is that we need developer focus on this particular package to get anywhere -- and currently the koma-script features fall into the unfortunate basket of "complex task with no immediate payoff for our direct projects", which makes it easy to keep on the back-burner. I've pushed back the milestone to make that a little more visible here.

The best one can do is to develop the binding oneself and submit it as a PR, but koma-script is probably best left to a seasoned binding author given the level of difficulty.

Also, I've grown to dislike the summary of "semantic macros need bindings", which is largely beside the pragmatic point of bindings. We need a map to construct a document in latexml's internal schema (e.g. to eventually create HTML5), and we need to write more code whenever that map is not yet implemented. In cases where the existing latexml support for the TeX engine and the LaTeX kernel suffices, you can already load a class/style file raw and do well.

But for advanced features (e.g. all of the non-traditional metadata and styling that KOMA offers), we need to go through the dance of:

  1. Decide if we want to support a given mini feature, or stub it as a "no-op" for latexml.
  2. If supported, see if the current latexml schema has an appropriate target markup, or if the schema needs an extension.
  3. Once we have a schema target, write the Perl code necessary to pass the information through to the XML document, which needs to respect the TeX processing flow and the latexml document construction flow.
    • And since we can't emulate all of TeX, this is usually the point where you may choose to reimplement some TeX algorithms in pure Perl, on a case-by-case basis.
  4. Finally, add custom post-processing rules if this requires a specialized dialect of HTML, potentially also adding some new CSS definitions.
  5. P.S. Add a few tests, so that we avoid regressing once the binding is queued in for review.
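
The first two steps of that list amount to keeping a decision record per macro. As a rough illustration only, here is a Python sketch of such a triage table. It is not LaTeXML code: the macro names come from the KOMA-Script documentation, but the decisions and schema targets shown are hypothetical placeholders, not choices the project has made.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    SUPPORT = "support"  # needs real binding code (steps 2 and 3)
    NO_OP = "no-op"      # stub it out; nothing reaches the XML


@dataclass(frozen=True)
class Triage:
    macro: str
    decision: Decision
    # None means no target has been identified yet, so the schema
    # may need an extension (step 2).
    schema_target: Optional[str] = None


# Hypothetical example entries; none of these reflect actual project decisions.
TRIAGE = [
    Triage(r"\subtitle", Decision.SUPPORT, "ltx:subtitle"),
    Triage(r"\dedication", Decision.SUPPORT, None),
    Triage(r"\setkomafont", Decision.NO_OP),
]


def needs_schema_work(entries):
    """Return supported macros that have no target markup yet."""
    return [
        e.macro
        for e in entries
        if e.decision is Decision.SUPPORT and e.schema_target is None
    ]


print(needs_schema_work(TRIAGE))
```

Keeping the open questions in a table like this makes it visible which macros are blocked on a schema decision before any Perl gets written.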

So, much like the situation with our beamer.cls issue #231, we need to answer a range of questions before any code can be written. This is also why it's so hard to start working on the larger LaTeX classes and styles: one really has to "embark" on a project to get anywhere.

dginev commented 2 years ago

Example documents from arXiv can be found at: