brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

[Enhancement] newenvironment should result in div or class #1835

Closed FelixBenning closed 1 year ago

FelixBenning commented 2 years ago

Solution Environments of this type

\newenvironment{solution} {\begin{proof}[{Solution}]} {\end{proof}}

do not appear in the final HTML document. In their place are plain proof divs:

<div class="ltx_proof">
<h6 class="ltx_title ltx_runin ltx_font_italic ltx_title_proof">Solution.</h6>

This means I cannot attach CSS or JavaScript to these custom environments (for example, to hide them), nor remove them from the HTML tree by selecting them, for example with the Beautiful Soup Python package, and deleting them.

Any \newenvironment command should probably create a wrapping div, or add a class to the child div to avoid matryoshka divs.

dginev commented 2 years ago

This is a very reasonable request, with some clear benefit. I think we should upgrade with at least an extra class attribute for the environment name.

The problem with adding a wrapping div at all times is that we need to stay alert for cases that will not be self-contained trees. One could use \newenvironment to make behavior-only changes (e.g. bump up a counter, or record contents in some registers). Bravely adding wrapping elements can lead to errors in e.g. arXiv documents.

But we could design something slightly clever and check if the environment's final construction is a single node - in which case we can add the class value with the environment name.
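A minimal sketch of that single-node check, written in Python purely for illustration. LaTeXML itself is implemented in Perl and this is not its code. The function name `annotate_if_single_node` and the `ltx_env_` class prefix are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def annotate_if_single_node(fragment_xml: str, env_name: str) -> str:
    """Add a class for env_name only when the fragment is exactly one element."""
    # Wrap in a throwaway root so a fragment with several siblings still parses.
    root = ET.fromstring(f"<root>{fragment_xml}</root>")
    children = list(root)
    # Text outside any element means the body is not a single self-contained node.
    has_loose_text = bool((root.text or "").strip()) or any(
        (child.tail or "").strip() for child in children
    )
    if len(children) == 1 and not has_loose_text:
        node = children[0]
        classes = node.get("class", "").split()
        classes.append(f"ltx_env_{env_name}")  # hypothetical naming scheme
        node.set("class", " ".join(classes))
    # Serialize the children back without the throwaway root.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in children
    )


print(annotate_if_single_node('<div class="ltx_proof"><p>text</p></div>', "solution"))
print(annotate_if_single_node("<p>a</p><p>b</p>", "myred"))
```

In this sketch, an environment that produces several sibling nodes, or no output at all, is left untouched, which is the conservative behavior described above.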

XSLT can add some extra divs in special cases where they're needed, so it may help to get a fully fleshed out example or two.

FelixBenning commented 2 years ago

While it might be that people abuse the \newenvironment macro for counters, etc., it has a clear beginning and a clear end (LaTeX does not like it if you do not use the \end{} command, right?), so the content inside is somehow a unit. So it seems like you should be able to wrap it, right?

In fact, it would also be nice if one could use Splitting on all environments. As an example, I created a bunch of exercises (newtheorem environments), and in order to serve one exercise at a time it would be nice if each were already a single file. As a workaround, I will split those exercises out into individual tex files and use a different main tex file (compared to the exercise sheets) to generate the HTML.

dginev commented 2 years ago

> So it seems like you should be able to wrap it right?

In "disciplined" use of LaTeX yes. Sadly, quite a lot of documents are not disciplined. It took me 1 minute to find an example from arXiv, which simplifies to something along the lines of:

\documentclass{article}
\usepackage{xcolor}
\newenvironment{myred}{\color{red}}{\color{black}}

\begin{document}

\myred
  \begin{itemize}
    \item one
\endmyred
    \item two
  \end{itemize}

\end{document}

Here a wrapping <div> for myred would collide with the <ul> element for the itemize: the <div> would have to close while the <ul> opened inside it is still open, which is not a well-formed tree.
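The collision can be modelled as a stack check over open/close events. The following is a simplified sketch, assuming every environment naively opens and closes one element; the event names and the function `first_nesting_violation` are illustrative and not part of LaTeXML.

```python
def first_nesting_violation(events):
    """Return the first close event that does not match the innermost open element, else None."""
    stack = []
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif not stack or stack[-1] != name:
            return (kind, name)
        else:
            stack.pop()
    return None


# Events for the example above if \myred ... \endmyred always produced a wrapping <div>.
events = [
    ("open", "div.myred"),   # \myred
    ("open", "ul"),          # \begin{itemize}
    ("open", "li"),          # \item one
    ("close", "li"),
    ("close", "div.myred"),  # \endmyred, but <ul> is still open
    ("open", "li"),          # \item two
    ("close", "li"),
    ("close", "ul"),         # \end{itemize}
]
print(first_nesting_violation(events))  # ('close', 'div.myred')
```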

I definitely agree we should add a class where possible (ltx_env_myred or such), but I also believe latexml should study the constructed XML contents of the environment body, and annotate it with a class only when there is a sensible element to annotate. Auto-opening a wrapping element for \newenvironment is bound to lead to tree collisions in arXiv rather quickly.

dginev commented 2 years ago

Btw, this is the same kind of difficulty we discussed in #1711 in relation to marking column layout, which can (and does) collide with the trees of narrative layout.

The TeX engine has no constraint to build well-formed trees as output, and authors take advantage of that. So we need to dance a careful dance that provides some upgrades for disciplined use, while also successfully converting TeX kernel black magic.

FelixBenning commented 2 years ago

Thank you for the explanation :)

brucemiller commented 2 years ago

If the issue is different kinds of proofs, have you looked at using \newtheorem (or one of its many variants)? Those preserve the (minimal) semantics of being a "proof" of some sort, and also add the type as a class, so you can style as you like. For example

\newtheorem{Solution}{Solution}

will give you a Solution environment, titled "Solution", and with class="ltx_theorem_Solution".
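With a class like that in place, the post-processing described at the top of the thread becomes a plain class lookup. Here is a minimal sketch using only Python's standard library (Beautiful Soup would work along the same lines). The sample markup is hand-written and only approximates LaTeXML output; `ltx_theorem_Exercise` and the helper `remove_by_class` are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for LaTeXML output; real output carries more markup and attributes.
html = (
    "<section>"
    '<div class="ltx_theorem ltx_theorem_Exercise"><p>Prove it.</p></div>'
    '<div class="ltx_theorem ltx_theorem_Solution"><p>Here is how.</p></div>'
    "</section>"
)


def remove_by_class(root, class_name):
    """Delete every descendant whose class attribute contains class_name."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if class_name in child.get("class", "").split():
                parent.remove(child)
                removed += 1
    return removed


root = ET.fromstring(html)
remove_by_class(root, "ltx_theorem_Solution")
print(ET.tostring(root, encoding="unicode"))
```

Note that ElementTree needs well-formed XML input; for arbitrary HTML, a tolerant parser such as Beautiful Soup is the more practical choice.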

As to environments in general: A LaTeX environment sets a context, a TeX grouping. As such, they make it easy to create some kind of Object (table, equation, theorem,...), and that's what they often (usually?) are used for. But they don't necessarily create an Object, or even any output at all. It's not at all the case that not creating an object is any kind of "abuse".

Consequently, in general LaTeXML doesn't know if there's an element to add a class to, or which element it might be. It is conceivable to make the macros defined by \newenvironment do some heuristic sniffing around, but that may be error-prone.

brucemiller commented 1 year ago

I took another look around, and really can't see a safe way to have LaTeXML automatically sneak in a class for an arbitrary \newenvironment without a high risk of interfering with the expansion of the begin or end code. An environment is really just shorthand for two macros within a TeX group; it is used for many purposes besides creating a block, and those uses are not at all "abuse".

For theorem-like blocks, using \newtheorem will automatically give you what you want. In other cases, where you know you have an appropriate block, you could use \usepackage{latexml}, and then define your solution as:

\newenvironment{solution}{\begin{proof}[{Solution}]\lxAddClass{solution}}{\end{proof}}

(or even define your own \newblockenvironment to do that). Or perhaps etoolbox has tools that might help (although the timing could be off).

So, there are a few ways to get the effect you want, but I don't think it's really feasible to do that for all environments. Thanks for the report, though!