gerby-project / plastex

Python package to convert LaTeX markup to DOM

Elements swallowed by `par` elements #41

Closed chngr closed 6 years ago

chngr commented 6 years ago

In resolving #38, I forced plasTeX to parse theorem environments so that each chunk is split off into a paragraph. This, however, is done too indiscriminately: objects such as label, reference, and even displaymath elements are now wrapped in par elements, and the output is suboptimal. For instance:

So the main deliverable in this issue is to ensure that reference objects are rendered and handled properly. Note that, because the par element masks the reference element, -.reference files are not being created at the moment either. Along the way, we should think about how content in theorem environments should be handled. I still think that, in general, forcing the content of the environment into paragraphs is a reasonable idea, though the way paragraphs are formed should perhaps be reconsidered, or at least refined.

chngr commented 6 years ago

Notes from tonight:

The reference elements are being printed because they are not being assigned filenames. Previously (at least as long as there were no multi-paragraph theorem environments containing a reference), reference elements were assigned filenames by lines 67 to 75 in Renderers/Gerby/__init__.py: the code takes the reference element, looks at its parent object, which always turned out to be the enveloping thmenv, and takes the tag from that.

But when there are multiple paragraphs, or when the reference element is contained in a par element, the parent is now the containing par element and no longer the thmenv; the thmenv is one step further up. So one fix for reference elements not being pulled out (that is, not being assigned a filename) is to keep walking up the parent elements until an enveloping thmenv is found, or at least until the element is no longer contained in some par block. This is implemented in my local copy, and it does at least fix the issue pointed out by Johan in #38.
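The parent walk described above can be sketched roughly as follows. This is not the code from the local copy or from Renderers/Gerby/__init__.py; the `Node` class is a hypothetical stand-in for a DOM node, `enclosing_thmenv` is an illustrative name, and the sketch assumes nodes expose `nodeName` and `parentNode` attributes in the usual DOM style.

```python
class Node:
    """Hypothetical stand-in for a DOM node (assumed attributes only)."""

    def __init__(self, nodeName, parentNode=None):
        self.nodeName = nodeName
        self.parentNode = parentNode


def enclosing_thmenv(node, target="thmenv"):
    """Walk up the ancestors of `node` and return the nearest one named
    `target`, or None if the root is reached without finding one."""
    current = node.parentNode
    while current is not None:
        if current.nodeName == target:
            return current
        current = current.parentNode
    return None


# A reference wrapped in a par inside a theorem environment:
document = Node("document")
theorem = Node("thmenv", document)
paragraph = Node("par", theorem)
reference = Node("reference", paragraph)

# The direct parent is the par, but the walk still finds the thmenv.
found = enclosing_thmenv(reference)
```

Returning None when no thmenv is found leaves it to the caller to decide what to do with a reference that sits outside any theorem environment.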

Unfortunately, a new spacing/rendering issue now arises: although the reference element does not occupy physical space in the output, it is included in the list of objects to be printed in the thmenv, and it typically sits between a label environment and the first real paragraph. The problem is that this false first element forces a bit more logic for deciding where the closing </p> is to be placed after the initial label. There are two solutions a sleepy me can see right now:

It feels like the second solution will be more robust, though I need to dig around in Jinja a bit more to figure out how that would work.

chngr commented 6 years ago

This is fixed in commit 0e224ba. The solution I chose was to mark, during a preprocessing stage, whether or not each child node of a thmenv is visible. With this flag, the thmenv template can detect when the first visible object is printed and attach the thmenv label to it.
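For illustration only, the visibility-flag idea might look something like the sketch below. It is not the code from commit 0e224ba: the `Node` class, the `INVISIBLE` set, the `carries_label` attribute, and the function name are all hypothetical, and the sketch assumes that reference nodes are the only children that occupy no space in the output.

```python
# Assumption: node names that produce no visible output in a thmenv.
INVISIBLE = {"reference"}


class Node:
    """Hypothetical stand-in for a DOM node with a list of children."""

    def __init__(self, nodeName, childNodes=None):
        self.nodeName = nodeName
        self.childNodes = childNodes or []
        self.visible = True
        self.carries_label = False


def mark_visibility(thmenv):
    """Preprocessing pass: flag each child as visible or not, and mark
    the first visible child as the one that carries the thmenv label."""
    first_seen = False
    for child in thmenv.childNodes:
        child.visible = child.nodeName not in INVISIBLE
        child.carries_label = child.visible and not first_seen
        first_seen = first_seen or child.visible
    return thmenv


# A theorem whose first child is an invisible reference node:
thm = Node("thmenv", [Node("reference"), Node("par"), Node("par")])
mark_visibility(thm)
label_flags = [child.carries_label for child in thm.childNodes]
```

With flags like these set ahead of time, a template only needs to test a boolean on each child and does not have to track state across loop iterations.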