TEIC / Stylesheets

TEI XSL Stylesheets

docx conversion not respecting display="" attribute in math #593

Open Gary-g opened 1 year ago

Gary-g commented 1 year ago

When converting TEI documents that contain <math> to docx, the script treats the surrounding <formula> element as a block element and renders it as a new paragraph. The <math> element can carry the attribute display="block" or display="inline", but inline math is still displayed in a new paragraph instead of inline.

sydb commented 1 year ago

I’m a bit confused here — TEI does not have a <math> element, so I am guessing you are using something like

<p>Time to show some MathML. But I don’t know much. So here is a
  copy-and-paste of a simple algebraic expression (from Wikipedia):
  <formula> area =
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>&#x03C0;<!-- π --></mi>
    <mo>&#x2062;<!-- &InvisibleTimes; --></mo>
    <msup>
      <mi>r</mi><mn>2</mn>
    </msup>
  </math>
  </formula>
  Which means your 10" pizza is twice as big as my 7" pizza.
</p>

If that is the case, I have confirmed that the πr² comes out as a block whether the @display of <math> is set to inline, block, DUCK, or there is no @display at all.

Gary-g commented 1 year ago

That's how I'm doing it. The MathML renders nicely in EPUB3, as long as the reader has support (a lot of browsers rely on MathJax). "Technically" it's meant to be a part of EPUB3. It also works correctly with the PDF output.

sydb commented 1 year ago

Hmmm … When I try teitopdf on a file with that snippet I get an inline formula whether the @display of <math> is set to inline, block, BLOCK, or there is no @display at all. (Furthermore the ‘2’ is not superscripted and the π character is not there; the U+2062 is not there either, but I did not expect it.)

So I am suspicious that the problem is not that it is working as you expect in PDF but not HTML, but rather it is not working as you expected in either case (just in different ways).

That said, it is not clear to me what the right behavior is. Are the TEI Stylesheets responsible for heeding an attribute of a child of <formula>? (Even if not, I concede that it would be nice if the behavior were at least consistent. :-)
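To make the question concrete, here is a minimal XSLT sketch of what "heeding" the child's attribute could look like. It is not taken from the TEI Stylesheets: it assumes an HTML-like target rather than docx, and the `div`/`span` output elements and the `formula` class name are purely illustrative.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:m="http://www.w3.org/1998/Math/MathML"
    exclude-result-prefixes="tei">

  <!-- Block only when the MathML child explicitly asks for it.
       The predicate gives this template a higher default priority
       than the plain tei:formula match below. -->
  <xsl:template match="tei:formula[m:math/@display = 'block']">
    <div class="formula"><xsl:apply-templates/></div>
  </xsl:template>

  <!-- Everything else is inline: display="inline", an unrecognised
       value, or no @display at all (MathML's own default is inline). -->
  <xsl:template match="tei:formula">
    <span class="formula"><xsl:apply-templates/></span>
  </xsl:template>

  <!-- Pass MathML elements and their attributes through unchanged. -->
  <xsl:template match="m:*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Whatever the stylesheets end up doing, applying the same rule in every output format would at least address the consistency point.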

Gary-g commented 1 year ago

Perhaps <formula> should have a "display" attribute; that way TEI could remain agnostic. I've focused on MathML with EPUB because with a good browser it actually works.

Gary-g commented 1 year ago

Curious. Which FOP are you using for the PDF? Your MathML also has one small error (not sure whether it would make a difference to the output): the "area =" needs to be tagged or moved outside the <formula> element.

Try:

area = π r2
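
A sketch of what tagging the "area =" could look like, assuming "tagged" means moving the text into MathML token elements inside <math>. The choice of <mi> for "area" and <mo> for "=" is an assumption, not markup quoted from the thread:

```xml
<formula>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <!-- "area =" moved inside <math>; element choice is an assumption -->
    <mi>area</mi>
    <mo>=</mo>
    <mi>&#x03C0;</mi>
    <mo>&#x2062;</mo>
    <msup>
      <mi>r</mi><mn>2</mn>
    </msup>
  </math>
</formula>
```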

sydb commented 1 year ago

I like the idea of indicating whether a formula should be displayed as a block or inline using an attribute on <formula>. But since we already have @rend, @style, @rendition, and the entire “processing model” encoding system for (a superset of) that purpose, I would be hesitant to add another attribute.

The content of <formula> is defined by TEI P5 as

( text | model.graphicLike | model.hiLike )*

It is the tei_math customization that constrains <formula> to only MathML, no text. (And thus the tei_allPlus customization does this, too, as it just grabs the definition from tei_math.)

BTW, as I went to check which customizations re-define the content model of <formula>, I discovered that the definition of <formula> in tei_simplePrint is an inline element unless the value of @rendition is set to "simple:display", in which case it is a block element.
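
For illustration, an encoding that relies on that tei_simplePrint behaviour might look like the sketch below. The surrounding sentences are invented; only the "simple:display" value of @rendition comes from the customization as described above.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0">
  <!-- No @rendition: treated as inline under tei_simplePrint -->
  <p>The area of a circle is
    <formula>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>&#x03C0;</mi><mo>&#x2062;</mo>
        <msup><mi>r</mi><mn>2</mn></msup>
      </math>
    </formula>, as every pizza lover knows.</p>

  <!-- @rendition="simple:display": treated as a block -->
  <p>The same expression set off on its own:
    <formula rendition="simple:display">
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>&#x03C0;</mi><mo>&#x2062;</mo>
        <msup><mi>r</mi><mn>2</mn></msup>
      </math>
    </formula>
  </p>
</div>
```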

Gary-g commented 1 year ago

If the default for all the outputs were inline, would the @rendition value flow through to PDF and docx? Another way to handle it might be to make <formula> default to inline across all outputs and wrap it in a <figure> to force block behavior.

bwbohl commented 1 year ago

I'm absolutely no expert in encoding TEI+MathML, but in my work I always try to come up with a consistent encoding. We mostly encode editing guidelines in TEI, and those contain many examples from the original sources, code examples, or a combination of both. As a solution, we use tei:figure and tei:eg a lot, and I wonder whether wrapping the tei:formula (with its MathML) in a tei:figure would be a solution for the block cases. Doing so would bring along the ability to use @rend | @rendition | @style.
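
A minimal sketch of the <figure> approach, assuming <formula> is treated as inline everywhere and the wrapper alone signals block display. The @rend value "block" is a hypothetical label, not a value defined by TEI or recognised by the Stylesheets:

```xml
<div xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Bare <formula>: inline within the running text -->
  <p>A circle of radius r has area
    <formula>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>&#x03C0;</mi><mo>&#x2062;</mo>
        <msup><mi>r</mi><mn>2</mn></msup>
      </math>
    </formula>.</p>

  <!-- <formula> wrapped in <figure>: the wrapper forces a block;
       rend="block" is an illustrative, hypothetical value -->
  <figure rend="block">
    <formula>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>&#x03C0;</mi><mo>&#x2062;</mo>
        <msup><mi>r</mi><mn>2</mn></msup>
      </math>
    </formula>
  </figure>
</div>
```

This keeps the inline/block decision in TEI markup, so no output format would need to inspect the MathML @display attribute.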