Speech-Rule-Engine / speech-rule-engine

Generating speech descriptions for XML structures
https://zorkow.github.io/speech-rule-engine/
Apache License 2.0

mfenced doesn't take mstyle into account #17

Open dpvc opened 9 years ago

dpvc commented 9 years ago

The MathML

<math display="none">
  <mstyle open="[" close="]" separators="+">
    <mfenced>
      <mi>a</mi><mi>b</mi>
    </mfenced>
  </mstyle>
</math>

produces the enriched form

<math xmlns="http://www.w3.org/1998/Math/MathML" display="none">
  <mstyle separators="+">
    <mfenced type="fenced" role="leftright" id="6" children="5" content="3,4">
      <mo type="fence" role="open" id="3" parent="6" added="true" operator="fenced">(</mo>
      <mrow type="punctuated" role="sequence" id="5" children="0,2,1" content="2" parent="6">
        <mi type="identifier" role="latinletter" id="0" parent="5">a</mi>
        <mrow type="punctuation" role="comma" id="2" parent="5" operator="punctuated" />
        <mi type="identifier" role="latinletter" id="1" parent="5">b</mi>
      </mrow>
      <mo type="fence" role="close" id="4" parent="6" added="true" operator="fenced">)</mo>
    </mfenced>
  </mstyle>
</math>

which has the wrong delimiters and the wrong separator. Note also that the <mstyle> element loses the open and close attributes.
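For comparison, with the mstyle values honoured, the example should presumably denote [a+b] rather than (a,b). The sketch below spells out the mfenced expansion under the assumption that the open, close and separators values have already been resolved, including any inherited from an enclosing mstyle. It uses the spec defaults "(", ")" and ",", ignores whitespace inside separators, and repeats the last separator when there are fewer separators than gaps. `expand_mfenced` is a hypothetical helper for illustration, not SRE or MathJax code.

```python
# Hypothetical helper (not SRE/MathJax code): expand an mfenced into the
# token sequence it denotes, given already-resolved attribute values.

DEFAULTS = {"open": "(", "close": ")", "separators": ","}


def expand_mfenced(children, open_=None, close=None, separators=None):
    """Return the tokens of an mfenced; None means "use the spec default"."""
    open_ = DEFAULTS["open"] if open_ is None else open_
    close = DEFAULTS["close"] if close is None else close
    seps = DEFAULTS["separators"] if separators is None else separators
    # Whitespace inside the separators value is not significant.
    seps = "".join(seps.split())
    tokens = [open_] if open_ else []
    for i, child in enumerate(children):
        if i > 0 and seps:
            # i-th gap uses the i-th separator; the last one repeats if too few.
            tokens.append(seps[min(i - 1, len(seps) - 1)])
        tokens.append(child)
    if close:
        tokens.append(close)
    return tokens


print(expand_mfenced(["a", "b"], "[", "]", "+"))  # ['[', 'a', '+', 'b', ']']
```

Without the inherited values, the same call falls back to the defaults and yields (a,b), which is what the enriched form above shows.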

zorkow commented 9 years ago

mstyle is currently simply ignored. It's a bit odd that it loses two attributes; it should not be changed at all.

It is currently also not taken into account in the semantic tree computation, which explains the wrong fences and separator. I need a full understanding of what mstyle can do before I tackle its treatment in the tree. (Another one of my corpses...)

dpvc commented 9 years ago

You're not going to like mstyle, and it will be a pain to handle it. You would have to look up the tree for mstyle nodes any time you wanted to process an attribute (since it could be used to provide that, as in the example above). But some attributes aren't supposed to be inherited this way. It is quite a mess, really.

I might be able to modify toMathML() to add the attributes that mstyle would normally have provided (since the inheritance code is already part of the internal MathML objects). Then you wouldn't have to worry about the inheritance. But then the result will have redundant attributes when mstyle is used.

zorkow commented 9 years ago

No, let's not modify toMathML. I feel some of those attributes change the semantic meaning of expressions and hence have to be handled correctly. In any case, attributes should not vanish from the mstyle element.

Is the following understanding of mstyle too simplistic?

mstyle governs the attributes of the entire subtree it roots, provided the element accepts the attribute (e.g., open, close, etc. only work on mfenced), unless the attribute is overridden by another mstyle or locally on the element.

But some attributes aren't supposed to be inherited this way.

Now that sounds worrying. Which ones are these? Is there a list somewhere?

Finally, as a first iteration, I believe I will start dealing with: open, close, separators, mathvariant. Is there anything else you feel is immediately necessary?

dpvc commented 9 years ago

mstyle governs the attributes of the entire subtree it roots, provided the element accepts the attribute (e.g., open, close, etc. only work on mfenced), unless the attribute is overridden by another mstyle or locally on the element.

This is close, but it is a bit more complicated. Some elements act as containers: when they override a value set in mstyle, the overridden value applies to their children as well. Other elements override mstyle only for themselves, without changing the value for their children. This is described in the mstyle section of the MathML spec (the three bullet points give the details). Which category an attribute falls into is listed in the attribute tables for each MathML element.

But some attributes aren't supposed to be inherited this way.

Now that sounds worrying. Which ones are these? Is there a list somewhere?

The mstyle section of the MathML spec lists them in the third paragraph after the bullet list that describes the three inheritance cases.

It's all a bit complicated and confusing. I implemented the inheritance rules back in the early days of MathJax (2008) and the internal MathML objects have a Get() method that returns an attribute's value taking into account explicit attributes on the element, inherited ones from containers and mstyle elements (or the math element itself), and the default values specified in the spec. So I don't have to worry about how it all works any more. But since you are working with a serialized version of the MathML rather than the MathJax internal form, you will have to re-implement all that yourself (or enough of it to handle the attributes you need).
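For a serialized tree, a stripped-down analogue of such a Get() might look like the sketch below. It assumes Python's xml.etree.ElementTree, which has no parent pointers, so a child-to-parent map is built first. The lookup order is: explicit attribute, then the nearest math/mstyle ancestor that sets it, then a default. NOT_INHERITED and DEFAULTS are illustrative placeholders, not the spec's actual lists. The container cases described above (elements whose own overrides propagate to their children) are not modelled; only math and mstyle pass values down.

```python
import xml.etree.ElementTree as ET

# Illustrative placeholders only, NOT the spec's tables.
NOT_INHERITED = {"id", "class", "style", "xref"}
DEFAULTS = {
    ("mfenced", "open"): "(",
    ("mfenced", "close"): ")",
    ("mfenced", "separators"): ",",
    ("mfrac", "linethickness"): "medium",
}
# Elements whose attributes act as inherited values for their descendants.
INHERITING = {"math", "mstyle"}


def local_name(elem):
    """Strip an ElementTree '{namespace}' prefix from the tag."""
    return elem.tag.rsplit("}", 1)[-1]


def get_attribute(elem, name, parents):
    """Resolve an attribute: explicit value, then math/mstyle ancestors, then default.

    parents maps each element to its parent (ElementTree has no parent links).
    """
    if name in elem.attrib:
        return elem.attrib[name]
    if name not in NOT_INHERITED:
        node = parents.get(elem)
        while node is not None:
            if local_name(node) in INHERITING and name in node.attrib:
                return node.attrib[name]
            node = parents.get(node)
    return DEFAULTS.get((local_name(elem), name))


root = ET.fromstring('<math><mstyle open="[" close="]" separators="+">'
                     '<mfenced><mi>a</mi><mi>b</mi></mfenced></mstyle></math>')
parents = {child: parent for parent in root.iter() for child in parent}
print(get_attribute(root.find(".//mfenced"), "open", parents))  # [
```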

No, let's not modify toMathML.

You may want to reconsider that. :-)

What I have in mind is an option that is turned on for the serialization created for SRE, and then turned off, so the normal toMathML() output will not be changed. But you would get the inherited values added to the elements, so you don't have to worry about the inheritance issues yourself. I suspect it is not all that hard.
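The effect of such an option can be sketched on a serialized tree as a single top-down pass. The pass carries the values set on math/mstyle in an environment and copies them onto descendant elements that accept them, leaving mstyle itself untouched. This is not MathJax's toMathML() code. ACCEPTS is a hypothetical table, and the optional `wanted` argument stands in for the list of attributes SRE actually uses. Explicit attributes on other elements apply only to that element and are not passed down, and the output carries the redundant attributes mentioned earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: which element types accept which inheritable attributes.
ACCEPTS = {
    "mfenced": {"open", "close", "separators"},
    "mfrac": {"linethickness"},
    "mover": {"accent"},
    "mi": {"mathvariant"},
    "mn": {"mathvariant"},
    "mo": {"mathvariant", "accent"},
}
INHERITING = {"math", "mstyle"}


def local_name(elem):
    """Strip an ElementTree '{namespace}' prefix from the tag."""
    return elem.tag.rsplit("}", 1)[-1]


def add_inherited(elem, env=None, wanted=None):
    """Copy inherited values onto accepting elements, in place.

    env    -- values set by enclosing math/mstyle elements
    wanted -- only add these attribute names (None = all in ACCEPTS)
    mstyle/math keep their own attributes, so nothing is lost there.
    """
    env = dict(env or {})
    name = local_name(elem)
    if name in INHERITING:
        env.update(elem.attrib)
    else:
        for attr in sorted(ACCEPTS.get(name, ())):
            if attr in env and attr not in elem.attrib and (wanted is None or attr in wanted):
                elem.set(attr, env[attr])
    for child in elem:
        add_inherited(child, env, wanted)


root = ET.fromstring('<math><mstyle open="[" close="]" separators="+">'
                     '<mfenced><mi>a</mi><mi>b</mi></mfenced></mstyle></math>')
add_inherited(root)
print(ET.tostring(root, encoding="unicode"))
```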

I believe I will start dealing with: open, close, separators, mathvariant. Is there anything else you feel is immediately necessary?

Well, anything can be set this way, so things like linethickness in mfrac, and accent in mover or its core mo may need to be checked. But these are a good start for now.
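Checking accent on mover means locating the core mo of the overscript. Below is a simplified sketch of MathML's embellished-operator rules:

- an mo is its own core;
- the script elements, mfrac and semantics delegate to their first child;
- mrow, mstyle, mphantom and mpadded delegate to their single non-space-like child.

The space-like set is reduced to mspace and mtext, and maction is not handled. The accent checks read attributes directly. In practice they would go through an inheritance-aware lookup that covers mstyle, and the operator dictionary would supply defaults for mo. All names here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified embellished-operator rules; a sketch, not a complete implementation.
FIRST_CHILD_RULE = {"msub", "msup", "msubsup", "munder", "mover", "munderover",
                    "mmultiscripts", "mfrac", "semantics"}
ROW_LIKE = {"mrow", "mstyle", "mphantom", "mpadded"}
SPACE_LIKE = {"mspace", "mtext"}  # assumption: reduced set of space-like elements


def local_name(elem):
    """Strip an ElementTree '{namespace}' prefix from the tag."""
    return elem.tag.rsplit("}", 1)[-1]


def core_mo(elem):
    """Return the core <mo> of an embellished operator, or None."""
    name = local_name(elem)
    if name == "mo":
        return elem
    children = list(elem)
    if name in FIRST_CHILD_RULE:
        return core_mo(children[0]) if children else None
    if name in ROW_LIKE:
        others = [c for c in children if local_name(c) not in SPACE_LIKE]
        return core_mo(others[0]) if len(others) == 1 else None
    return None


def mover_accent(mover):
    """Effective accent of <mover>: explicit value, else from the overscript's core mo."""
    if "accent" in mover.attrib:
        return mover.get("accent") == "true"
    children = list(mover)
    core = core_mo(children[1]) if len(children) > 1 else None
    # Assumption: only an explicit accent="true" on the core mo counts here.
    return core is not None and core.get("accent") == "true"


hat = ET.fromstring('<mover><mi>x</mi><mo accent="true">^</mo></mover>')
print(mover_accent(hat))  # True
```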

zorkow commented 9 years ago

It's all a bit complicated and confusing.

Quite. I would never have inferred the inheritance of open, close, etc. from the mstyle section. Mathvariant makes sense; the others, not so much.

What I have in mind is an option that is turned on for the serialization created for SRE, and then turned off, so the normal toMathML() output will not be changed. But you would get the inherited values added to the elements, so you don't have to worry about the inheritance issues yourself. I suspect it is not all that hard.

I feel that SRE should come up with the "correct" interpretation from any MathML source, not just from MathJax. However, for this project as well as for testing to check that we indeed get equivalent structures either way, such an option would indeed be very useful, provided it is not too much work.

I believe I will start dealing with: open, close, separators, mathvariant. Is there anything else you feel is immediately necessary?

Well, anything can be set this way, so things like linethickness in mfrac, and accent in mover or its core mo may need to be checked. But these are a good start for now.

Accent is indeed important, so I should handle that straight away. linethickness I would not know what to do with. How do we interpret lines of different thickness semantically?

Btw, regarding mathvariant: I don't think I've ever added the font to the semantically enriched output. It is in the semantic tree, derived either from mathvariant (also not yet from mstyle) or from the Unicode character. How would mathvariant work on a Unicode character of a particular font (e.g. frak on a double-struck character)?

pkra commented 9 years ago

I feel that SRE should come up with the "correct" interpretation from any MathML source, not just from MathJax.

Perhaps for MathJax 3.0 we could make the existing technology inside MathJax more re-usable.

How do we interpret lines of different thickness semantically?

I think one very important use case is 0 thickness on mfrac which is frequently used for binomial coefficients.

zorkow commented 9 years ago

I think one very important use case is 0 thickness on mfrac which is frequently used for binomial coefficients.

I see. Nasty... We do have a special semantic role for binomial coefficients, but it only fires if we have a bracketed expression. Otherwise, I guess, it will just end up as some over- or underscript.

pkra commented 9 years ago

Nasty...

Yup. Straight out of the spec (below the attribute table), I'm afraid.

dpvc commented 9 years ago

I would never have inferred the inheritance of open, close, etc. from the mstyle section. Mathvariant makes sense; the others, not so much.

I guess the phrase "[mstyle] can be given any attribute accepted by any other presentation element, except for the attributes described below" in the first paragraph is the one that indicates that this can happen. I doubt that open, close, separators, etc. will ever be used in mstyle, but they are allowed by the spec.

for this project as well as for testing to check that we indeed get equivalent structures either way, such an option would indeed be very useful, provided it is not too much work.

I'll look into it. Can you give me a list of the attributes that SRE actually uses? No need for toMathML() to look up values for things you never use.

How do we interpret lines of different thickness semantically?

Peter has already given you the case I had in mind: linethickness="0", with binomial coefficients as the key example. These will generally be bracketed, but fractions with linethickness="0" can be generated in other ways. For example, MathJax's implementation of TeX's \atop and \atopwithdelims produces an mfrac with zero line thickness.
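Since \atop yields a zero-thickness mfrac without surrounding fences, brackets alone cannot be the trigger. The minimal sketch below assumes the effective linethickness (explicit or inherited from mstyle) is already known. Any numeric value equal to zero counts as "no rule", with or without a unit. Keywords such as thin, medium and thick count as nonzero. The role labels "binomial", "stacked" and "fraction" are hypothetical, not SRE's actual semantic roles.

```python
import re

# A number with an optional unit, e.g. "0", "0pt", "0.0em", ".5ex".
_NUMBER = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*([a-z%]*)\s*$")


def is_zero_thickness(value):
    """True if a linethickness value is numerically zero; keywords count as nonzero."""
    if value is None:
        return False
    match = _NUMBER.match(value)
    return bool(match) and float(match.group(1)) == 0.0


def classify_fraction(linethickness, fenced=False):
    """Hypothetical role labels for an mfrac, given its effective linethickness."""
    if is_zero_thickness(linethickness):
        return "binomial" if fenced else "stacked"
    return "fraction"
```

A fenced zero-thickness mfrac would then be a binomial candidate, while a bare one (as from \atop) would still be recognised as a stacked, non-fraction layout.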

How would mathvariant work on a Unicode character of a particular font (e.g. frak on a double-struck character)?

For characters in the Mathematical Alphanumeric Symbols Unicode block, mathvariant should be ignored. The mathvariant section of the spec gives most of the important information and states that "the appearance of a mathematical alphanumeric symbol character [U+1D400 to U+1D7FF] should not be altered by surrounding mathvariant or other style declarations" at the end of the fifth paragraph after the table in that section.
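A minimal sketch of that rule for single-character tokens is below. Characters in U+1D400 to U+1D7FF keep their intrinsic style, so a surrounding mathvariant is dropped for them. The function name is hypothetical, and a return value of None stands for "use the character's own style".

```python
# Mathematical Alphanumeric Symbols block, U+1D400 to U+1D7FF inclusive.
MATH_ALNUM = range(0x1D400, 0x1D800)


def effective_mathvariant(char, mathvariant):
    """Variant to apply to a single-character token, or None for its intrinsic style.

    E.g. U+1D538 is already double-struck, so an enclosing
    mathvariant="fraktur" must not change it.
    """
    if len(char) == 1 and ord(char) in MATH_ALNUM:
        return None
    return mathvariant
```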

Another rule you might consider: if mathvariant is set, then style values for font-family, font-weight and font-size, as well as the deprecated attributes fontfamily, fontweight and fontsize, should be ignored. I don't know whether you look at the deprecated font* attributes or the style attribute, but MathJax does process these for its output.
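A sketch of that rule applied to an element's attribute dictionary is below. It assumes naive parsing of the style attribute, splitting on ";" and ":", which would mishandle semicolons inside quoted font names. `strip_font_overrides` is a hypothetical helper, not MathJax code.

```python
# CSS properties and deprecated MathML attributes overridden by mathvariant.
FONT_PROPERTIES = {"font-family", "font-weight", "font-size"}
DEPRECATED_FONT_ATTRS = {"fontfamily", "fontweight", "fontsize"}


def strip_font_overrides(attrs):
    """Return a copy of attrs with font settings removed when mathvariant is set."""
    if "mathvariant" not in attrs:
        return dict(attrs)
    result = {k: v for k, v in attrs.items() if k not in DEPRECATED_FONT_ATTRS}
    if "style" in result:
        # Naive split into "property: value" declarations.
        declarations = [d.strip() for d in result["style"].split(";") if d.strip()]
        kept = [d for d in declarations
                if d.split(":", 1)[0].strip().lower() not in FONT_PROPERTIES]
        if kept:
            result["style"] = "; ".join(kept)
        else:
            del result["style"]
    return result
```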