brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Set lspace and rspace for zero-width space mo #2311

Closed dginev closed 9 months ago

dginev commented 9 months ago

A bit of an unfortunate follow-up to #2303, based on observing that we get the default 0.27...em spacing for any zero-width Unicode chars not yet in the MathML operator dictionary.

See the discussion in mathml-core for more details, as well as an example codepen.

Setting the lspace/rspace attributes explicitly feels like we are reverse-patching the defaults, but unless the rendering looks normal, switching away from invisible times seems to cause more harm than good.
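For readers who want to see what "setting lspace/rspace explicitly" looks like in the output, here is a minimal Python sketch that builds such an `<mo>` with the standard library. The `0em` values and the helper names `zero_width_mo` and `juxtapose` are assumptions for illustration only; this is not LaTeXML's actual code and may not match the attribute values it emits.

```python
import xml.etree.ElementTree as ET

# U+200B ZERO WIDTH SPACE: not listed in the MathML operator dictionary,
# so renderers fall back to the default operator spacing unless overridden.
ZERO_WIDTH_SPACE = "\u200b"


def zero_width_mo(lspace="0em", rspace="0em"):
    """Build an <mo> holding U+200B with explicit spacing attributes.

    The "0em" defaults are an assumption for this sketch.
    """
    mo = ET.Element("mo", {"lspace": lspace, "rspace": rspace})
    mo.text = ZERO_WIDTH_SPACE
    return mo


def juxtapose(left, right):
    """Place the silent operator between two identifiers inside an <mrow>."""
    row = ET.Element("mrow")
    ET.SubElement(row, "mi").text = left
    row.append(zero_width_mo())
    ET.SubElement(row, "mi").text = right
    return row


markup = ET.tostring(juxtapose("a", "b"), encoding="unicode")
print(markup)
```

Without the two attributes, the same `<mo>` would pick up the default operator spacing on both sides, which is the rendering problem described above.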

xworld21 commented 9 months ago

Now that I see this... why did you go for U+200B as opposed to U+2063? (Conversely, why did I suggest U+2063 in the first place https://github.com/brucemiller/LaTeXML/issues/1941#issuecomment-1783782218? I don't remember.)

xworld21 commented 9 months ago

(Conversely, why did I suggest U+2063 in the first place #1941 (comment)? I don't remember.)

To answer myself: because U+2063 is mentioned in the operator dictionary. U+200B is not.

dginev commented 9 months ago

For U+2063 the assumption is that we know the operator truly was used as a separator, as it would be for, say, delimiting the elements of a permutation cycle. The math-oriented invisible characters in Unicode are meant to imply some certainty of having captured the intention, which may then be used (spoken) by accessibility tools.

On the other hand, a discussion in the MathML Core meetings had broad consensus that the zero-width space U+200B doesn't carry any implied meaning, and could be used as a placeholder. The preference was that we also avoid using an empty <mo/> element, since those are traditionally attributed to errors in XML pipelines. In an effort to please as many perspectives as possible (including ours: we have cases where we don't know what the operator stands for and want it silent in AT), we went for U+200B.
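The decision rule described here can be sketched roughly as follows. The `KNOWN_INTENT` table and the `silent_operator` helper are hypothetical names made up for this example, not anything in LaTeXML; only the code points and their Unicode names are standard.

```python
import unicodedata

INVISIBLE_TIMES = "\u2062"      # implies a known multiplication
INVISIBLE_SEPARATOR = "\u2063"  # implies a known separator, e.g. in a permutation cycle
ZERO_WIDTH_SPACE = "\u200b"     # carries no implied meaning; usable as a placeholder

# Hypothetical mapping from a confidently recognized intent to its character.
KNOWN_INTENT = {
    "times": INVISIBLE_TIMES,
    "separator": INVISIBLE_SEPARATOR,
}


def silent_operator(intent=None):
    """Pick the invisible character for an <mo>.

    Falls back to U+200B when the intent is unknown, so that assistive
    technology stays silent rather than speaking a guessed meaning.
    """
    return KNOWN_INTENT.get(intent, ZERO_WIDTH_SPACE)


for intent in ("times", "separator", None):
    char = silent_operator(intent)
    print(intent, f"U+{ord(char):04X}", unicodedata.name(char))
```

The point of the fallback is that the math-specific invisible characters are emitted only when the meaning was actually captured.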

I find all of this to be quite useful, since it helps MathML Core end up with better defaults for more corners of Unicode than it has already covered.