brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

preserve xhtml:attrs in XSLT #2166

Closed dginev closed 1 year ago

dginev commented 1 year ago

This is a PR-shaped question to @brucemiller, while brainstorming on #2165 .

It is generally quite simple to write a small binding that deposits ARIA attributes for latexml, except that those attributes are not declared in the schema, and aren't allowed to pass through in the XSLT post-processing.

We could allow each and every one of the list of ARIA attributes in the ltx namespace, but - in purely pragmatic terms - such a requirement limits experimentation for lay binding authors a bit. I myself find it somewhat uncomfortable to be buffered through the latexml schema, before being able to experiment with emitting the ultimate HTML markup. Going back to enhance the schema after a stable HTML markup pattern emerges feels slightly more natural to exploratory coding.

The PR contains the first quick upgrade while wearing my HTML hat - allow any xhtml:attribute to pass through the XSLT as a plain attribute in the final (X)HTML, in the same conditional check that currently emits data- attributes as a fallback.

This allowed me to write the binding:

DefEnvironment("{namedblock}{}{}", "<ltx:block xhtml:aria-label='#1' xhtml:aria-description='#2'>#body</ltx:block>");

author:

\begin{namedblock}{sample-value}{describe it}
  Hello World!
\end{namedblock}

and emit to HTML5:

<div class="ltx_block" aria-description="describe it" aria-label="sample-value">
<p class="ltx_p">Hello World!</p>
</div>

More generally, passing arbitrary xhtml:names along allows direct experimentation with (X)HTML attributes of any variety, bypassing that aspect of latexml's schema.


In addition, maybe in a next PR (or next commit), if one were to be Principled in the XML sense, we'd follow the WAI-ARIA host-language instructions and declare an aria namespace at the official URI http://www.w3.org/ns/wai-aria. Then I'd need a dedicated XSLT rule that maps aria: namespaced values into aria- values in HTML5, as we do for data-.
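To make the intended mapping concrete, here is a minimal sketch in Python rather than XSLT. It is not LaTeXML code: the function names are hypothetical, and it assumes attributes arrive as a dictionary keyed by ElementTree-style `{uri}local` names. It only illustrates the rule "an attribute in the aria namespace becomes a plain `aria-` attribute; everything else is left alone".

```python
# Namespace URI as quoted in the comment above.
ARIA_NS = "http://www.w3.org/ns/wai-aria"


def split_clark(name):
    """Split an ElementTree-style '{uri}local' name into (uri, local).

    Names without a namespace come back as (None, name).
    """
    if name.startswith("{"):
        uri, _, local = name[1:].partition("}")
        return uri, local
    return None, name


def aria_to_html(attrs):
    """Rename aria-namespaced attributes to 'aria-<local>'; copy the rest unchanged."""
    out = {}
    for name, value in attrs.items():
        uri, local = split_clark(name)
        if uri == ARIA_NS:
            out["aria-" + local] = value
        else:
            out[name] = value
    return out
```

With this rule, an attribute `{http://www.w3.org/ns/wai-aria}label` would come out as `aria-label`, while `class` passes through untouched.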

I believe this simplest PR can be useful for general quick experimentation, and I can bring along the namespaced aria approach in addition to that.

How much of that aligns with your perspective here @brucemiller ?

dginev commented 1 year ago

I should also mention that the master branch currently uses the data- check for the xhtml namespaced attributes, creating the rather amusing variation on my example:

<div class="ltx_block" data-xhtml-aria-label="sample-value">...</div>

brucemiller commented 1 year ago

Unfortunately, you've caught me in a mood quite resistant to adding open-ended but special-purpose backdoors. They tend to make future development, testing, refactoring and cleanup very difficult because you never know how they've been used; you can't change anything. And additionally, when the original "feature" does get implemented, you've got two incompatible ways of doing the same thing.

If I were inclined to go in this direction, I'd be thinking (since I do not approach things "html-centric"), that if the attribute's namespace is the same as the target namespace (whether xhtml, jats, whatever), you might simply copy the attribute w/o the namespace. Ah, but what if you did want the attribute namespaced? Which applies for xhtml as well, however unlikely that may seem at this particular moment.

OTOH, if accessibility is the objective, we should probably spend some time thinking about how to do that correctly, but pragmatically. Will we ever have more than a "description" of an object coming from our source documents? Do we need aria in all its full glory to make use of that description? That is, we probably will place the description into html using some set of aria attributes, but does LaTeXML's XML need aria to do that? Or would an appropriately placed @description attribute (or element) suffice?

dginev commented 1 year ago

A technical note I noticed this morning: this kind of experimentation is already possible if it is done purely as HTML, without interoperating with the native schema. Namely, this:

DefEnvironment("{ariadiv}{}{}", "<ltx:rawhtml><xhtml:div aria-label='#1' aria-description='#2'>#body</xhtml:div></ltx:rawhtml>");

gets successfully carried into (X)HTML, thanks to the recently fixed #2148.

brucemiller commented 1 year ago

Given that xhtml is a "known" namespace, and that xhtml:bar="stuff" is different than bar="stuff", blindly converting the former to the latter is likely to be a rude surprise at least as often as it will be "helpful".

OTOH, since we've introduced a Special data namespace to be converted to data-xxx attributes, perhaps it isn't too inconsistent to introduce a Special passthru (or something similarly suggestive) namespace that is treated roughly as you suggest? (ie. attribute is copied as local-name). If and when it ever gets documented, we can claim that it is to support experimentation rather than production code... if that matters.

dginev commented 1 year ago

Most of this sounds like a code smell to me, courtesy of XML's design, which turns the simple and intuitive way of annotating something into a source of "rude surprises" rather than helpful, intuitive behavior.

Irrespective of this PR, LaTeXML should be able to pass along any xhtml: namespaced attribute into HTML5, which it currently mangles, as I showed above.

But I think there is more important work than this for now, closing here.

brucemiller commented 1 year ago

> Irrespective of this PR, LaTeXML should be able to pass along any xhtml: namespaced attribute into HTML5, which it currently mangles,

No, since you have USE_DATA_ATTRIBUTES true, it did exactly what you told it to do.

If, on the other hand, you want the ability to say that some attributes (or namespaces) get turned into data attributes, some keep their namespace, some drop their namespace, etc, etc, then you've got to have a mechanism to specify what you want, without too many short-sighted assumptions.
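One possible shape for such a mechanism, sketched in Python purely for illustration: a per-namespace policy table with an explicit default. Nothing here is LaTeXML's actual API. The `Action` enum, `render_attribute`, and the policy dictionary are hypothetical names, and the table is keyed by prefix rather than namespace URI only to keep the sketch short.

```python
from enum import Enum


class Action(Enum):
    DATA = "data"    # emit as data-<prefix>-<local>
    KEEP = "keep"    # keep the attribute namespaced as <prefix>:<local>
    STRIP = "strip"  # drop the namespace and emit the bare local name


def render_attribute(prefix, local, value, policy, default=Action.DATA):
    """Return the (name, value) pair to emit for one namespaced attribute.

    `policy` maps a namespace prefix to an Action; prefixes that are not
    listed fall back to `default`.
    """
    action = policy.get(prefix, default)
    if action is Action.STRIP:
        name = local
    elif action is Action.KEEP:
        name = f"{prefix}:{local}"
    else:
        name = f"data-{prefix}-{local}"
    return name, value
```

With an empty policy, `render_attribute("xhtml", "aria-label", "sample-value", {})` yields the `data-xhtml-aria-label` name seen earlier in the thread; with `{"xhtml": Action.STRIP}` the same call yields `aria-label`. The point of the table is that each choice is stated explicitly instead of being inferred from the namespace.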