AmerMathSoc / texml-to-html

Converting AmerMathSoc/texml output to raw HTML
Apache License 2.0

formula.js: consider leaving tags in the document #425

Closed pkra closed 1 year ago

pkra commented 1 year ago

Via https://github.com/AmerMathSoc/ams-html/issues/938

In #422 we added extraction into data attributes. However, it might be easier and better to "just" keep the tags in the document.

pkra commented 1 year ago

Note: one problem will be that formulas inside a tag would be turned into $...$, which is what we want to avoid.
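To make the concern concrete, here is a minimal sketch of what flattening does to a tag that contains inline math. The mini-tree of plain objects and the `toTexString` helper are hypothetical stand-ins for illustration, not the texml-to-html data model, and the element name `inline-formula` is an assumption.

```javascript
// Hypothetical mini-tree standing in for the XML nodes; not the real texml-to-html API.
// A node is either a string (text) or an object { name, children }.
function toTexString(node) {
  if (typeof node === 'string') return node;
  const inner = node.children.map(toTexString).join('');
  // Assumption: inline formulas are wrapped in $...$ when flattened into a TeX string.
  if (node.name === 'inline-formula') return `$${inner}$`;
  return inner;
}

const tag = {
  name: 'tag',
  children: ['Eq. ', { name: 'inline-formula', children: ['x'] }],
};

console.log(toTexString(tag)); // prints "Eq. $x$"
```

Once the tag is a flat string like this, the structure of the nested formula is gone, which is why keeping the tag as markup in the document looks attractive.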

pkra commented 1 year ago

If we make the element hidden, we could keep the old approach and remove it later. That is, avoid a breaking change.

pkra commented 1 year ago

Some notes from an experimental implementation:

Since ams-eqn-store reads the innerHTML of formula.js "output", we need to either put the tags outside of it or change ams-eqn-store.

I tried to just add an element with the tags after it, which didn't go well on the ams-html side: the math panel creation has trouble because it works with cloned nodes, so the sibling is tricky to find. Working around that temporarily, I ran into odd errors I couldn't track down, at which point I decided to stop because it was getting too hacky.

So this seems to require a somewhat larger change throughout the toolchain. It might be that some things get simpler in the long run (e.g., making formula elements plain wrappers and having tex-math and tags as children). But it needs more investigating.
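As a rough sketch of what "changing the consumer" could look like if formula elements became plain wrappers: instead of treating the whole inner HTML as TeX, a reader would take only the content of the first tex-math child. The helper name `readTexSource` is hypothetical and this is not the ams-eqn-store API; a real consumer would more likely query the DOM than use a regular expression on a string.

```javascript
// Hypothetical consumer-side helper; not the actual ams-eqn-store API.
// `innerHtml` is the serialized inner HTML of one formula wrapper.
function readTexSource(innerHtml) {
  // Non-greedy match, so only the first (outer) tex-math element is read.
  const match = /<tex-math>([\s\S]*?)<\/tex-math>/.exec(innerHtml);
  // Output without a tex-math child holds bare TeX, so fall back to the whole string.
  return match ? match[1] : innerHtml;
}

const wrapped =
  '<tex-math>x = y</tex-math><span hidden data-ams-doc="tags"><span>(1)</span></span>';
console.log(readTexSource(wrapped)); // prints "x = y"
console.log(readTexSource('x = y')); // prints "x = y"
```

The fallback keeps the old "inner HTML is the TeX" behavior working, which fits the goal of avoiding a breaking change.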

pkra commented 1 year ago

Here's the diff of where I had left things on this end:

diff --git a/lib/elements/formula.js b/lib/elements/formula.js
index dfb3274..22a43a4 100644
--- a/lib/elements/formula.js
+++ b/lib/elements/formula.js
@@ -12,12 +12,15 @@ export default function (htmlParentNode, xmlnode) {
     htmlParentNode.insertAdjacentText('beforeend', '$');
     return;
   }
+  const hasLinkedTag = xmlnode.querySelector('target tag');
+  const tagContainer = `<div hidden data-ams-doc="tags"></div>`; //NOTE we store copies of tags for easier re-use downstream; cf. tags.js 
   if (xmlnode.getAttribute('content-type') === 'text') {
     const div = this.createNode('div', '', {
       'data-ams-doc': `math text`
     });
     htmlParentNode.appendChild(div);
     this.passThrough(div, xmlnode);
+    if (hasLinkedTag) div.insertAdjacentHTML('afterbegin', tagContainer)
     return
   }
   // Otherwise
@@ -28,6 +31,7 @@ export default function (htmlParentNode, xmlnode) {
   htmlParentNode.appendChild(span);
   if (mathMode === 'block' && xmlnode.querySelector('tex-math[has-qed-box]'))
     span.setAttribute('data-ams-qed-box', 'true');
+  if (hasLinkedTag) span.insertAdjacentHTML('afterend', tagContainer);
   this.passThrough(span, xmlnode);
   // NOTE we (sometimes?) get extra whitespace from childnodes; needs test
   const text = span.innerHTML;
diff --git a/lib/elements/tag.js b/lib/elements/tag.js
index 4a0545b..397c31c 100644
--- a/lib/elements/tag.js
+++ b/lib/elements/tag.js
@@ -27,4 +27,8 @@ export default function (htmlParentNode, xmlnode) {
         if (xmlnode.getAttribute('parens') === 'yes') span.insertAdjacentText('beforeend', ')')
         htmlParentNode.prepend(span);
     }
+    // tag extraction 
+    // if some tag in the equation is linked, we store all tags in a special element in the HTML parent
+    const tagsElement = htmlParentNode.closest('[data-ams-doc~="math"]').nextElementSibling;
+    if (tagsElement?.getAttribute('data-ams-doc') === 'tags') tagsElement.insertAdjacentHTML('beforeend', `<span>${this.passthroughIntoHTMLString(xmlnode.cloneNode(true))}</span>`);
 };

pkra commented 1 year ago

Here's a radical thought: let's preserve (outer) tex-math, i.e., instead of

<span data-ams-doc="math block">\tag{$x$}</span>

something like


<span data-ams-doc="math block">
  <tex-math>\tag{$x$}</tex-math>
  <span hidden data-ams-doc="tags">
    <span>
      <span data-ams-doc="math inline">
        <tex-math>x</tex-math>
      </span>
    </span>
  </span>
</span>

(This is a bit convoluted when there's math in the tag, but still.)
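The nesting above can be sketched as a small string builder. This is only an illustration under stated assumptions: `renderFormula` and `escapeHtml` are hypothetical helpers, not functions in texml-to-html, and the tag contents are assumed to arrive as already-rendered HTML strings.

```javascript
// Escape characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: `tex` is the formula's TeX source, `tagsHtml` is a list of
// already-rendered HTML strings (one per tag), `mode` is "block" or "inline".
function renderFormula(tex, tagsHtml, mode = 'block') {
  const texMath = `<tex-math>${escapeHtml(tex)}</tex-math>`;
  // Only emit the hidden tags container when there is at least one tag.
  const tags = tagsHtml.length
    ? `<span hidden data-ams-doc="tags">${tagsHtml.map((t) => `<span>${t}</span>`).join('')}</span>`
    : '';
  return `<span data-ams-doc="math ${mode}">${texMath}${tags}</span>`;
}

// The tag itself contains inline math, so it is rendered as a nested formula first.
const innerTag = renderFormula('x', [], 'inline');
console.log(renderFormula('\\tag{$x$}', [innerTag]));
```

Apart from whitespace, the printed markup has the same nesting as the example: the outer TeX stays intact in tex-math, and the tag's own math is kept as a separate inline formula inside the hidden container.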

pkra commented 1 year ago

Fun fact: we don't handle "text equations" properly in ams-html (createMath doesn't pick up on them).

pkra commented 1 year ago

We don't have to collect tags in text equations because their tags come out as spans with inline math already (well, if we had any with inline math in their tags).

pkra commented 1 year ago

This seems to be working out OK downstream.

I haven't looked into all test failures here - a ton of changes are expected and I'd like to take the opportunity to clean up the formula-related tests.