daisy / math-a11y


MathML Copied from JAWS 2024's Math Viewer with MathCAT Will Not Auto Convert Into Math Objects in Microsoft Word 365 #4

Open brichwin opened 4 months ago

brichwin commented 4 months ago

eDAD tracking ticket for this issue is Accessibility 228596

Description:

Math copied from the Math Viewer in JAWS is not automatically converted into a math object in Microsoft Word 365. Instead, the MathML for the expression is pasted in as plain text. The MathML copied to the clipboard also lacks the xmlns attribute on the <math> element.

Steps to Reproduce:

  1. Open Microsoft Word 365 (Desktop version) on Windows 11.
  2. Create a new [blank] document.
  3. Launch JAWS.
  4. Use the Early Adopter Program dialog to ensure that MathCAT is enabled.
    • Dialog is available from "Options" > "Early Adopter Program..."
    • Restart JAWS if necessary
  5. Visit a web page that contains math expressions encoded as MathML.
  6. Navigate to a math expression.
  7. Press "Enter" to bring up the "Math Viewer".
  8. Press "Ctrl+C" to copy the math expression
    • You should hear "Copied selection to clipboard"
  9. Switch back to the blank Microsoft Word document.
  10. Press "Ctrl+V" to paste the math expression into the Word document.
  11. Note that instead of a Microsoft Math Object representing the expression copied in step 8, the raw MathML for the expression is pasted in as plain text.
  12. Also note that the pasted MathML does not have an xmlns='http://www.w3.org/1998/Math/MathML' attribute on the <math> element.

Expected Behavior:

When pasted into Microsoft Word, the MathML should convert into a Microsoft Math Object for the original expression that can be edited using the Equation Editor.

Observed Behavior:

Version Information:

Example Video and Sample Web Page with MathML content:

A link to a short example video demonstrating the issue and a link to the web page used to demonstrate the issue are available below:

Attachments:

Additional Context: The xmlns='http://www.w3.org/1998/Math/MathML' attribute on the <math> element is optional for MathML contained in an HTML5 document. When MathCAT in NVDA copies math to the clipboard, it appears to ensure that the <math> element has the correct xmlns attribute regardless of whether the original <math> element had one.
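The normalization NVDA appears to perform can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal sketch under stated assumptions, not MathCAT's actual code: the helper name ensure_mathml_namespace is hypothetical, and the input is assumed to be well-formed XML. Registering the MathML URI as the default namespace makes ElementTree write xmlns="..." on the root element.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Make the MathML URI the default (unprefixed) namespace when serializing.
# Note: register_namespace changes ElementTree's process-wide prefix map.
ET.register_namespace("", MATHML_NS)


def ensure_mathml_namespace(mathml: str) -> str:
    """Return mathml with every element in the MathML namespace (hypothetical helper)."""
    root = ET.fromstring(mathml)
    if root.tag.rsplit("}", 1)[-1] != "math":
        raise ValueError("root element is not <math>")
    for element in root.iter():
        # Elements parsed without a namespace (typical of HTML5 sources)
        # have a bare tag such as "mi" instead of "{uri}mi".
        if not element.tag.startswith("{"):
            element.tag = f"{{{MATHML_NS}}}{element.tag}"
    # Existing attributes (id, data-*) are kept unchanged.
    return ET.tostring(root, encoding="unicode")


print(ensure_mathml_namespace("<math id='M1'><mi>x</mi></math>"))
# prints: <math xmlns="http://www.w3.org/1998/Math/MathML" id="M1"><mi>x</mi></math>
```

Running it on already-namespaced MathML returns the same text, so it is safe to apply unconditionally. As NSoiffer explains later in this thread, this only helps the plain-text path; the clipboard flavor is what Word relies on.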

Here is the MathML that was generated when copied using the JAWS Math Viewer with MathCAT enabled:

 <math id='Mpcd3zh6-0' data-id-added='true'>
  <mrow data-changed='added' id='Mpcd3zh6-1' data-id-added='true'>
    <msqrt id='Mpcd3zh6-2' data-id-added='true'>
      <mrow data-changed='added' id='Mpcd3zh6-3' data-id-added='true'>
        <mi id='Mpcd3zh6-4' data-id-added='true'>x</mi>
        <mo id='Mpcd3zh6-5' data-id-added='true'>+</mo>
        <mn id='Mpcd3zh6-6' data-id-added='true'>1</mn>
      </mrow>
    </msqrt>
    <mo id='Mpcd3zh6-7' data-id-added='true'>=</mo>
    <mn id='Mpcd3zh6-8' data-id-added='true'>3</mn>
  </mrow>
 </math>

Note that if the above MathML is copied and then pasted into Microsoft Word, it is inserted as plain MathML code: Screenshot of Microsoft Word showing the MathML code rendered as paragraph text.

Here is the MathML that was generated from the same expression when copied using MathCAT in NVDA:

<math xmlns='http://www.w3.org/1998/Math/MathML'>
 <mrow>
  <msqrt>
    <mrow>
      <mi>x</mi>
      <mo>+</mo>
      <mn>1</mn>
    </mrow>
  </msqrt>
  <mo>=</mo>
  <mn>3</mn>
 </mrow>
</math>

Note that if this MathML is copied and then pasted into Microsoft Word, it becomes a Microsoft Word Math Object: Screenshot of Microsoft Word where an active Microsoft Equation is visible.
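Besides the missing xmlns attribute, the JAWS output carries MathCAT's internal id, data-id-added and data-changed attributes, which the NVDA output does not. A minimal sketch of stripping them with ElementTree, assuming those three names (taken from the JAWS sample above) are the only bookkeeping attributes; strip_mathcat_attributes is a hypothetical helper, and removing id would also drop any ids the page author set.

```python
import xml.etree.ElementTree as ET

# Assumption: the attribute names visible in the JAWS sample above.
BOOKKEEPING_ATTRIBUTES = ("id", "data-id-added", "data-changed")


def strip_mathcat_attributes(mathml: str) -> str:
    """Return mathml without the bookkeeping attributes (hypothetical helper)."""
    root = ET.fromstring(mathml)
    for element in root.iter():
        for name in BOOKKEEPING_ATTRIBUTES:
            # pop with a default ignores elements that lack the attribute.
            element.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")


sample = ("<math id='M0' data-id-added='true'>"
          "<mrow data-changed='added' id='M1'><mi id='M2'>x</mi></mrow></math>")
print(strip_mathcat_attributes(sample))
# prints: <math><mrow><mi>x</mi></mrow></math>
```

This only makes the markup match the shape of the NVDA sample; it neither adds xmlns nor sets any clipboard flavor.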

jkhurdan commented 3 months ago

I was able to reproduce the issue. I used the Freedom Scientific MathML Example page as my reference MathML code.

brichwin commented 3 months ago

I'm wondering if this issue should be sent to both Freedom Scientific and to Microsoft?

I wonder if @NSoiffer has an informed opinion to guide what we should do here?

brichwin commented 3 months ago

Via pasting in the two mathML examples above, I can confirm that the same behavior is happening on Microsoft® Word for Mac, Version 16.87 (24071426), License: Microsoft 365 Subscription.

jkhurdan commented 3 months ago

I personally like the idea of sending it to both. I think we should approach it from the perspective that other tools, like EquatIO, Mathpix, etc., may operate similarly, copying only a portion of the MathML. (I haven't tested those, however; I'm just giving them as theoretical examples.)

(I also tried this with VoiceOver. I know it's outside the scope of this group, but I couldn't find a way to copy MathML with VoiceOver as you can with JAWS/NVDA.)

GeorgeKerscher commented 3 months ago

I too think it should go to both. We can confirm that the Freedom pages with MathML work properly with NVDA, right?

NSoiffer commented 3 months ago

Sorry for the slow response -- just catching up from a long vacation...

FYI: in a web environment, "math" is a known tag and doesn't require a namespace declaration. The same is true for "svg". HTML5 is not XML, so it ignores namespaces.

The reason copy/paste works from NVDA is that MathCAT puts the MathML on the clipboard using multiple "flavors". One of the flavors is Unicode text, but the key for Word is that another flavor used is "MathML Presentation". There is also the more generic "MathML" flavor. MathCAT puts both MathML flavors out because an application might only look for one of them. Word sees the MathML flavor and then knows that the clipboard contains MathML; otherwise, it assumes you want to paste the clipboard contents as text. So the key is to get JAWS to add the flavor when it puts things on the clipboard.

From the NVDA code: first you need to register the formats, as they aren't standard Windows clipboard formats:

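    from ctypes import windll  # Windows-only

    # Returns the numeric format IDs that SetClipboardData expects.
    CF_MathML = windll.user32.RegisterClipboardFormatW("MathML")
    CF_MathML_Presentation = windll.user32.RegisterClipboardFormatW("MathML Presentation")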

When copying, the code is:

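    with winUser.openClipboard(gui.mainFrame.Handle):
        winUser.emptyClipboard()
        if is_mathml:
            # The registered MathML flavors are what Word looks for to paste an equation.
            self._setClipboardData(self.CF_MathML, '<?xml version="1.0"?>' + text)
            self._setClipboardData(self.CF_MathML_Presentation, '<?xml version="1.0"?>' + text)
        # Plain-text fallback for applications that don't know the MathML flavors.
        self._setClipboardData(winUser.CF_UNICODETEXT, text)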

If the application doesn't know about MathML flavors, the last line serves as a text fallback.
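The decision in that snippet can be restated as a small, platform-independent Python function that returns which flavors would be written and with what data, leaving out the Win32 calls. This is a sketch under assumptions: build_clipboard_flavors and looks_like_mathml are hypothetical names, "CF_UNICODETEXT" is just a label here for the standard text format, and parsing for a <math> root is one possible stand-in for NVDA's is_mathml flag.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0"?>'


def looks_like_mathml(text: str) -> bool:
    """True if text is well-formed XML with a <math> root (any namespace)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag.rsplit("}", 1)[-1] == "math"


def build_clipboard_flavors(text: str) -> list[tuple[str, str]]:
    """Return (format name, data) pairs in the order the NVDA code writes them."""
    flavors = []
    if looks_like_mathml(text):
        # Avoid emitting a second XML declaration if one is already present.
        payload = text if text.startswith("<?xml") else XML_DECLARATION + text
        flavors.append(("MathML", payload))
        flavors.append(("MathML Presentation", payload))
    # The text flavor is always written, as a fallback for other applications.
    flavors.append(("CF_UNICODETEXT", text))
    return flavors
```

On Windows, each name would first be registered with RegisterClipboardFormatW, as in the NVDA code, and each payload then placed on the clipboard with SetClipboardData.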

On the Mac, an analogous thing needs to be done, but the details differ. I haven't coded this for the Mac, so I don't know the details other than that the MathML spec says to use public.mathml.presentation and public.mathml.
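A sketch of keeping the per-platform names in one table, using only the names given in this thread: the Windows registered format names and the macOS type identifiers from the MathML specification. MATHML_FLAVORS and mathml_flavor_names are hypothetical; actually writing to the macOS pasteboard is not shown.

```python
from __future__ import annotations

import sys

# Clipboard flavor names for presentation MathML, most specific first.
MATHML_FLAVORS = {
    "win32": ("MathML Presentation", "MathML"),  # registered Windows formats
    "darwin": ("public.mathml.presentation", "public.mathml"),  # macOS type identifiers
}


def mathml_flavor_names(platform: str = sys.platform) -> tuple[str, str]:
    """Return the MathML flavor names for a sys.platform value."""
    try:
        return MATHML_FLAVORS[platform]
    except KeyError:
        raise ValueError(f"no MathML clipboard flavors known for {platform!r}") from None
```

Keeping the names in data rather than scattered through platform code makes it easy to write the same payload under every name a platform defines.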

Bottom line: the solution is not about namespaces -- it is to set up the clipboard format properly.

brichwin commented 3 months ago

It's always amazing how much Neil knows and how much I don't know that I don't know!

... the key for Word is that another flavor that is used is "MathML Presentation". There is also the more generic "MathML" flavor. MathCAT puts both MathML flavors out because an application might only look for one of them. Word sees the MathML flavor and then knows that the clipboard contains MathML. Otherwise, it thinks you want to paste the clipboard contents as text.

Some questions, all basically asking "Shouldn't we request that Microsoft investigate changing the behavior of Word?":

  1. If I type a MathML expression with the xmlns attribute into notepad.exe on Windows (without naming or saving the file) and then copy and paste it into Word, Word does build it up into a Microsoft Equation in the Professional format. I'm guessing that notepad.exe has no idea that the text is MathML. Does that mean some background magic is setting the flavor, or that Word is parsing the plain-text clipboard contents to some degree?
  2. Is it likely that most processes/tool(s) where an individual copies an expression as MathML would handle setting the flavors correctly for MS Word?

The main risk I see is that the user doesn't want a MathML snippet converted into a math object. However, there are already many use cases where the user needs to use the "Paste as text" feature to avoid having it inserted as an object, formatted, etc.

Are there other risks/reasons not to?

GeorgeKerscher commented 3 months ago

I just tried using VS Code to write an expression and then copy it to Word.

The expression seems to copy correctly.

In VS Code, I then created a more complicated expression using the backslash codes and copied it into an empty Word equation, and it worked perfectly. Writing in VS Code seems much easier than working in the equation editor, and simply pasting in what you want could be a good workflow.

Very interesting!

brichwin commented 3 months ago

I found an online clipboard-inspect tool that displays all of the different "flavors"/types of content available in the currently copied item. It's at: https://evercoder.github.io/clipboard-inspector/

GeorgeKerscher commented 3 months ago

This is very interesting! I was in VS Code and created an expression. I pasted it into Word's equation editor and all was well. I used Ctrl+= to build it up and copied it to the clipboard.

I pasted it into the inspector, and it showed the plain text and HTML options.

I tried putting it back into linear format with Shift+Ctrl+= and copied it again. I went to the inspector and copied it out.

From what I can tell, it round-trips between Word and VS Code with one exception: \sqrt becomes √.

So it seems that √ is a plain-text character. I suppose that is just the Unicode value.
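That guess can be checked with Python's standard unicodedata module: √ is a single Unicode character, SQUARE ROOT, at code point U+221A, which is consistent with \sqrt turning into √ in the round trip. The variable name below is only for illustration.

```python
import unicodedata

root_sign = "\u221a"  # the character that replaced \sqrt in the round trip
print(f"U+{ord(root_sign):04X}", unicodedata.name(root_sign))
# prints: U+221A SQUARE ROOT
```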

The point is that VS Code is simple to write in, and if it round-trips properly, then we may have a workflow that many people would want to use.

While doing this, I crashed MathCAT as I have done before. I filed that issue in the MathCAT issue tracker.

MurrayIII commented 2 months ago

To recognize plain text as MathML, Word requires the xmlns='http://www.w3.org/1998/Math/MathML' attribute on the <math> element.
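That rule can be expressed as a quick Python check: parse the text and require the root to be math in the MathML namespace. This is a sketch of the rule as stated here, not Word's actual parser; looks_like_word_mathml is a hypothetical name, and the check would also accept a prefixed form such as <m:math xmlns:m='...'>, which has not been tested with Word in this thread.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def looks_like_word_mathml(text: str) -> bool:
    """True if text parses as XML whose root is <math> in the MathML namespace."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{uri}localname".
    return root.tag == f"{{{MATHML_NS}}}math"


jaws = "<math id='Mpcd3zh6-0'><mn>3</mn></math>"
nvda = "<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>3</mn></math>"
print(looks_like_word_mathml(jaws), looks_like_word_mathml(nvda))
# prints: False True
```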

MurrayIII commented 2 months ago

To copy MathML from the SVG MathJax DOM (in https://murrayiii.github.io/UnicodeMathML/playground/), I wrote a JavaScript function:

function getMathJaxMathMlNode() {
    /* MathJax output-element DOM has the form:
       <mjx-container
          <svg
             <mjx-assistive-mml
                <mjx-container
                   <svg
                      <mjx-assistive-mml
                         <math ...
    */
    let node = output.firstElementChild.lastElementChild.firstElementChild
    console.log('nodename = ' + node.nodeName)
    if (node.nodeName == 'MJX-CONTAINER')
        node = node.lastElementChild.firstElementChild
    return node
}

Then, in my output.addEventListener('keydown', function (e) {...}) handler, I included:

if (output.firstElementChild.nodeName == 'MJX-CONTAINER') {
    // MathJax is active. Copying the whole math zone is supported
    e.preventDefault()
    if (key.length > 1)
        return
    if (e.ctrlKey && key == 'c') {
        let node = getMathJaxMathMlNode()
        let mathml = node.outerHTML
        if (mathml.startsWith('<math'))
            navigator.clipboard.writeText(mathml)
    }
    return
}

This works with desktop Word.
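The fixed-path descent in getMathJaxMathMlNode depends on MathJax's current output layout. A sketch of a more layout-tolerant alternative in Python, assuming the rendered output has been serialized as well-formed XML: search the tree for the first element whose local name is math. find_first_math is hypothetical; in the page itself, a DOM query such as output.querySelector('math') would likely be the analogous approach.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def find_first_math(markup: str) -> ET.Element | None:
    """Return the first <math> element in document order, or None."""
    root = ET.fromstring(markup)
    for element in root.iter():  # depth-first, document order, root included
        # Compare local names so both namespaced and bare <math> tags match.
        if element.tag.rsplit("}", 1)[-1] == "math":
            return element
    return None
```

Searching by name trades a little speed for not breaking when MathJax adds or removes a wrapper level.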