benrbray / prosemirror-math

Schema and plugins for "first-class" math support in ProseMirror!
https://benrbray.com/prosemirror-math/
MIT License

Paste from External Sources (MathJax in the wild, Wikipedia, etc.) #17

Open benrbray opened 3 years ago

benrbray commented 3 years ago

It would be great to modify the default paste behavior to automatically detect math markup in HTML pasted from external sources. Unfortunately, the solution will be messy, as there is not yet a universally accepted way to render math on the web.

See Robert Miner 2010, "MathType, Math Markup, and the Goal of Cut and Paste" for a brief summary of the challenges faced in this area. Here's an excerpt from one of the slides:

Math on the Web formats in the Wild

  • Image with TeX code (alt tags, comments, urls)
  • Some content is in text (HTML math, TeX source, ASCII art)
  • Some is in the DOM (MathML, <span>s and CSS)

The following tasks are relatively low-effort and high-reward:

Some higher-effort tasks:

Things to be cautious of:

Here are some places we might expect users to paste from:

benrbray commented 3 years ago

I started to implement pasting of math from Wikipedia using a custom ProseMirror ParseRule (and the .getContent property), but ran into some unexpected behavior where the pasted math nodes all come up empty. I started a question on the ProseMirror forum which will hopefully resolve the issue.
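For reference, here is a minimal sketch of the TeX-extraction step such a parse rule would need. It assumes Wikipedia wraps each formula in an element that contains a MathML `<annotation encoding="application/x-tex">` and a fallback `<img>` whose `alt` holds the same TeX, both wrapped in `{\displaystyle ...}`. That markup is an assumption to verify against real pasted HTML. The names `ElementLike`, `unwrapDisplayStyle` and `texFromWikipediaMath` are hypothetical, and the sketch uses a structural type instead of the real DOM so it runs without a browser or ProseMirror.

```typescript
// Minimal structural stand-in for a DOM Element, so the sketch needs no browser.
interface ElementLike {
  getAttribute(name: string): string | null;
  querySelector(selector: string): ElementLike | null;
  textContent: string | null;
}

/** Strip an outer `{\displaystyle ...}` or `{\textstyle ...}` wrapper, if present. */
function unwrapDisplayStyle(tex: string): string {
  const trimmed = tex.trim();
  const match = /^\{\\(?:displaystyle|textstyle)\s+([\s\S]*)\}$/.exec(trimmed);
  return match ? match[1].trim() : trimmed;
}

/**
 * Try to recover TeX source from a pasted Wikipedia math element.
 * Prefers the MathML TeX annotation, then falls back to the image alt text.
 * Returns null when neither source survived the copy.
 */
function texFromWikipediaMath(el: ElementLike): string | null {
  // Assumed selector: the TeX annotation inside the MathML <semantics> block.
  const annotation = el.querySelector('annotation[encoding="application/x-tex"]');
  const fromAnnotation = annotation?.textContent;
  if (fromAnnotation) return unwrapDisplayStyle(fromAnnotation);

  // Assumed fallback: the rendered image carries the TeX in its alt attribute.
  const alt = el.querySelector("img")?.getAttribute("alt");
  return alt ? unwrapDisplayStyle(alt) : null;
}
```

Inside a ParseRule, the returned string would presumably be turned into a text node and wrapped in a Fragment by `getContent`. That wiring is left out here, since it is the part that currently produces empty nodes.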

benrbray commented 3 years ago

This website has math rendered using Madoko, which renders math and diagram SVGs server-side and includes them in the following format:

<svg class="snippet math-display math-render-svg math" data-math-full="true" style="..." viewBox="...">
    <desc>\begin{tikzpicture}
    \matrix[nodes={draw}, row sep=0.3cm,column sep=0.5cm] {
      \node [rectangle, draw=none] (eq) {$a = b, b = c, d = e, b = s, d = t: $};&
      \node [circle, draw] (abcs) {$a, b, c, s$}; &
      \node [circle, draw] (det) {$d, e, t$}; \\
    };
    \end{tikzpicture}
    </desc>
    <g id="math-a6e187">...</g>
</svg>

This example contains an SVG rendering of a TikZ diagram, which is a problem for KaTeX, the current default renderer, since it cannot typeset TikZ. Once MathJax is supported, an extension like TikzJax can be used to render diagrams.

UPDATE: It won't be possible to paste from documents rendered with Madoko. The TeX source is contained in a <desc> tag within an SVG element, and apparently the <desc> tags are stripped away in both Chrome and Firefox when copying.

benrbray commented 3 years ago

UPDATE: StackExchange keeps its TeX code in <script type="math/tex"> tags, but these are stripped away when copying for security reasons. To copy from StackExchange, we'll need to parse the MathML directly.
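Since browsers strip different elements on copy, it may help to check which math-bearing markup actually reaches the clipboard before writing parse rules for a given site. Below is a rough sketch of such a check. The probe names, the `survivingMathSources` helper and the regexes are all hypothetical heuristics, not part of prosemirror-math, and regex matching on HTML is only good enough for quick debugging.

```typescript
// Labels for the kinds of math markup discussed in this thread (hypothetical names).
type MathSource = "tex-annotation" | "mathml" | "img-alt" | "tex-script" | "svg-desc";

// Heuristic probes; each regex is stateless (no `g` flag), so reuse is safe.
const PROBES: ReadonlyArray<[MathSource, RegExp]> = [
  // MathML <annotation> carrying TeX source
  ["tex-annotation", /<annotation\b[^>]*encoding=["']application\/x-tex["']/i],
  // Any MathML root element
  ["mathml", /<math[\s>]/i],
  // <img> whose alt text looks like TeX (contains a backslash command)
  ["img-alt", /<img\b[^>]*\balt=["'][^"']*\\[a-zA-Z]/i],
  // MathJax-style <script type="math/tex"> (reported above as stripped on copy)
  ["tex-script", /<script\b[^>]*type=["']math\/tex/i],
  // SVG <desc>, as emitted by Madoko (reported above as stripped on copy)
  ["svg-desc", /<desc[\s>]/i],
];

/** List which kinds of math markup are present in a pasted HTML string. */
function survivingMathSources(html: string): MathSource[] {
  return PROBES.filter(([, re]) => re.test(html)).map(([name]) => name);
}
```

In a browser, the input would come from a `paste` event via `event.clipboardData.getData("text/html")`. If a StackExchange paste reports only `mathml`, that matches the observation above that the MathML has to be parsed directly.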