benrbray / prosemirror-math

Schema and plugins for "first-class" math support in ProseMirror!
https://benrbray.com/prosemirror-math/
MIT License

Plugin to detect math in pasted LaTeX-like text #18

Open benrbray opened 3 years ago

benrbray commented 3 years ago

Currently, consumers of prosemirror-math can set up custom paste behavior for their own configuration, but it would be helpful if we provided some tools to make it easier.

So, prosemirror-math should export an optional Plugin that detects dollar signs (or another user-configurable math delimiter) in pasted plain text and converts the encompassed text to an inline or block math node.

However, handling non-math dollar signs will be tricky, since it is unlikely that they will already be escaped in the pasted text. So, we will need to apply some common-sense criterion to determine whether a dollar sign corresponds to math or not. For example,

bohrium commented 3 years ago

Here is an interesting but probably too-rare-to-worry-about case:

        The interstate highway system cost roughly $10^13 (in _today_'s dollars) to build; $\exp(2\pi i)=1$.

Here, a greedy approach based on common-sense heuristics might classify `10^13 (in _today_'s dollars) to build;` as LaTeX; this would absorb the middle dollar sign and thus prevent `\exp(2\pi i)=1` from being considered. This is the sort of interaction that dynamic programming is good for. But this might be too rare to worry about.
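To make the dynamic-programming idea concrete, here is a rough sketch, not a proposed implementation. It assumes inline math only (single dollar signs, with `\$` treated as already escaped) and assumes a math span never contains an unescaped dollar sign, so only neighbouring dollar signs can be paired. The names `findDollars`, `pairDollars`, `Scorer`, and `standInScore` are made up for illustration, and the scorer is a placeholder for whatever heuristic ends up being used.

```typescript
type Span = { open: number; close: number };
// A scorer rates the candidate span between two dollar signs; higher means "more math-like".
type Scorer = (text: string, open: number, close: number) => number;

// Indices of dollar signs that are not preceded by a backslash.
function findDollars(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text.charAt(i) === "$" && text.charAt(i - 1) !== "\\") out.push(i);
  }
  return out;
}

// Choose the set of neighbouring dollar-sign pairs with the highest total score.
// Each dollar sign is either left as literal text or paired with the next one.
function pairDollars(text: string, score: Scorer): Span[] {
  const d = findDollars(text);
  const n = d.length;
  // best[k] = highest total score using only the first k dollar signs.
  const best: number[] = [0, 0];
  // paired[k] = true when the optimum for k pairs dollar k-2 with dollar k-1.
  const paired: boolean[] = [false, false];
  for (let k = 2; k <= n; k++) {
    const skip = best[k - 1]; // dollar k-1 stays literal
    const take = best[k - 2] + score(text, d[k - 2], d[k - 1]);
    paired.push(take > skip);
    best.push(Math.max(skip, take));
  }
  // Walk backwards to recover which pairs were chosen.
  const spans: Span[] = [];
  let k = n;
  while (k >= 2) {
    if (paired[k]) {
      spans.push({ open: d[k - 2], close: d[k - 1] });
      k -= 2;
    } else {
      k -= 1;
    }
  }
  return spans.reverse();
}

// Placeholder scorer (an assumption): +8 per backslash, -2 per space-delimited word.
const standInScore: Scorer = (text, open, close) => {
  const inner = text.slice(open + 1, close).trim();
  const backslashes = inner.split("\\").length - 1;
  const words = inner === "" ? 0 : inner.split(/\s+/).length;
  return 8 * backslashes - 2 * words;
};
```

On the highway sentence, the placeholder scorer gives -12 to the first candidate and 12 to the second, so `pairDollars` leaves the first dollar sign as literal text and returns only the span around `\exp(2\pi i)=1`. Because a pair is taken only when it raises the running total, spans with a non-positive score are never converted.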

Some ideas for features with which to classify (weights set by intuition, not learned programmatically from data):

a. latex-y characters: 8 * (number of backslashes) + 1 * (number of underscores, carets, or curly braces)
b. numeric content: 1 * (number of digit characters) + 3 * (number of plus signs, minus signs, and equals signs)
c. bracket consistency: -13 * (1 if the curly braces are unbalanced, i.e. not a legal nesting in the Catalan sense, else 0)
d. spacing context: -5 * (number of non-whitespace characters immediately outside the dollar signs)
e. word count: -2 * (number of space-delimited blocks between the dollar signs)

In fact, simply thresholding a+e >= 1 would probably work well.
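For reference, the features could be computed roughly as follows. This is only a sketch using the intuition-based weights above; the names `MathFeatures`, `extractFeatures`, `totalScore`, and `looksLikeMath` are hypothetical, and it assumes `open` and `close` are the indices of the two candidate dollar signs in the pasted text. Escaped braces such as `\{` are not treated specially.

```typescript
interface MathFeatures {
  latexChars: number; // (a)
  numeric: number;    // (b)
  brackets: number;   // (c)
  spacing: number;    // (d)
  wordCount: number;  // (e)
}

// Number of matches of a global regex in a string.
function countMatches(s: string, re: RegExp): number {
  const m = s.match(re);
  return m ? m.length : 0;
}

// True when no brace closes before it opens and every brace is closed by the end.
function bracesBalanced(s: string): boolean {
  let depth = 0;
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if (ch === "{") depth++;
    else if (ch === "}" && --depth < 0) return false;
  }
  return depth === 0;
}

function extractFeatures(text: string, open: number, close: number): MathFeatures {
  const inner = text.slice(open + 1, close);
  const trimmed = inner.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  // 1 if the character at index i exists and is not whitespace, else 0.
  const outside = (i: number): number => (/\S/.test(text.charAt(i)) ? 1 : 0);
  return {
    latexChars: 8 * countMatches(inner, /\\/g) + countMatches(inner, /[_^{}]/g),
    numeric: countMatches(inner, /[0-9]/g) + 3 * countMatches(inner, /[+\-=]/g),
    brackets: bracesBalanced(inner) ? 0 : -13,
    spacing: -5 * (outside(open - 1) + outside(close + 1)),
    wordCount: -2 * words,
  };
}

// Sum of all five features.
function totalScore(f: MathFeatures): number {
  return f.latexChars + f.numeric + f.brackets + f.spacing + f.wordCount;
}

// The simpler rule: threshold a + e >= 1.
function looksLikeMath(f: MathFeatures): boolean {
  return f.latexChars + f.wordCount >= 1;
}
```

On the highway example, a+e comes to 3 - 12 = -9 for the span starting at `10^13` and 16 - 4 = 12 for `\exp(2\pi i)=1`, so the a+e threshold alone already separates the two candidates.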