SO-Close-Vote-Reviewers / UserScripts

Various user scripts that add features to the review queue or to the chat room

Magic Editor doesn't support formula blocks #182

Open MdoubleDash opened 4 years ago

MdoubleDash commented 4 years ago

As noted here, some of the communities are using MathJax to render formulas. Anything between single or double dollar signs will be considered a formula ($...$ or $$...$$).

There would be concerns about false positives, like the ones @makyen mentioned here https://github.com/SO-Close-Vote-Reviewers/UserScripts/issues/181:

(e.g. someone providing a list of prices in $, would look like multiple blocks of $...$)

The short answer to this specific concern is that, in those communities, a literal dollar sign has to be escaped (\$). An unescaped one is taken as the start or end of a formula.
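To make the escaping rule concrete, here is a minimal sketch that lists the positions of unescaped dollar signs in a post. The function name `unescapedDollarPositions` is hypothetical, and the sketch assumes a backslash escapes exactly the character that follows it. It avoids lookbehind, so it does not depend on regex engine support.

```javascript
// Hypothetical helper, not part of MagicEditor.
// Assumption: a backslash escapes exactly the next character.
function unescapedDollarPositions(text) {
  const positions = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '\\') {
      i++; // skip the escaped character, whatever it is
      continue;
    }
    if (text[i] === '$') {
      positions.push(i);
    }
  }
  return positions;
}
```

With this rule, a price list written as `\$5, \$7` yields no candidate delimiters, while `$x$` yields two.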

Refer to "Match everything Between two Characters except when there is a Blank line" for a detailed explanation of what should be considered a formula block.

In the SO post above, The fourth bird's answer provides a regex that would match formula blocks and would not return false positives.

(?<!\S)(\$\$?+)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*(?:\r?\n(?![^\S\r\n]*$)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*)*\1(?!\S)

Regex101 demo

Wiktor Stribiżew's comment -- DEMO:

/^[^\S\r\n]*(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm
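The second pattern is already written as a JavaScript regex literal and uses no possessive quantifiers or lookbehind, so it can be tried directly. The sketch below only wraps it in a hypothetical helper, `findFormulaBlocks`, to show what it matches. Note that the pattern is anchored to line starts and ends, so it finds formulas that occupy their own line(s), not formulas embedded in a sentence.

```javascript
// Pattern quoted in this issue (Wiktor Stribiżew's comment), unchanged.
const FORMULA_BLOCK_RE = /^[^\S\r\n]*(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm;

// Hypothetical helper: returns the text of every formula block found.
function findFormulaBlocks(text) {
  return Array.from(text.matchAll(FORMULA_BLOCK_RE), m => m[0]);
}
```

For example, `findFormulaBlocks("Intro\n$$\na+b\n$$\nEnd")` returns the three-line `$$ ... $$` block, while a block interrupted by a blank line, or a line such as `Apples: $5`, produces no match.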

There is also the matter of implementing this only for communities that support MathJax.

makyen commented 4 years ago

Thanks for creating an issue.

Just an FYI wrt. your SO question: If you want a regex that works in JavaScript (i.e. in MagicEditor), you need to specify JavaScript in your SO question. There are many flavors of regular expressions, which all support different sets of features. Specific to the question you linked above, JavaScript supports neither lookbehind, nor possessive quantifiers, so the regex in the answer isn't viable for MagicEditor as written.

However, having definite examples of what should be detected and should be excluded, like what's included in the regex101 link you provided, helps quite a bit. Thanks.

It helps quite a bit to know that $ must be escaped on sites which support MathJax. It's relatively easy to determine if a site supports MathJax. So, using the combination of those two makes it easier to avoid false positives.
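One possible way to check for MathJax on a page, as a rough sketch only: MathJax exposes a global `MathJax` object once loaded, and a page that loads it normally has a script element whose URL mentions it. The function name `siteUsesMathJax` and the URL heuristic are assumptions, not what MagicEditor does.

```javascript
// Heuristic sketch; both checks are assumptions about how a page loads MathJax.
// `win` and `doc` are passed in so the function can be exercised outside a browser.
function siteUsesMathJax(win, doc) {
  // MathJax defines a global object on the page once it is loaded.
  if (typeof win.MathJax !== 'undefined') {
    return true;
  }
  // Fallback: look for a script element whose URL mentions MathJax.
  return Array.from(doc.querySelectorAll('script[src]'))
    .some(script => /mathjax/i.test(script.getAttribute('src') || ''));
}
```

In a page context this would be called as `siteUsesMathJax(window, document)`. In a sandboxed userscript, the page's globals may only be reachable through `unsafeWindow`, depending on the script manager and its grants.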

As to generating a regex to identify MathJax, I appreciate the effort, but we'll probably generate our own, or look at what MathJax uses to identify the blocks it's going to act upon.

MdoubleDash commented 4 years ago

I just wanted to add that matching code-blocks should have priority over formulas. Imagine a meta post that is trying to explain how to write formulas. They would either use grave accents or 4+ leading spaces. Just another special case to have in mind.
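A minimal sketch of that priority rule: collect the code ranges first, then drop any formula candidate that overlaps one. Both patterns here are deliberately simplified stand-ins chosen for illustration (inline backtick spans, lines indented by four or more spaces or a tab, and single-line `$...$` / `$$...$$`), and all names are hypothetical.

```javascript
// Simplified stand-in patterns (assumptions, not MagicEditor's real ones).
const CODE_RE = /`[^`\n]+`|^(?: {4,}|\t).*$/gm;
const FORMULA_RE = /\$\$[^$\n]+\$\$|\$[^$\n]+\$/g;

// Returns { start, end, text } for every match of `re` in `text`.
function rangesOf(re, text) {
  return Array.from(text.matchAll(re), m => ({
    start: m.index,
    end: m.index + m[0].length,
    text: m[0],
  }));
}

// Code wins: a formula candidate overlapping any code range is discarded.
function findFormulasOutsideCode(text) {
  const code = rangesOf(CODE_RE, text);
  return rangesOf(FORMULA_RE, text)
    .filter(f => !code.some(c => f.start < c.end && c.start < f.end))
    .map(f => f.text);
}
```

For a meta post such as "Write `` `$x$` `` to get $x$.", only the second `$x$` is reported. The one inside backticks is treated as code.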