Closed rogino closed 2 years ago
Unfortunately, regexes won't cut it to fix this problem in full generality, e.g. to support
$$ x = \text{what is $x$?} $$
Would you be willing to review a PR that rewrites the parsing engine to count {
/}
s and check whether $
s are escaped with \
, when processing in dollars
mode?
hmm ... interesting.
Usually I am very reluctant with extending the regexes used in texmath. They are performance critical, as they are evaluated once with each user key stroke.
In fact escaped dollar sign \$
is a valid element inside latex math and interestingly it is working flawlessly inside of other non-dollar delimiters.
Dollar delimiters are special beasts. While having a short look into the math-inline
regex
/\$((?:\S)|(?:\S.*?\S))\$/
I reidentified the final \S
(last character before closing dollar) as a shortcut for [^\r\n\t\f\v ]
(character exept), which might be easily extended to [^\r\n\t\f\v \\]
presumably without too much performance cost.
After testing it in https://regex101.com/ successfully for several relevant cases, I considered it worth for also use it as a guard in the math-block
regexes.
Surprisingly a drastic simplification is helping here ... from
/\${2}([^$]+?)\${2}/
to
/\${2}(.+?)\${2}/
Expect it available in the next version.
thanks ...
Erik, thanks for your potential help.
If there are some problems with my bug fix attempt, I would like to come back to your PR offer.
/\${2}(.+?)\${2}/
Changing from [^$]
to .
excludes newlines. I think you want to allow single newlines (though you could forbid double newlines, as TeX does) within math. Instead of .
you probably want [^]
, as in:
/\${2}([^]+?)\${2}/
But none of this fixes use of \$
(escaped $
) or \text{$...$}
(nested $
) within a math expression.
One potential fix would be to use the regex as above, and then check whether the closing $
is actually escaped (had an odd number of \
s before it), or has more unescaped {
s than }
s in it, and in that case, doing more work (probably another regex match to get contents until the next \${2}
, concatenating, and trying again). This would mostly just take extra time in the weird edge cases of escaped and nested $
s, which currently don't work, so seems like a win? But it would involve counting the number of {
s and }
s (without preceding \
s) to make sure it's matched. I imagine this is all way faster than the cost of calling KaTeX, though.
Alternatively (and what I originally had in mind), the regex could match the opening \${2}
, and then do a secondary search for {
or }
or `\${2}$, checking for escaping in each case, and repeat until finding the unnested closing notation. I could test to see which is faster in which cases.
In fact I had temporarily forgotten that '.
' excludes newlines. So taking your proposed '[^]
' works fine ...
inline: [
{ name: 'math_inline_double',
rex: /\${2}(.+?)\${2}/gy
},
{ name: 'math_inline',
rex: /\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$/gy
}
],
block: [
{ name: 'math_block_eqno',
rex: /\${2}([^]+?[^\\])\${2}\s*?\(([^)\s]+?)\)/gmy
},
{ name: 'math_block',
rex: /\${2}([^]+?[^\\])\${2}/gmy
}
]
... as you can see with this example code (https://github.com/goessner/markdown-it-texmath/blob/master/test/bug-dollars.html)
const str = `
# Simple Dollar tests
## Inline
here "$a+\\$ = b$" we "$\\$$" go "$\\text{\\$some...\\$}$"
## Inline block (single line only)
$$a+\\$ = \\text{\\$more...\\$} \\$$$ or ...
## Block (multiline)
$$
a+\\$ = \\text{\\$text...\\$} \\$
$$
`
resulting in ...
To also handle unescaped dollars inside of \text{$...$}
I see effort to use relation as disproportionate. It seems to be reasonable to also escape dollars \text{\$...\$}
in that edge case.
So I would prefer to live with that small insufficiency of markdown-it-texmath for performance and simplicity reasons.
I hope I have not overlooked anything ... thanks.
Minor points:
math_inline_double
still uses .
; should probably switch that to [^]
too.[^\s\\]
instead of [^\r\n\t\f\v \\]
. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)[^]+?[^\\]
(which occurs in both block rules) seems to require at least two characters. Should probably be [^]*?[^\\]
.math_inline_double
have the same addition to exclude a trailing \
?\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$
can be simplified to \$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$
or \$([^\s\\]|\S.*?[^\s\\])\$
. (I'm not quite sure why you're forbidding spaces next to the $
s but I assume that's intentional, to avoid some stray matching.)These new rules seem to deal with \$
properly. Nice!!
Nested $
s are for re-entering math mode. \text{$x+y$}
is different from \text{\$x+y\$}
:
So it's not possible to escape these instances of $
, as \$
means something in LaTeX.
Unmatched braces generate errors in KaTeX, though. (I see either Uncaught ParseError: KaTeX parse error: Expected '}', got 'EOF' at end of input
or KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: \text{
.) So perhaps that could be detected, which triggers an "extension" regex? I believe the extension regex is exactly math_inline
without the leading $
, or applying math_inline
but starting from the final $
. On input $\text{$x+y$}$
, after matching $\text{$
, you'd next grab x+y$
, fail, and then grab }$
, and then succeed. This is quadratic time in the number of $
s, but that's probably small... Alternatively, when in extension mode, we could count {
/}
s, so call KaTeX at most twice.
Sorry for the delay. Thanks for your valuable input.
* `math_inline_double` still uses `.`; should probably switch that to `[^]` too.
math_inline should be written on a single line, but I added it to be more forgiving here.
* I think you can use `[^\s\\]` instead of `[^\r\n\t\f\v \\]`. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)
This is definitely better ... taken.
* `[^]+?[^\\]` (which occurs in both block rules) seems to require at least two characters. Should probably be `[^]*?[^\\]`.
Yes ... thanks for catching.
* Shouldn't `math_inline_double` have the same addition to exclude a trailing `\`?
sure ... done.
* `\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$` can be simplified to `\$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$` or `\$([^\s\\]|\S.*?[^\s\\])\$`. (I'm not quite sure why you're forbidding spaces next to the `$`s but I assume that's intentional, to avoid some stray matching.)
yes ... again a significant improvement.
Nested
$
s are for re-entering math mode.\text{$x+y$}
is different from\text{\$x+y\$}
: So it's not possible to escape these instances of$
, as\$
means something in LaTeX.
Now I understand that reentering math mode effect. Escaping makes no sense indeed. I still consider it a not so relevant edge case in practise. I don't want to invest that significant implementation effort at current. A PR is always welcome though.
thanks again ...
The regex for dollar delimiters breaks if the content includes an escaped dollar (
\$
), making it impossible to type dollar literals inside math blocks:will all fail to be parsed as a latex block.
Workarounds include using a different delimiter style or use
\text{\textdollar}
, but getting it to work without these would be nice.I've been to update the
math_block
regex from:/\${2}([^$]+?)\${2}/gmy
to/\${2}((?:\\\$|[^$])+?)\${2}/gmy
but it doesn't cover all cases, and I haven't been able to figure out where to start with the
math_inline
regex.