dart-lang / markdown

A Dart markdown library
https://pub.dev/packages/markdown
BSD 3-Clause "New" or "Revised" License

InlineSyntax with positive lookbehind fails #562

Closed: Kal-Elx closed this issue 8 months ago

Kal-Elx commented 8 months ago

Hello,

While trying to add an inline syntax for LaTeX, I encountered a bug in this otherwise great package. I hope you can help me out.

To detect a LaTeX equation, I'm using the regex \\\(.+?\\\)|(?<=([\s])|(\A))\$[^$]+?\$. It matches any string that starts with \( and ends with \), or any string that starts and ends with a dollar sign $, provided the first dollar sign is either at the start of the string (\A) or follows a whitespace character (\s). I have verified the regex on regex101.com.

The package fails to detect equations that start with $ when they are at the beginning of a string or are preceded by two line breaks.

I wrote a minimal example:

import 'package:markdown/markdown.dart';

String markdownToHtmlWithLatex(String markdown) => markdownToHtml(
      markdown,
      inlineSyntaxes: [
        LatexSyntax(),
      ],
    );

const latexPattern = r'\\\(.+?\\\)|(?<=(\s)|(\A))\$[^$]+?\$';

class LatexSyntax extends InlineSyntax {
  LatexSyntax() : super(latexPattern);

  @override
  bool onMatch(InlineParser parser, Match match) {
    parser.addNode(Element.text('latex', 'EQUATION'));
    return true;
  }
}

import 'package:markdown_bug/markdown_bug.dart';
import 'package:test/test.dart';

void main() {
  test('only equation', () {
    // fails
    expect(markdownToHtmlWithLatex(r'$equation$'),
        '<p><latex>EQUATION</latex></p>\n');
  });

  test('whitespace before equation', () {
    // succeeds
    expect(markdownToHtmlWithLatex(r' $equation$'),
        '<p><latex>EQUATION</latex></p>\n');
  });

  test('line break before equation', () {
    // succeeds
    expect(markdownToHtmlWithLatex(r'''
Equation:
$equation$
'''), '<p>Equation:\n<latex>EQUATION</latex></p>\n');
  });

  test('double line break before equation', () {
    // fails
    expect(markdownToHtmlWithLatex(r'''
Equation:

$equation$
'''), '<p>Equation:</p>\n<p><latex>EQUATION</latex></p>\n');
  });
}

Let me know if you have any ideas on how to fix this or if there are any workarounds I can do on my side. Thanks!

Kal-Elx commented 8 months ago

I realized that the problem also occurs when using Dart's RegExp class directly, which I assume is the root cause.

String findEquations(String data) {
  final latexRegex = RegExp(latexPattern);
  return data.replaceAllMapped(latexRegex, (match) {
    return 'EQUATION';
  });
}

Feel free to close this issue if that is the case.

lrhn commented 8 months ago

That is the case. The \A anchor is not part of JavaScript RegExp syntax, which is the syntax Dart uses.

The corresponding Dart RegExp would be:

var latexRE = RegExp(r"\\\(.+?\\\)|(?<=\s|^)\$[^$]+?\$");

The ^ matches the start of the input string.
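
A minimal sketch of the difference, using only dart:core. The names originalPattern, fixedPattern and replaceEquations are made up for this illustration and are not part of the markdown package. The behaviour of the original pattern on input that starts with an equation is taken from the failing test in the report. In Dart's default non-Unicode mode, \A is most likely read as an escaped literal A rather than as an anchor, but the sketch does not depend on that assumption.

```dart
// Original pattern from the report; \A is not a start-of-input anchor in Dart.
const originalPattern = r'\\\(.+?\\\)|(?<=(\s)|(\A))\$[^$]+?\$';
// Suggested replacement: ^ anchors at the start of the input.
const fixedPattern = r'\\\(.+?\\\)|(?<=\s|^)\$[^$]+?\$';

/// Replaces every match of [pattern] in [data] with the word EQUATION.
/// (Hypothetical helper, modelled on findEquations from the report.)
String replaceEquations(String pattern, String data) =>
    data.replaceAll(RegExp(pattern), 'EQUATION');

void main() {
  // Equation at the very start of the input.
  print(replaceEquations(originalPattern, r'$x+y$')); // prints $x+y$ (no match)
  print(replaceEquations(fixedPattern, r'$x+y$')); // prints EQUATION

  // Equation after whitespace, plus the \( ... \) form.
  print(replaceEquations(fixedPattern, r'see $x$ and \(y\)'));
  // prints see EQUATION and EQUATION
}
```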

If you pass multiLine: true to the RegExp constructor, ^ instead matches the start of every line. For this RegExp it works either way: \s already matches line terminators, so the only start of a line that is not preceded by a whitespace character is the start of the text.
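
The multiLine remark can be checked with a small sketch that counts matches with and without the flag. The count helper and the sample text are assumptions made for this example; the text places one equation at the start of the input, one after a single line break, and one after a blank line.

```dart
const fixedPattern = r'\\\(.+?\\\)|(?<=\s|^)\$[^$]+?\$';

/// Counts the matches of [re] in [text]. (Hypothetical helper.)
int count(RegExp re, String text) => re.allMatches(text).length;

void main() {
  // Three equations: at the start, after one newline, after two newlines.
  const text = '\$a\$\n\$b\$\n\n\$c\$';

  final singleLine = RegExp(fixedPattern);
  final multiLine = RegExp(fixedPattern, multiLine: true);

  print(count(singleLine, text)); // prints 3
  print(count(multiLine, text)); // prints 3
}
```

Both counts agree because every line start after the first is already covered by the \s branch of the lookbehind.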

(Should the RegExp match $$....$$ as well as $....$? If so, try: RegExp(r"\\\(.+?\\\)|(?<=\s|^)(\${1,2})[^$]+?\1").)
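
A short sketch of how the $$ variant behaves, assuming only dart:core. The sample inputs are invented for illustration. The opening delimiter is captured in group 1, and the backreference \1 requires the closing delimiter to have the same length.

```dart
// Variant from the comment above: group 1 captures one or two dollar signs,
// and \1 requires the same delimiter to close the equation.
final latexRE = RegExp(r'\\\(.+?\\\)|(?<=\s|^)(\${1,2})[^$]+?\1');

void main() {
  for (final input in [r'$x$', r'$$x$$', r'text $$a+b$$ more']) {
    final match = latexRE.firstMatch(input);
    print('$input -> ${match?.group(0)}');
  }
  // prints:
  // $x$ -> $x$
  // $$x$$ -> $$x$$
  // text $$a+b$$ more -> $$a+b$$
}
```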