commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)
BSD 2-Clause "Simplified" License
2.31k stars 288 forks source link

Wrong SourceSpan with lazy continuation lines #337

Closed sylvainheraud closed 2 months ago

sylvainheraud commented 2 months ago

Hi ! I'm using commonmark-java to highlight delimiters in a RichTextFX editor, and using SourceSpan to compute positions. I have an issue with 2 examples given by commonmark specification.

Example 292

> 1. > Blockquote
continued here.

Actual behavior (printing SourceSpan in visitChildren)

I don't understand sourceSpans or the second line. It seems to me that BlockQuote/ListItem/OrderedList should start at column 0 for line 1.

Example 250

> > > foo
bar

I don't understand sourceSpan for the 3 BlockQuotes, some of them have negative length. It seems to me that all BlockQuotes should start at column 0 for line 1.

And If I add a character after bar

> > > foo
bars

In this case, I lose the deepest BlockQuote sourceSpan for line 1

Thank you for your help.

robinst commented 2 months ago

Yeah, this is a bug with lazy continuation lines, thanks for reporting. I'm planning to look into it before the next release, but if you want to help speed things up you could write a few test cases here:

https://github.com/commonmark/commonmark-java/blob/6bebfe21083d100e7d93f6d9dd639c1086762eee/commonmark/src/test/java/org/commonmark/test/SourceSpansTest.java#L14

sylvainheraud commented 2 months ago

Hi Robin, thank you for your answer. Please found the test cases.

            // From https://spec.commonmark.org/0.31.2/#example-250
            // Wrong source span for the inner block quote for the second line.
            Node document = PARSER.parse("> > > foo\nbar\n");

            BlockQuote blockQuote1 = (BlockQuote) document.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 0, 9), SourceSpan.of(1, 0, 3)), blockQuote1.getSourceSpans());
            BlockQuote blockQuote2 = (BlockQuote) blockQuote1.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 2, 7), SourceSpan.of(1, 0, 3)), blockQuote2.getSourceSpans());
            BlockQuote blockQuote3 = (BlockQuote) blockQuote2.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 4, 5), SourceSpan.of(1, 0, 3)), blockQuote3.getSourceSpans());
            Paragraph paragraph = (Paragraph) blockQuote3.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 6, 3), SourceSpan.of(1, 0, 3)), paragraph.getSourceSpans());
            // Adding one character to the last line remove blockQuote3 source for the second line
            Node document = PARSER.parse("> > > foo\nbars\n");

            BlockQuote blockQuote1 = (BlockQuote) document.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 0, 9), SourceSpan.of(1, 0, 4)), blockQuote1.getSourceSpans());
            BlockQuote blockQuote2 = (BlockQuote) blockQuote1.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 2, 7), SourceSpan.of(1, 0, 4)), blockQuote2.getSourceSpans());
            BlockQuote blockQuote3 = (BlockQuote) blockQuote2.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 4, 5), SourceSpan.of(1, 0, 4)), blockQuote3.getSourceSpans());
            Paragraph paragraph = (Paragraph) blockQuote3.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 6, 3), SourceSpan.of(1, 0, 4)), paragraph.getSourceSpans());
            // From https://spec.commonmark.org/0.31.2/#example-292
            Node document = PARSER.parse("> 1. > Blockquote\ncontinued here.");

            BlockQuote blockQuote1 = (BlockQuote) document.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 0, 17), SourceSpan.of(1, 0, 15)), blockQuote1.getSourceSpans());
            OrderedList orderedList = (OrderedList) blockQuote1.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 2, 15), SourceSpan.of(1, 0, 15)), orderedList.getSourceSpans());
            ListItem listItem = (ListItem) orderedList.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 2, 15), SourceSpan.of(1, 0, 15)), listItem.getSourceSpans());
            BlockQuote blockQuote2 = (BlockQuote) listItem.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 5, 12), SourceSpan.of(1, 0, 15)), blockQuote2.getSourceSpans());
            Paragraph paragraph = (Paragraph) blockQuote2.getLastChild();
            assertEquals(List.of(SourceSpan.of(0, 7, 10), SourceSpan.of(1, 0, 15)), paragraph.getSourceSpans());
robinst commented 2 months ago

Thanks for these! I've raised a fix for it here: https://github.com/commonmark/commonmark-java/pull/339

sylvainheraud commented 2 months ago

Thank you, @robinst !

robinst commented 2 months ago

This has been released in 0.23.0, see full changelog here: https://github.com/commonmark/commonmark-java/blob/main/CHANGELOG.md#0230---2024-09-16