commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)
BSD 2-Clause "Simplified" License

ThematicBreak literal is lost in the Markdown renderer #331

Closed: jumale closed this issue 2 months ago

jumale commented 4 months ago

According to the specification, a thematic break consists of 3 or more matching -, _ or * characters with 0-3 leading spaces (roughly the regex ^\s{0,3}[-_*]{3,}$). This logic seems to be correct when reading Markdown: thematic breaks are correctly recognised and captured as a ThematicBreak node whose literal contains the actual value from the Markdown. However, during rendering the literal is dropped and replaced with a static ___.
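To make the rule concrete, here is a minimal sketch that tries the regex quoted above against a few sample lines. The class name, method names and the stricter pattern are illustrative assumptions, not part of commonmark-java. The stricter pattern reflects that the CommonMark spec requires all break characters on the line to match and allows spaces or tabs between them, which the quoted regex only approximates.

```java
import java.util.regex.Pattern;

// Illustrative only: this class is not part of commonmark-java.
class ThematicBreakRegexDemo {
    // The regex from the comment above, unchanged.
    static final Pattern LOOSE = Pattern.compile("^\\s{0,3}[-_*]{3,}$");

    // Assumed stricter variant: up to 3 leading spaces, then 3 or more
    // occurrences of the SAME character, each optionally followed by spaces/tabs.
    static final Pattern STRICT =
            Pattern.compile("^ {0,3}([-_*])[ \\t]*(?:\\1[ \\t]*){2,}$");

    static boolean matchesLoose(String line) {
        return LOOSE.matcher(line).matches();
    }

    static boolean matchesStrict(String line) {
        return STRICT.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"   *******", "---", "- - -", "-*-", "--", "    ---"};
        for (String s : samples) {
            System.out.println("[" + s + "] loose=" + matchesLoose(s)
                    + " strict=" + matchesStrict(s));
        }
    }
}
```

The two patterns agree on the lines used in this issue; they differ on inputs such as `- - -` (spaces between characters) and `-*-` (mixed characters).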

Expected behaviour: this markdown looks the same after parse/render

foo

   *******

bar

---

baz

Actual behaviour: the renderer transforms it into

foo

___

bar

___

baz

robinst commented 4 months ago

Yeah. MarkdownRenderer does not yet preserve everything from the input; its main focus is on producing an equivalent document (___ is also a thematic break).

Would you want to try to make that change yourself and raise a PR? Looks like you've already found the right place where the fix should go :). I think in this case, if the node has a literal, we can use it; otherwise we use ___ so the output is not ambiguous with lists.
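
The fallback suggested here could be sketched roughly as follows. This is not the actual MarkdownRenderer code: `BreakNode` is a hypothetical stand-in for the library's ThematicBreak node, and `render` and `DEFAULT_BREAK` are names invented for illustration. Treating an empty literal the same as a missing one is also an assumption.

```java
// Illustrative only: a stand-alone sketch, not the library's renderer.
class ThematicBreakRenderSketch {
    // Hypothetical stand-in for the ThematicBreak node; only the literal matters here.
    static final class BreakNode {
        private final String literal; // null when the node was not created by the parser

        BreakNode(String literal) {
            this.literal = literal;
        }

        String getLiteral() {
            return literal;
        }
    }

    // Default used when the node carries no literal.
    static final String DEFAULT_BREAK = "___";

    // Prefer the literal captured while parsing; otherwise fall back to the default.
    static String render(BreakNode node) {
        String literal = node.getLiteral();
        return (literal != null && !literal.isEmpty()) ? literal : DEFAULT_BREAK;
    }

    public static void main(String[] args) {
        System.out.println(render(new BreakNode("   *******")));
        System.out.println(render(new BreakNode("---")));
        System.out.println(render(new BreakNode(null)));
    }
}
```

Keeping ___ as the default for nodes without a literal avoids characters that double as list markers (- and *), which is the ambiguity mentioned above.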