commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)
BSD 2-Clause "Simplified" License
2.27k stars 288 forks source link

Preserving empty line nodes #264

Open ijioio opened 2 years ago

ijioio commented 2 years ago

I am using commonmark library to generate emails. It is a really nice and easy to use library. It have both HtmlRenderer for rich email body, and TextContentRenderer for plain text email body. Looks like perfect solution. The person who sends an emails, prepares it using markdown syntax, then content is processed by both parsers and embedded to the email as plain and html body. But the problem is that parser discards line breaks (or empty lines).

Example:

Parser parser = Parser.builder().build();

Node document = parser.parse("foo\n\nbar\n\nbaz");

HtmlRenderer htmlRenderer = HtmlRenderer.builder().build();
TextContentRenderer textRenderer = TextContentRenderer.builder().build();

htmlRenderer.render(document);
textRenderer.render(document);

will result in for html

<p>foo</p>
<p>bar</p>
<p>baz</p>

for plain text:

foo
bar
baz

Parser just discards the empty lines completely. While for html this is acceptable cause each line will be wrapped with paragraph, for plain text renderer it is critical as the text looses its formatting.

What I expect is that parser will handle empty lines as a separate nodes, but renderers can process them differently, html renderer can just skip it, while plain text renderer will append them to the ouptut:

foo

bar

baz

I was not dig too deep in the sources, so not sure if there is some possible solution for that at the current state...

robinst commented 1 year ago

Hi @ijioio. As far as I understand, you're just unhappy with what the TextContentRenderer does with paragraphs, right? If it rendered it as follows:

foo

bar

baz

You would be happy?

I'm honestly not sure why it renders it this way at the moment:

foo
bar
baz

@JinneeJ do you remember what the reason was for this? We might have to make this configurable for backwards compat reasons, but it would still be good to know why.

robinst commented 1 year ago

@ijioio Also note, you can already change this behavior with existing API via TextContentRenderer.builder().nodeRendererFactory(myFactory) and overriding the rendering of Paragraph nodes.

JinneeJ commented 1 year ago

@JinneeJ do you remember what the reason was for this? We might have to make this configurable for backwards compat reasons, but it would still be good to know why.

@robinst I don't remember exactly 😅 I guess it didn't occur to me that we might need adding extra empty lines for paragraphs. But anyway having backward compatibility would be great.

ijioio commented 1 year ago

Hi @ijioio. As far as I understand, you're just unhappy with what the TextContentRenderer does with paragraphs, right? If it rendered it as follows:

foo

bar

baz

You would be happy?

Yes, exactly! I just want plain emails still keep some logical structure of the original text.

ijioio commented 1 year ago

@ijioio Also note, you can already change this behavior with existing API via TextContentRenderer.builder().nodeRendererFactory(myFactory) and overriding the rendering of Paragraph nodes.

Thank you for the suggestion. Not sure is it a universal solution though. I mean, that I will never know in advance in which commonmark syntax block these new line breaks will be used. Is that implies I need to override all possible cases?

mattrob33 commented 1 year ago

+1 for the original feature request of actually preserving empty lines. As a specific use case, I'm using commonmark to build a sort of rich text editor using markdown as the data, which requires maintaining a mapping between the literal and rendered text. So, e.g., in

An _italic_ text.

when the user places the cursor at index 5 in the rendered text, this actually maps to index 6 in the markup text, to account for the first underscore.

Unfortunately this breaks with newlines since they are simply discarded, so I have no way of knowing how many newlines the user entered.

robinst commented 3 days ago

@mattrob33 I think that's a different use case than the original reporter, which is for the TextContentRenderer. Could you create a new issue with your use case please? It sounds like you want the empty lines to be represented in the AST (Node) somehow? You might be able to do something with source spans, see includeSourceSpans on Parser.Builder?