commonmark / cmark

CommonMark parsing and rendering library and program in C
Other
1.65k stars 547 forks source link

Consider allowing Nodes with a "fragment" type. #558

Open ioquatix opened 2 months ago

ioquatix commented 2 months ago

I'm interested in manipulating markdown ASTs.

It's sometimes problematic to extract a fragment, e.g.

header = Markly.parse("### Hello `World`").first_child
fragment = header.extract_children # Move all children to a node of type `custom`
fragment.to_markdown # => "Hello `World`\n\n"

I'm having trouble with the output formatting.

Ideally, the fragment would just be the raw children, e.g.

fragment.to_markdown # => "Hello `World`"

This would make it a little easier to mash up the AST with different types of outputs.

I suppose the issue is the formatting of custom type nodes, but I'm not sure if there is a problem to change this.

In other words, either (1) we could change the output to be "inline" or add an inline "fragment" node type which does minimal formatting. Thoughts?

ioquatix commented 2 months ago

Okay, I just realised there is an custom_inline type... I'm trying it now...

ioquatix commented 2 months ago

So, even with custom_inline, I'm getting newline characters in the output:

fragment.to_markdown
=> "Hello `World`\n"

Not sure why.

ioquatix commented 2 months ago

Okay, I found the problematic code:

  // ensure final newline
  if (renderer.buffer->size == 0 || renderer.buffer->ptr[renderer.buffer->size - 1] != '\n') {
    cmark_strbuf_putc(renderer.buffer, '\n');
  }

I believe we should only invoke this if the top level node is a block.

ioquatix commented 2 months ago

So there are probably two issues here:

jgm commented 2 months ago

The renderer is designed to ensure that the output produced by cmark ends with a newline. Unix text files are supposed to end with a newline.

The custom_block renderer adds a newline after the custom block because new blocks in commonmark must always start on a new line.

Isn't it easy enough to strip off the newline in your application?

ioquatix commented 2 months ago

As you said yourself, in some cases the newline is semantic, so I'm not sure stripping it off arbitrarily is a good idea.

The goal of a fragment is to be able to inject it correctly into existing markdown, e.g. moving a title text into a list item:

node = Markly.parse("# My Title")
# Extract a fragment:
children = node.extract_children

puts "- #{children.to_markdown}"

Introducing an inline fragment type would probably be a straight forward and compatible way of making this possible, and has the correct semantics for "not a complete document". In other words, it would make sense for the document type to map cleanly to "unix text file", but I don't think any of the inline_ node types should generate a newline - however changing that might be difficult (compatibility) so introducing a new fragment type is probably the easiest option.

jgm commented 2 months ago

Yes, that's true, you might e.g. have a line break element which would render as "\\n", and in this case you certainly wouldn't want to strip the trailing newline.

I'm pretty sympathetic to your request, but I'll need to dive into the details when I get time.

ioquatix commented 2 months ago

If you tell me your preferred approach I can create a PR.

jgm commented 2 months ago

I'm not sure about the details without looking at the code. But the rough idea I'd have prior to that would be:

If you want to fool around with it I'd definitely consider a PR (though no guarantees at this point that I'd merge it).

ioquatix commented 2 months ago

@jgm what do you think of https://github.com/commonmark/cmark/pull/559