gjtorikian / commonmarker

Ruby wrapper for the comrak (CommonMark parser) Rust crate
MIT License
416 stars 80 forks source link

AST in 1.0 #199

Closed joeldrapper closed 1 year ago

joeldrapper commented 1 year ago

Hey, I ran into an issue with the parser putting paragraph tags inside list items.

This

1. Hello
2. World

would produce this

<ol>
  <li><p>Hello</p></li>
  <li><p>World</p></li>
</ol>

I wondered if this might be fixed in the 1.0 pre-release but it looks like the option to parse to a document has been removed. Do you have plans to bring that back?

gjtorikian commented 1 year ago

I do not. I want to move to using html-pipeline for any AST manipulation. However, I’d consider a PR that added the feature back in.

Right now this gem uses https://github.com/kivikakk/comrak for Commonmark rendering, which supports walking the node tree. Also, if this rendering is unexpected, the bug should be filed over there. I am a bit surprised by it, though, as both this gem and that crate pass the Commonmark spec. I wonder if there is some unexpected whitespace in your text which is causing the p tag to appear?

joeldrapper commented 1 year ago

The p tag appears in the AST with the string "1. Hello\n2. World". It’s the same with the Markly fork too.

gjtorikian commented 1 year ago

Wow, interestingly, while the HTML does not render the p, the AST does: https://spec.commonmark.org/dingus/

  <list type="ordered" start="1" tight="true" delimiter="period">
    <item>
      <paragraph>
        <text>Hello</text>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text>World</text>
      </paragraph>
    </item>
  </list>

I'm afraid I don't have time to dig into this but it seems to me that there is an inconsistency in the core CommonMark spec?

joeldrapper commented 1 year ago

Wow, looks like the paragraph is expected to check the parent node type. 😮

https://github.com/commonmark/commonmark.js/blob/9a16ff4fb8111bc0f71e03206f5e3bdbf7c69e8d/lib/render/html.js#L111-L115

kivikakk commented 1 year ago

Yep. :/ This is to spec — see the definition of "loose" and "tight".

Also, the original example at the top of this issue is off in more than one way; that's an ordered list (1. Hello …), but the output HTML has <ul>.

joeldrapper commented 1 year ago

Also, the original example at the top of this issue is off in more than one way; that's an ordered list (1. Hello …), but the output HTML has <ul>.

That was my typo, sorry!

I find it strange that the AST produces a paragraph node when the list is tight, but I can handle this in my renderer.

kivikakk commented 1 year ago

Yepyep — I think it's to keep the CommonMark AST regular; rather than list items containing blocks or inlines, they only contain blocks, and then under some circumstances renderers can omit. HTML is the weirdo here, really.

(I ran into a similar issue lately implementing Tiptap in a work project, where you can have your document schema have list items contain inlines or blocks, but not easily both. So I ended up having them always be blocks, and then adjusted the margin on li > p. It's a strange and interesting little corner.)

joeldrapper commented 1 year ago

Thanks everyone. @gjtorikian I’m going to have to use Markly for now, but I’ll keep an eye on developments here. comrak looks great, but I need the AST. html-pipeline looks really interesting too! I’ll be sure to check that out.

ioquatix commented 1 year ago

Can you submit a bug report to https://github.com/ioquatix/markly :) We will fix it.

joeldrapper commented 1 year ago

@ioquatix Turns out it was by design and there is a way to handle it based on the loose predicate.

bradgessler commented 1 year ago

FWIW I'm working on a project where I'd need the AST. I too am going to try using @ioquatix's Markly library.

gjtorikian commented 5 months ago

👋 My how the turn tables...

I also needed to be able to manipulate an AST, so I reintroduced this feature: https://github.com/gjtorikian/commonmarker/pull/276

Let me know if there are any operations you want to perform on a tree that isn't in that PR and I'll try to add it before releasing it. Otherwise I can do it whenever my open source schedule allows.

joeldrapper commented 5 months ago

Oh that's great to hear @gjtorikian! I’ll take a look. The most useful thing for me would be to have a double-dispatch visitor pattern like in Prism/SyntaxTree. That’s definitely something that could be implemented on the Ruby side and I’d be happy to work on that in a follow-up PR.