gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org
Other
1.72k stars 271 forks source link

text-to-kramdown-to-HTML discrepancy for dd,li with initial blank line #486

Open ccorn opened 6 years ago

ccorn commented 6 years ago

Consider a file test.md with the following contents:

item
: 

    para

Caution: This has (and requires) a (horizontal) blank after the colon. Let's try kramdown on it:

$ kramdown test.md
<dl>
  <dt>item</dt>
  <dd>

    <p>para</p>
  </dd>
</dl>

As expected (by me, at least). But the kramdown converter output is:

$ kramdown -o kramdown test.md
item
: para

which does not mark para as (explicit) paragraph. Therefore, the corresponding HTML output now misses <p> tags:

$ kramdown -o kramdown test.md | kramdown
<dl>
  <dt>item</dt>
  <dd>para</dd>
</dl>

As a consequence, test cases like test.md trigger text-to-kramdown-to-html failures. Update: Also occurs with other list types, see below.

ccorn commented 6 years ago

The roughly equivalent input

item
: ^
  para

does not have that problem.

ccorn commented 6 years ago

Using irb, I find that test.md gets parsed to

=> <kd:root nil {:encoding=>#<Encoding:UTF-8>, :location=>1, :options=>{}, :abbrev_defs=>{}, :abbrev_attr=>{}} [<kd:dl nil {:location=>1, :ial=>nil} [<kd:dt nil {:location=>2, :raw_text=>"item"} [<kd:text "item" nil {:location=>nil}>]>, <kd:dd nil {:location=>2} [<kd:blank "\n" nil>, <kd:p nil {:location=>3} [<kd:text "para" nil {:location=>3}>]>]>]>]>

(note the <kd:blank "\n" nil>,) whereas the alternative with : ^ gets parsed to

=> <kd:root nil {:encoding=>#<Encoding:UTF-8>, :location=>1, :options=>{}, :abbrev_defs=>{}, :abbrev_attr=>{}} [<kd:dl nil {:location=>1, :ial=>nil} [<kd:dt nil {:location=>2, :raw_text=>"item"} [<kd:text "item" nil {:location=>nil}>]>, <kd:dd nil {:location=>2} [<kd:p nil {:location=>3} [<kd:text "para" nil {:location=>3}>]>]>]>]>

without <kd:blank>.

So, for beauty we might want to patch the parser to remove initial <kd:blank> from <kd:dd> elements (and <kd:li>, see below). And for robustness we might want to patch the kramdown output converter such that it handles blanks followed by paragraphs correctly in :dd (and :li) elements. One of those would suffice, but both would be better.

ccorn commented 6 years ago

Hmm, the same issue occurs with other list types:

$ kramdown <<EOF
$ * 
$ 
$     para
$ EOF
<ul>
  <li>

    <p>para</p>
  </li>
</ul>
$ kramdown -o kramdown <<EOF 
$ * 
$ 
$     para
$ EOF
* para

$ kramdown -o kramdown <<EOF | kramdown
$ * 
$ 
$     para
$ EOF
<ul>
  <li>para</li>
</ul>

Note that there is a (horizontal) blank required after the asterisk. Same with 1. instead of *.

gettalong commented 6 years ago

Hmm... I would consider this a bug in the parser. According to the syntax documentation for (un)ordered lists, the first paragraph should only be a real paragraph if its text is followed by a blank line. Similar for definition lists although there the blank line must precede the definition text.

I have a fix that I need to test a bit more.

And there is probably also a bug in the kramdown converter when outputting list items of ul/ol or definitions of dl.