bhollis / maruku

A pure-Ruby Markdown-superset interpreter (Official Repo).

maruku 0.7 Chokes on the HTML 5 ruby Tag #124

Closed (ghost closed 10 years ago)

ghost commented 10 years ago

I encountered this error after recently pushing a change to my GitHub user page. GitHub may have upgraded Jekyll since early December last year and thereby moved to a newer version of maruku. I believe this is a regression, since I have made no changes to my Markdown syntax since my last push to GitHub.

Example file ruby.md (Edit: I botched the syntax with = vs. #, but the example still works):

= Markdown, with some ruby =

What follows uses ruby
<ruby>
    <rb>東</rb><rp>(</rp><rt>トウ</rt><rp>)</rp>
    <rb>京</rb><rp>(</rp><rt>キョウ</rt><rp>)</rp>
</ruby>.

Now, let us hand it to maruku.

imo> maruku --version
Maruku 0.7.0
imo> maruku ruby.md
 ___________________________________________________________________________
| Maruku tells you:
+---------------------------------------------------------------------------
| Ignoring line '= Markdown, with some ruby =' type = header1
| At line 1
|    header1 --> |= Markdown, with some ruby =|
|      empty     ||
|       text     |What follows uses ruby|
|   raw_html     |<ruby>|
|
+---------------------------------------------------------------------------
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:104:in `parse_blocks'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:22:in `parse_text_as_markdown'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_doc.rb:35:in `parse_doc'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/maruku.rb:10:in `initialize'
\___________________________________________________________________________

 ___________________________________________________________________________
| Maruku tells you:
+---------------------------------------------------------------------------
| Malformed HTML starting at "<ruby>"
| ---------------------------------------------------------------------------
| <ruby>EOF
| |---------------------------------------------------------------------------
| +--- Byte 0
| Shown bytes [0 to 6] of 6:
| ><ruby>
| 
| At line 4
|    header1     |= Markdown, with some ruby =|
|      empty     ||
|       text     |What follows uses ruby|
|   raw_html --> |<ruby>|
|       code     |    <rb>東</rb><rp>(</rp><rt>トウ</rt><rp>)</rp>|
|       code     |    <rb>京</rb><rp>(</rp><rt>キョウ</rt><rp>)</rp>|
|   raw_html     |</ruby>.|
| 
| 
| Elements read in span: 
|  -
|
+---------------------------------------------------------------------------
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:423:in `read_inline_html'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:94:in `read_span'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:14:in `parse_span'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:284:in `read_paragraph'
\___________________________________________________________________________

 ___________________________________________________________________________
| Maruku tells you:
+---------------------------------------------------------------------------
| Maruku cannot parse this block of HTML/XML:
| |<ruby>
| #<REXML::ParseException: Missing end tag for 'ruby' (got "html")
| Line: 4
| Position: 165
| Last 80 unconsumed characters:
| >
| /usr/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:335:in `pull_event'
| /usr/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:183:in `pull'
| /usr/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:22:in `parse'
| /usr/lib/ruby/1.9.1/rexml/document.rb:243:in `build'
| /usr/lib/ruby/1.9.1/rexml/document.rb:43:in `initialize'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/html.rb:156:in `new'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/html.rb:156:in `initialize'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/html.rb:43:in `new'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/html.rb:43:in `new'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/helpers.rb:70:in `md_html'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:432:in `read_inline_html'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:94:in `read_span'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:14:in `parse_span'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:284:in `read_paragraph'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:155:in `read_text_material'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:50:in `parse_blocks'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_block.rb:22:in `parse_text_as_markdown'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_doc.rb:35:in `parse_doc'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/maruku.rb:10:in `initialize'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/bin/maruku:118:in `new'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/bin/maruku:118:in `block (2 levels) in <top (required)>'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/bin/maruku:16:in `benchmark'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/bin/maruku:118:in `block in <top (required)>'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/bin/maruku:99:in `each'
| /var/lib/gems/1.9.1/gems/maruku-0.7.0/bin/maruku:99:in `<top (required)>'
| /usr/local/bin/maruku:23:in `load'
| /usr/local/bin/maruku:23:in `<main>'
| ...
| Missing end tag for 'ruby' (got "html")
| Line: 4
| Position: 165
| Last 80 unconsumed characters:
| 
| Line: 4
| Position: 165
| Last 80 unconsumed characters:
+---------------------------------------------------------------------------
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/helpers.rb:73:in `rescue in md_html'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/helpers.rb:67:in `md_html'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:432:in `read_inline_html'
!/var/lib/gems/1.9.1/gems/maruku-0.7.0/lib/maruku/input/parse_span.rb:94:in `read_span'
\___________________________________________________________________________

The same file is accepted by, for example, kramdown.

imo> kramdown --version
1.2.0
imo> kramdown ruby.md
<p>= Markdown, with some ruby =</p>

<p>What follows uses ruby
<ruby>
    <rb>東</rb><rp>(</rp><rt>トウ</rt><rp>)</rp>
    <rb>京</rb><rp>(</rp><rt>キョウ</rt><rp>)</rp>
</ruby>.</p>

bhollis commented 10 years ago

This is not related to the <ruby> tag - it's a bug where HTML that doesn't have a newline before it causes a parsing error. This is a duplicate of #123 and is slated to be fixed for 0.7.1.

bhollis commented 10 years ago

Actually, running this example myself, I see Maruku complain a lot, but the right output appears (minus the weird header):

<p>What follows uses ruby <ruby>
    <rb>東</rb><rp>(</rp><rt>トウ</rt><rp>)</rp>
    <rb>京</rb><rp>(</rp><rt>キョウ</rt><rp>)</rp>
</ruby>.</p>

I'll try to fix the warning, but nothing bad is happening to your output.
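
For reference, the REXML message in the last report can be reproduced in isolation with Ruby's REXML library alone. This is a minimal sketch, not Maruku's actual code path: the helper name `rexml_error_for` is hypothetical, and wrapping the fragment in an `<html>` root is an assumption inferred from the `(got "html")` part of the error text. It shows that an opening `<ruby>` tag handed to the XML parser without its closing tag fails, while the complete element parses.

```ruby
require "rexml/document"

# Hypothetical helper: parse a fragment wrapped in an <html> root
# (an assumption based on the error text, not taken from Maruku's source).
# Returns the parser's error message, or nil if the fragment is well-formed.
def rexml_error_for(fragment)
  REXML::Document.new("<html>#{fragment}</html>")
  nil
rescue REXML::ParseException => e
  e.message
end

# Only the opening tag, as in the "raw_html |<ruby>|" line of the report.
puts rexml_error_for("<ruby>").inspect

# The complete element is well-formed XML, so no error is reported.
puts rexml_error_for("<ruby><rb>東</rb><rt>トウ</rt></ruby>").inspect
```

If this reading is right, the noise comes from the block being split line by line before the HTML is parsed, not from the `ruby` element itself.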

ghost commented 10 years ago

@bhollis: Thank you for the quick response. You are correct, it does indeed produce the correct output (my bad, I jumped the gun there). Weirdly enough, it still prevents GitHub from building the page. I don't know whether this is a genuine crash on their side or whether Jekyll or their build script simply halts when anything is written to stderr.
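
To illustrate the difference between the two possibilities, here is a small sketch of a build step that treats the exit status, not the presence of stderr output, as the failure signal. It says nothing about how GitHub's or Jekyll's build actually works; `run_step` is a hypothetical helper, and the child command is only a stand-in for a converter that warns but still succeeds.

```ruby
require "open3"
require "rbconfig"

# Hypothetical build step: run a command and keep stdout, stderr and the
# exit status apart, so warnings on stderr are not mistaken for a failure.
def run_step(*cmd)
  stdout, stderr, status = Open3.capture3(*cmd)
  { ok: status.success?, stdout: stdout, warnings: stderr }
end

# Stand-in for a converter that prints a warning to stderr but exits with 0.
result = run_step(RbConfig.ruby, "-e", 'warn "some warning"; puts "<p>ok</p>"')

puts "succeeded: #{result[:ok]}"
puts "warnings:  #{result[:warnings]}"
```

A script that instead fails whenever `warnings` is non-empty would reject this run even though the output is fine, which would match the behaviour described above.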