fletcher / peg-multimarkdown

An implementation of MultiMarkdown in C, using a PEG grammar - a fork of jgm's peg-markdown. No longer under active development - see MMD 5.

Improperly includes closing tag when parsing text inside HTML tags #113

Closed onecrayon closed 11 years ago

onecrayon commented 12 years ago

To see this improper behavior in action, run the following in the Terminal (assuming that you have switched to a directory containing the multimarkdown binary):

echo "<div markdown=\"1\">
# Example

Paragraph wraps closing div tag:
</div>" | ./multimarkdown

The output I expect is this:

<div>
<h1 id="example">Example</h1>

<p>Paragraph wraps closing div tag:</p>
</div>

But the output that I get is this:

<div>

<h1 id="example">Example</h1>

<p>Paragraph wraps closing div tag:
</div></p>

The same problem occurs when a list abuts the closing tag, except that the closing tag gets wrapped by both the last LI and the UL. Adding an extra line break between the paragraph and the closing tag closes the paragraph/list properly, but then the closing tag is wrapped in its own paragraph:

<p></div></p>

Ideally, MultiMarkdown should treat everything between the opening and closing tag of a tag with markdown="1" as a standalone document snippet and completely ignore the tags themselves.
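
The behavior requested above can be sketched roughly in Python. This is only an illustration of the idea, not how peg-multimarkdown works internally: `render_markdown_block` and the `render` callback are hypothetical names, and the regular expression assumes a single, non-nested element.

```python
import re

# One non-nested element: opening tag, body, matching closing tag.
BLOCK = re.compile(
    r'<(?P<tag>\w+)(?P<attrs>[^>]*)>(?P<body>.*?)</(?P=tag)>',
    re.DOTALL,
)
# The markdown="1" (or markdown=1) attribute inside the opening tag.
MARKDOWN_ATTR = re.compile(r'\s+markdown=(?:"1"|1)(?=\s|$)')


def render_markdown_block(text, render):
    """Render only the inner text of markdown="1" elements.

    `render` is a hypothetical callback that converts a Markdown
    snippet to HTML; the tags themselves are never passed to it.
    """
    def replace(match):
        attrs = match.group("attrs")
        if not MARKDOWN_ATTR.search(attrs):
            # Plain HTML element: leave it untouched.
            return match.group(0)
        attrs = MARKDOWN_ATTR.sub("", attrs)
        inner = render(match.group("body").strip())
        tag = match.group("tag")
        return "<%s%s>\n%s\n</%s>" % (tag, attrs, inner, tag)

    return BLOCK.sub(replace, text)
```

Because the closing tag is matched before any Markdown rendering happens, it cannot end up inside the last paragraph or list item.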

fletcher commented 12 years ago

I'll have to dig into this when I get some time - supporting html within the MultiMarkdown PEG has turned out to be a bit more complex than I would have thought.

halostatue commented 12 years ago

I think that this is related to an issue that I'm seeing:

echo '<p markdown=1 class="foo">Foo</p>' | multimarkdown

I would expect to get

<p class="foo">Foo</p>

But instead I get:

<p><p class="foo">Foo</p></p>

Version:

multimarkdown -v
peg-multimarkdown version 3.5
portions Copyright (c) 2010-2012 Fletcher T. Penney.
portions Copyright (c) 2011 Daniel Jalkut, MIT licensed.
original Copyright (c) 2008-2009 John MacFarlane.  License GPLv2+ or MIT.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[edited to fix an error in my result text]

c00kiemon5ter commented 12 years ago

Yep, I also get that. Example input:

<div id="header" markdown=1>

some header
==========

<div id="list" markdown=1>

* a
* small
* list
<!-- with a comment -->

</div>
</div>

and the result is:

<div id="header" >

<h1 id="someheader">some header</h1>
<div id="list" >

<ul>
<li>a</li>
<li>small</li>
<li>list
<!-- with a comment --></li>
</ul>

<p></div>
</div></p>

Notice that the comment in the list was included in the last <li> element, and the </div></div> was wrapped in <p> </p>.

Also, if I don't leave an empty line between the raw HTML <div id=... and the Markdown list, the list is treated as plain text.

fletcher commented 11 years ago

Looks like this is fixed in MMD-4.

victorliu commented 9 years ago

I'm encountering this same problem in MMD 4.6.

fletcher commented 9 years ago

Can you send an example?

victorliu commented 9 years ago

Input:

<div id="content" markdown=1>
text
</div> <!-- content -->

Output:

<div id="content" >
<p>text</p>
<p></div> <!-- content --></p>

fletcher commented 9 years ago

I think you're using an old version. That's not the output I get.

victorliu commented 9 years ago

Command: cat foo.mmd

<div id="content" markdown=1>

text

</div> <!-- content -->

Command: multimarkdown foo.mmd

<div id="content" >

<p>text</p>

<p></div> <!-- content --></p>

Command: multimarkdown -v

MultiMarkdown version 4.6
Copyright (c) 2013-2014 Fletcher T. Penney.

LyX export code (c) 2013-2014 Charles R. Cowan,
licensed under both GPL and MIT licenses.

portions based on peg-markdown - Copyright (c) 2008-2009 John MacFarlane.
peg-markdown is Licensed under either the GPLv2+ or MIT.
portions Copyright (c) 2011 Daniel Jalkut, MIT licensed.

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

fletcher commented 9 years ago

What OS are you using?

victorliu commented 9 years ago

Mac OS X 10.10.1

fletcher commented 9 years ago

I rebuilt from source from 4.6 tag. Works fine.

Deleted my old version, used installer from website for 4.6. Works fine.

I suspect you have more than one version installed, and different commands are picking up different versions. I can't make MMD 4.6 give that output for that input.

You can send me your file in case you've done something unusual to the file.

victorliu commented 9 years ago

I installed it using the installer; it should be the only version. The problem goes away if I move the HTML comment to a separate line.

fletcher commented 9 years ago

Like I said, I can test on your file if you send it. But the text as pasted above works just fine for me using 4.6.

fletcher commented 9 years ago

I experimented some more. If I alter the text you sent so that there are blank lines around the text inside the div, I can get output similar to what you describe. But again, what you sent above works.

The following is a different situation:

<div id="content" markdown=1>

*text*

</div> <!-- content -->

The first example is basically a single "paragraph" and the HTML is interpreted as a series of inline elements. This can occur within a paragraph, and is relatively easy.

The second is a series of "paragraphs" and MMD has to decide what belongs as HTML inside the tags and what doesn't. The way it does this is to look for matching open and ending tags. The opening tag has to be at the beginning of a line, and the closing tag has to be at the end of the line.

If you are going to use HTML that spans multiple "paragraphs", it needs to be somewhat organized. It's pretty easy -- just put the opening and closing tags on lines by themselves. Otherwise I would need a more complex (and probably slower) HTML parser to handle it, which is very low on my list of priorities at the moment.
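
For files that already have trailing content after a closing tag, the advice above could be applied with a small preprocessing step before running multimarkdown. The following Python sketch is one possible approach, not part of MMD: `isolate_closing_tags` is a hypothetical helper, and it only handles lines that begin with a closing tag followed by whitespace and more text.

```python
import re

# A line that starts with a closing tag and has more content after it,
# for example: </div> <!-- content -->
TRAILING = re.compile(r'^(\s*</\w+>)[ \t]+(\S.*)$')


def isolate_closing_tags(text):
    """Put each leading closing tag on a line by itself.

    Whatever followed the tag on the same line is moved to the next
    line. A trailing newline in `text` is not preserved.
    """
    out = []
    for line in text.splitlines():
        match = TRAILING.match(line)
        if match:
            out.append(match.group(1))  # the closing tag alone
            out.append(match.group(2))  # the rest of the line
        else:
            out.append(line)
    return "\n".join(out)
```

Applied to the input from this thread, the last line `</div> <!-- content -->` becomes two lines, which matches the workaround reported earlier of moving the HTML comment to a separate line.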

victorliu commented 9 years ago

Ah, I see. I was careless when I removed the blank lines in the first example. That solves it then. Thanks!