evilstreak / markdown-js

A Markdown parser for javascript

Simple strong/em case fails to be parsed (***foo bar***) #63

Open TheCloudlessSky opened 12 years ago

TheCloudlessSky commented 12 years ago

The sample case of ***foo bar*** fails to parse correctly.

It should be parsed as ["em", ["strong", "foo bar"]] but is instead parsed as [["strong", "*foo bar"], "*"].

However if spaces are added (* **foo** *), it produces the correct HTML.

lorddev commented 12 years ago

I think if you want em + strong you need to use underscores. From what I understand, 3 asterisks is supposed to be used in order to produce actual asterisks, e.g. _foo bar_

TheCloudlessSky commented 12 years ago

@lorddev Having ***foo bar*** produce <em><strong>foo bar</strong></em> is consistent with at least StackOverflow's and GitHub's markdown. Producing asterisks around bolded text would be done with escaped asterisks, **\*foo bar\***, which works fine and seems most intuitive.

lorddev commented 12 years ago

Ok. I must have been thinking of Google+, which implements only the asterisks and underscores subset of markdown.

ashb commented 12 years ago

This is quite probably a parsing bug as most other parsers treat it as <strong><em>foo bar</em></strong> as you can see here: http://babelmark.bobtfish.net/?markdown=***foo+bar***

(I know I made some deliberate decisions about what to parse and what to just ignore, but I don't think this was one of them.)

TheCloudlessSky commented 12 years ago

@ashb That's what I think too. I apologize for not submitting a pull request; I'm still learning the code base and how the parsing works. I did, however, add the test case in the inline_strong_em of regressions.t.js and it failed.

ashb commented 12 years ago

The parsing of strong and em is a little bit ... interesting (along with most of the rest of the parsing) and has some fruity backtracking-like stuff in it. The strong_em helper function is what deals with parsing of **: https://github.com/evilstreak/markdown-js/blob/50f6d695140b446721bcabab69397f2ffb6c92eb/lib/markdown.js#L977

ashb commented 12 years ago

Hmmm, I've taken a look, and I suspect the way the strong/em state is currently split out is what's causing the problem.

The problem is that it doesn't keep track of the order in which the strong and em were opened, so it closes the wrong one. It closes the strong because that comes first in the regex pattern, and it doesn't know to check whether it should close an em instead.

I suspect we'll have to rewrite that parser helper func to use a single state variable instead of two split ones.
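
To make the single-state idea concrete, here is a rough standalone sketch. It is not the markdown-js code and not a proposed patch: parseEmphasis and its helpers are hypothetical names, it only handles asterisks, and it assumes a run of three asterisks opens the em first so that the em ends up outermost (other parsers nest it the other way round). The point is only that one stack remembers which of strong/em was opened last, so the closer always matches the most recent opener.

```javascript
// Sketch only: parseEmphasis and its helpers are hypothetical names,
// not functions from markdown-js.
function parseEmphasis(text) {
  var root = [];                                 // top-level children
  var stack = [{ marker: "", children: root }];  // the single piece of state

  function top() { return stack[stack.length - 1]; }

  // Append text to the innermost open span, merging adjacent strings.
  function pushText(str) {
    var kids = top().children;
    if (str === "") return;
    if (typeof kids[kids.length - 1] === "string") kids[kids.length - 1] += str;
    else kids.push(str);
  }

  function open(marker) {
    stack.push({ marker: marker, children: [] });
  }

  // Close the most recently opened span and attach it to its parent.
  function close() {
    var span = stack.pop();
    var tag = span.marker === "**" ? "strong" : "em";
    top().children.push([tag].concat(span.children));
  }

  var i = 0;
  while (i < text.length) {
    var ch = text[i];
    // An escaped asterisk is literal text.
    if (ch === "\\" && text[i + 1] === "*") { pushText("*"); i += 2; continue; }
    if (ch !== "*") { pushText(ch); i += 1; continue; }

    // Measure the run of asterisks and look at its neighbours.
    var start = i;
    while (text[i] === "*") i += 1;
    var run = i - start;
    var canClose = start > 0 && !/\s/.test(text[start - 1]);
    var canOpen = i < text.length && !/\s/.test(text[i]);

    // Close in reverse order of opening: only ever look at the top of the stack.
    while (canClose && run > 0 && top().marker !== "" && top().marker.length <= run) {
      run -= top().marker.length;
      close();
    }

    if (run > 0 && run <= 3 && canOpen) {
      if (run % 2 === 1) open("*");  // assumption: em opens first, so it is outermost
      if (run >= 2) open("**");
    } else if (run > 0) {
      pushText(new Array(run + 1).join("*"));  // leftover asterisks stay literal
    }
  }

  // Anything still open was never closed: put its marker back as literal text.
  while (stack.length > 1) {
    var unclosed = stack.pop();
    pushText(unclosed.marker);
    unclosed.children.forEach(function (child) {
      if (typeof child === "string") pushText(child);
      else top().children.push(child);
    });
  }
  return root;
}
```

With this sketch, parseEmphasis("***foo bar***") returns [["em", ["strong", "foo bar"]]], and the escaped form **\*foo bar\*** comes out as [["strong", "*foo bar*"]]. The real helper also has to cope with underscores and the rest of the inline grammar, which this ignores.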