erusev / parsedown

Better Markdown Parser in PHP
https://parsedown.org
MIT License
14.77k stars 1.12k forks source link

URLs with underscores break emphasis when also emphasis also uses underscores #588

Closed jacroe closed 6 years ago

jacroe commented 6 years ago

The following markdown does not get rendered correctly: _[Markup language](https://en.wikipedia.org/wiki/Markup_language)_

Parsedown renders it as <p>_<a href="https://en.wikipedia.org/wiki/Markup_language">Markup language</a>_</p> instead of <p><em><a href="https://en.wikipedia.org/wiki/Markup_language">Markup language</a></em></p>.

The text gets rendered correctly when using asterisks as emphasis indicators instead.

aidantwoods commented 6 years ago

I think this is essentially a duplicate of #107. Also see #461.

Unfortunately this type of issue is a little difficult to solve properly without major changes to the way Parsedown currently works.

More detail This is because Parsedown currently parses the line left to right without stopping. In order to solve this it would be necessary to look ahead in the line for other inlines and determine which of those found is most important. Depending on how this is done it might be necessary to attempt to extend finished elements over ones that cover previously identified valid end markers (as is the case here).


i.e. on its own _[Markup language](https://en.wikipedia.org/wiki/Markup_ is a valid emphasis, and should be identified as one. It is only when the link is also identified that the emphasis should end later.

Parsedown isn't currently capable of distinguishing these two cases.

I've previously written algorithms for both resolving precedences of intersecting inlines and also for extending inlines over other ones (essentially in this case it would make the emphasis blind to the end marker inside the link so that it would end at the next available opportunity).

The problem (or perhaps caveat associated) with moving these types of approaches into Parsedown is that one of Parsedown's important features is its speed. These approaches would require pursuing all possible inlines in the line, as well as spending additional time deciding between all the inlines (which involves further parsing when inlines are extended).

More detail As you can see, the resolving algorithm linked above it's not really light on time complexity either – it is a while loop over n inlines, during each cycle of the loop it recurses the entire algorithm over n-1 inlines. At the end of each cycle of the loop it will attempt to extend an inline prior to discarding it by doing more parsing, meaning the the next cycle of the loop would contain n elements also.


The point here I guess is that doing things properly in the sense of being able to match spec in all cases might require including things that would be detrimental to the performance Parsedown currently has. I think @erusev is perhaps the one to make the decision as to whether we should pursue passing CommonMark where performance is likely to be majorly impacted.

PhrozenByte commented 6 years ago

@aidantwoods: I agree and I also agree that this is basically unfixable without significant negative side-effects. Thus both this issue and #107 should be labelled as won't fix and closed.

aidantwoods commented 6 years ago

Thus both this issue and #107 should be labelled as won't fix and closed.

I think I'd agree with closing, since this won't be actively worked on. I'll leave this in the Future milestone because in principle I we'd like to solve the issue – however it's not really something we can do at the moment without the mentioned significant performance degradation. If things change then we might be able to look into this though.

I hope that all makes sense :)