erusev / parsedown

Better Markdown Parser in PHP
https://parsedown.org
MIT License
14.74k stars, 1.12k forks

BUG: InlineURL in <a> tag creates double link #630

Closed doiftrue closed 6 years ago

doiftrue commented 6 years ago

What is the correct way to fix this bug? When an <a> tag uses a URL as its anchor text, the parser turns that URL into a second, nested link.

To reproduce, paste the following test text into the demo at http://parsedown.org/demo:

some text <a href="http://domen.kz/seo/">http://domen.kz/seo/</a>.

doiftrue commented 6 years ago

I found this solution: add the negative lookahead (?!<\/a>) to the regular expression in the inlineUrl() method.

Before: /\bhttps?+:[\/]{2}[^\s<]+\b\/*+/ui
After:  /\bhttps?+:[\/]{2}[^\s<]+\b\/*+(?!<\/a>)/ui
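The effect of the lookahead can be checked in isolation with preg_match(). The sketch below is a standalone check, not Parsedown's actual code path: the helper firstMatch() and the variable names are hypothetical, and it assumes the opening <a ...> tag has already been consumed as raw HTML, so only the remainder of the sample is scanned.

```php
<?php
// The two patterns quoted above: the original and the variant with the lookahead.
$original = '/\bhttps?+:[\/]{2}[^\s<]+\b\/*+/ui';
$proposed = '/\bhttps?+:[\/]{2}[^\s<]+\b\/*+(?!<\/a>)/ui';

// Assumption: the opening tag was already handled as raw HTML,
// so this is the text that is left to scan for a URL.
$remainder = 'http://domen.kz/seo/</a>.';

// Hypothetical helper: return the first match, or null when nothing matches.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$originalMatch = firstMatch($original, $remainder);
$proposedMatch = firstMatch($proposed, $remainder);

echo $originalMatch, PHP_EOL; // http://domen.kz/seo/
echo $proposedMatch, PHP_EOL; // http://domen.kz/
```

In this isolated check the lookahead does not suppress the match. The regex engine backtracks to the previous word boundary and still matches the shorter URL http://domen.kz/, so a pattern-only fix along these lines would likely need more than the lookahead.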

aidantwoods commented 6 years ago

I believe GitHub produces a similar result. For example, check the HTML output of the following:

http://domen.kz/seo/

This occurs because "inline HTML" (raw HTML in the spec) is defined as the tag itself, not its contents. The input is therefore seen as two separate pieces of raw HTML with an autolink between them; the parser has no notion that the autolink sits inside an <a> element.

That is to say, the spec says to parse the input

<a href="http://domen.kz/seo/">http://domen.kz/seo/</a>

as the following:

[raw HTML][autolink][raw HTML]

While it's perhaps not ideal that the autolink is recognised in this case, the same behaviour is certainly useful in other cases, e.g.:

<a href="http://domen.kz/seo/">*hooray!*</a>

which parses like:

[raw HTML][emphasised text][raw HTML]
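The two segmentations above can be reproduced with a rough sketch. This is a deliberately simplified, hypothetical tokenizer and not Parsedown's or the spec's algorithm: the function segment() and its three patterns are assumptions made for illustration, and plain text between tokens is simply skipped.

```php
<?php
// Hypothetical, simplified tokenizer: labels raw HTML tags, autolinks and
// *emphasis* spans in the order they occur. Not Parsedown's implementation.
function segment(string $text): string
{
    $pattern = '/(?<html><\/?[a-z][^>]*>)'          // an opening or closing tag only
             . '|(?<autolink>\bhttps?:\/\/[^\s<]+)' // a bare URL, up to whitespace or "<"
             . '|(?<emphasis>\*[^*\s][^*]*\*)/i';   // *text* between single asterisks

    $names = ['html' => 'raw HTML', 'autolink' => 'autolink', 'emphasis' => 'emphasised text'];

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $labels = [];
    foreach ($matches as $match) {
        foreach ($names as $group => $label) {
            // Groups that did not take part in the match are empty or absent.
            if (($match[$group] ?? '') !== '') {
                $labels[] = $label;
                break;
            }
        }
    }

    return $labels === [] ? '' : '[' . implode('][', $labels) . ']';
}

$urlCase      = segment('<a href="http://domen.kz/seo/">http://domen.kz/seo/</a>');
$emphasisCase = segment('<a href="http://domen.kz/seo/">*hooray!*</a>');

echo $urlCase, PHP_EOL;      // [raw HTML][autolink][raw HTML]
echo $emphasisCase, PHP_EOL; // [raw HTML][emphasised text][raw HTML]
```

Because each tag is matched on its own, whatever sits between the opening and closing tag is handled like any other inline content, which is why both the autolink and the emphasis are recognised.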

aidantwoods commented 6 years ago

Closing, since I believe the current behaviour is in line with the spec: https://github.github.com/gfm/#autolinks-extension-