erusev / parsedown

Better Markdown Parser in PHP
https://parsedown.org
MIT License
14.74k stars 1.12k forks source link

URL parameter separator encoded incorrectly #672

Closed adam1010 closed 5 years ago

adam1010 commented 5 years ago

The ampersand delimiter between the parameters is being encoded, generating invalid links. I've tested the below example with the latest release and on the demo site.

[Google](https://google.com/?search=Goo&extra=1&x=b)

HTML produced:

<a href="https://google.com/?search=Goo&amp;extra=1&amp;x=b">Google</a>

HTML that I expected (&amp; => &)

<a href="https://google.com/?search=Goo&extra=1&x=b">Google</a>

I realize there is a setMarkupEscaped(false) option, but I need the rest of the content to be escaped properly.

aidantwoods commented 5 years ago

Is this causing a particular issue? In the context of an attribute &amp; means & when the link is interpreted. If you try clicking the link do the &amp;s not get converted to &s in the resulting navigation?

(Current encoding is also consistent with the CommonMark result)

adam1010 commented 5 years ago

The extra encoding is causing the page not to detect the parameters I'm passing. In particular I've got a signed URL with a hash in it, and when you click the link the page just sees one huge parameter.

So in my example above, my server is just seeing a single parameter search that has the value Goo&amp;extra=1&amp;x=b

aidantwoods commented 5 years ago

Using Parsedown 1.7.1 I tried serving this file served at localhost:8888:

<?php
var_dump($_GET);

require 'Parsedown.php';

$p = new Parsedown;

echo $p->text('[Test](http://localhost:8888/?search=Goo&extra=1&x=b)');

which returns this HTML:

array(0) {
}
<p><a href="http://localhost:8888/?search=Goo&amp;extra=1&amp;x=b">Test</a></p>

If I click the test link then I am directed to http://localhost:8888/?search=Goo&extra=1&x=b and receive this HTML back (i.e. confirming that the &amp; was intepreted as an & in the context of the attribute):

array(3) {
  ["search"]=>
  string(3) "Goo"
  ["extra"]=>
  string(1) "1"
  ["x"]=>
  string(1) "b"
}
<p><a href="http://localhost:8888/?search=Goo&amp;extra=1&amp;x=b">Test</a></p>

Perhaps you could provide a bit more information about your setup? Are you using just Parsedown, or are there extensions too? Is there anything that might be providing additional encoding to the output?

I'm at a bit of a loss as to how you'd get the results you've said though: even if the string ?search=Goo&amp;extra=1&amp;x=b was interpreted as-is and you were directed straight to http://localhost:8888/?search=Goo&amp;extra=1&amp;x=b you'd still have more than one variable—each ampersand would denote a new GET variable, so you'd have something like:

array(3) {
  ["search"]=>
  string(3) "Goo"
  ["amp;extra"]=>
  string(1) "1"
  ["amp;x"]=>
  string(1) "b"
}
<p><a href="http://localhost:8888/?search=Goo&amp;extra=1&amp;x=b">Test</a></p>

even if it happened that the browser wasn't correctly interpreting &amp; as &, or the link was double encoded. In order to have the ampersands appear in the value of the search variable, they'd have to have been URL encoded at some stage to %26.

aidantwoods commented 5 years ago

I realize there is a setMarkupEscaped(false) option, but I need the rest of the content to be escaped properly.

Just thought I'd point out that Parsedown doesn't escape HTML by default, so this is something you'd need to turn on if you want to benefit from it. If you're accepting user-input you probably want safe mode instead of setMarkupEscaped since users could still, for example, link to javascript: URLs even with perfect escaping. Safe mode will address this though.

adam1010 commented 5 years ago

@aidantwoods thanks for running that test. I'm actually using Parsedown to generate emails so it's within the email client that they're clicking the link that's taking them to the wrong address (without converting the & back). So it sounds like maybe the email client is to blame for not converting & back into & before redirecting the user to that URL.

I'll go ahead and close this ticket for now. Thanks again!

mrx234 commented 5 years ago

This may or may not be an issue but IMHO &s inside links should not be escaped to &amp; in order to be save with any client providing clickable links to users!

Therefore I suggest to implement some kind of attribute detection inside the Parsedown::element() method to ensure that href and src attributes (and maybe others) won’t be touched.

What about this (to be inserted just before and replacing line ~ 1474):

if (preg_match('/(href|src)/i', $name) === 1) {
    // do not escape!
    $markup .= ' '.$name.'="'.$value.'"';
} else {
    $markup .= ' '.$name.'="'.self::escape($value).'"';
}
aidantwoods commented 5 years ago

&amp; is a correct way to represent an ampersand in HTML attributes though, a literal & only means ampersand if it happens not to look like part of an HTML entity. The HTML5 spec even goes so far as to prohibit the inclusion of ambiguous ampersands in an attribute value, so forgoing encoding ampersands would cause invalid HTML to be generated in some cases.

The above also conditionally removes encoding of quotes and angle brackets (in addition to entities) which would be a security regression (see #495). Not encoding ampersands also opens up all sorts of tricks like using &colon; or the numeric counterpart to represent a colon in a scheme (which would make link based XSS almost impossible to defend against given browser specific behaviour around "fixing" HTML in this area), as well as white space being permitted: e.g. jav\tascript: or j\0av\0a\tscrip\tt: is interpreted as a JavaScript scheme, you could even combine these tricks to encode the colon too. Encoding ensures that we have "what you see is what you get" behaviour with regard to the original string and how it is interpreted by the browser (since the colon is necessary to construct a scheme).

mikael-s commented 4 years ago

I don't know what web browser your are using but on my side that URL is not working neither with Firefox 66 nor with Chromium 76: https://www.google.com/search?hl=en&amp;q=ampersand.

This is an issue as this is breaking links.