bcit-ci / CodeIgniter

Open Source PHP Framework (originally from EllisLab)
https://codeigniter.com/
MIT License
18.27k stars 7.61k forks source link

[security] xss_clean() doesn't #1705

Closed sourcejedi closed 10 years ago

sourcejedi commented 12 years ago
<a href="j&#x26;#x26#x41;vascript:alert%252831337%2529">Hello</a>

"xss_clean() is a fantastic security measure, when used correctly... consider a commenting system or forum that allows "safe" HTML tags. html_escape() and htmlspecialchars() are inappropriate, whereas xss_clean() does the job." (See issue #470). The input above shows that this recommended use case can be broken. xss_clean() is supposed to prevent javascript links, and scans for functions like alert() as a fallback, but this input defeats it. So the current implementation of xss_clean() should be considered buggy.

For a practical example of this use case, see the ExpressionEngine forum used for the CodeIgniter community forum. It relies solely on xss_clean() to escape the subject line of private messages.


xss_clean() effectively needs to parse html, i.e. match the browser's interpretation, so it can identify javascript. HTML cannot be parsed with regular expressions, because it is not a regular language. Any version of xss_clean() that tries to parse it using regular expressions will have holes. Ask someone who studied computer science.

4000 active users on StackOverflow confirm that regular expressions should not be used to parse html. (This answer has been locked, which explains why the number of votes is not higher). http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Practical experience confirms that all regex-based XSS "cleaners" have holes. http://blog.astrumfutura.com/2011/03/regex-html-sanitisation-off-with-its-head/. CodeIgniter should be aware of this after fixing multiple vulnerabilities last year. http://blog.astrumfutura.com/2011/05/codeigniter-2-0-2-cross-site-scripting-xss-fixes-and-recommendations/.

"Cleaning" a "blacklist" of "evil" markup is always vulnerable to new language extensions. xss_clean() is ignorant about long-standing extensions such as inline SVG. See section 3.3 of http://www.ei.rub.de/media/hgi/veroeffentlichungen/2011/10/19/svgSecurity-ccs11.pdf.

Even if xss_clean() was as effective as htmlpurify.org, the CodeIgniter API would still be wrong to support it as an input filter. In many cases, data should be escaped at the point of use. This is because the type of escaping desired will vary for different contexts. (See below for a concrete example). Escaping at the point of use makes this connection explicit. The "global" xss_clean option doesn't make sense, just as PHP's gpc_magic_quotes didn't make sense. Providing xss_clean() as an option in form validation is not explicit enough, considering that the number of cases where it would be appropriate are actually relatively low. I.e. I stipulate that it might make sense for the content of posts on an html forum - but it would not make sense for usernames. The forum should be able to control what happens when you click on the username attached to a post; it would usually be considered undesirable for the user to be able to override it by adding an anchor element of their choice.


The most distressing problem is that the CodeIgniter documentation doesn't help developers use xss_clean "correctly" even as defined by the CodeIgniter developers. There is no discussion of what it actually does, i.e. what context its output is safe to use in.

ci-Bonfire used xss_clean() output inside the "value" attribute of input elements. IOW, they relied on xss_clean() to prevent XSS in a standard form. But the output of xss_clean() is not safe to use inside attribute values, because it's allowed to contain quote characters.

alexbilbie commented 12 years ago

I'm just putting this here for future reference but our unit tests should cover all of these test cases http://ha.ckers.org/xss.html

sourcejedi commented 12 years ago

I note your caveat. Just in case a casual reader (not you) misses it, I repeat my disagreement. This is insanity. The priority for xss_clean() should be to deprecate it, provide correct documentation, and remove it. But if you decide to go ahead...


I agree the XSS cheat sheet coulld be used to test for regressions, seeing as xss_clean() was originally written from it.

You'd also want to make sure that any rewrite didn't regress w.r.t the examples in the 2012 CVE. http://seclists.org/fulldisclosure/2012/Jul/311.

To cover the input I gave in this bug, you would want an additional test case. This input starts off "safe", but after xss_clean() alters it it becomes "unsafe". The XSS cheat sheet doesn't cover anything like this. I think the cheatsheet is only concerned with cases where the input is "unsafe" to start off with, but the "cleaner" doesn't notice.

You should also be clear that this would only bring you up to 2010 or earlier. The XSS cheat sheet is ancient (refers to "FF2.0") and no longer updated. http://security.stackexchange.com/questions/164/new-xss-cheatsheet.

They suggest http://heideri.ch/jso/, "HTML5 Security Cheatsheet". (Personally I found it a bit cryptic and un-prioritised for this purpose. E.g. the advice for the second item to block "autofocus".)

They also recommend https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet. This is a good citation for my advice not to use something like xss_clean() for input escaping.

timw4mail commented 12 years ago

I think we need to first consider the various contexts for each XSS prevention method that we should use.

At the vary least, we have to worry about:

  1. Tags
  2. Attributes

From that standpoint, we need methods to filter XSS for each of these contexts.

Then we have to look at the sub-contexts, such as scripting in attributes.

Just saying that the current method does not work is not very productive.

sourcejedi commented 12 years ago

Ah... I see I didn't understand the Bonfire form bug. I should take that part back. The vulnerable code was

value="<?php echo isset($user) ? $user->display_name : set_value('display_name') ?>"

Hopefully this was a logic error (i.e. the precedence of $user->display_name over $_POST['display_name'] was wrong, considering the case where form validation fails), and the vulnerability could be removed by fixing the logic

value="<?php echo set_value('display_name', isset($user) ? $user->display_name : '') ?>"

That said, there were a couple of places outside form values where attributes were not escaped correctly. I still feel this shows a flaw in the CodeIgniter documentation and API, that encourages widespread use of xss_clean(), when it is only intended for specific cases.


@timw4mail: yes, this was something of a flame! At least I put the proof-of-exploit code first, along with the explanation that you should not expect to be able to fix the current code as long as it tries to parse using regexes. I'm sure you'd rather know about the vulnerability than not.

If you're asking why I drag in all the other issues, that's a fair question. If people rewrite xss_clean() to block my input and write up all the regression tests etc, but later decide that a blacklist-based filtering API is a bad idea, that could be a lot of wasted effort. I am optimistic that, in coming to the third round of penetrate-and-patch, people will eventually realize that a blacklist-based filtering API is a bad idea. (And/or that the CodeIgniter community has shown that it does not have the resources to build and maintain a HTML parser, while keeping it accurate and efficient).

If you're asking for a constructive proposal, I think the implication is obvious. (And pretty much the same as the two more formal pen-testers said). Most data should be escaped at the point of output, like set_value() / form_prep(). For the specific cases where xss_clean was actually intended to be used, recommend replacing it with htmlpurifier.org. Deprecate all the xss_clean APIs, explaining why and the alternatives above. As soon as possible, remove them altogether.

alexbilbie commented 12 years ago

Just of note, it looks like the Yii framework use htmlpurifier as their XSS cleanser

sourcejedi commented 11 years ago

brian978 pointed out the proof-of-concept I posted here didn't work.

I knew I managed to bypass it on the EE forums, so I re-tested with 2.1.3 and re-read xss_clean() in detail. The proof-of-concept on the first line should work now.

sourcejedi commented 11 years ago

PR #2049 by brian978 would now patch three different bypasses, including the first one I found and posted on this issue.

  1. <a href="j&#x26;#x26#x41;vascript:alert%252831337%2529">Hello</a>
  2. <a <!-- href="j&#x61;vascript:&#x61;lert&#x28;31337&#x29;;">Hello</a>
  3. <img src="http://www.w3schools.com/tags/planets.gif" width="145" height="126" alt="Planets" usemap="#planetmap">
    <map name="planetmap">
       <area shape="rect" coords="0,0,145,126" a-=">" href="j&#x61;vascript:&#x61;lert(-1)">
    </map>

For entertainment purposes: The next issue is a bit different. Maybe reaching a bit, so we'd better get on to 5 as well.

  1. `
narfbg commented 10 years ago

@sourcejedi I just added <button> to that list, as you suggested: 4356806dc0298363217694d727db9cad84a073e0

Could you elaborate and/or propose a fix for the 'form' attribute and that 'data:' thing?

sourcejedi commented 10 years ago

Banning <button> addresses the concern I mentioned about <button type="submit" form="target">.

I'm not as interested in enumerating issues now. But since you mention it, the form attribute could still be used to add values to a form on the same page, when used with other submittable elements. Ooh, or you might disable one of the forms, by adding a required element to it, with <select form=X required

Surely form should be on the list of banned attributes?

For data:, actually I don't know that it's exploitable. Even if base64 was permitted. So I can't think of an example.

narfbg commented 10 years ago

Well, the thing is ... there is no list of banned attributes from what I see. Shouldn't banning the submittable elements be sufficient? Most of them are banned anyway, I'd just have to check and add any missing ones.

On 'data:' - ok, I can consider this a non-issue then?

I'm sorry to involve you in this again if you're not interested, but I'm obviously no XSS expert and you seemed to be enthusiastic about this.

narfbg commented 10 years ago

Banning 'form' & 'xlink:href' attributes: 25ca23533e3efe59754145c91037fae171fb4862

narfbg commented 10 years ago

Banning the rest of the submittable elements: c715b22eb153aa702b07a158357ee2b13a24cf67

Please do leave a comment if you'd want to explain what you meant earlier with data:.

sourcejedi commented 10 years ago

@narfbg you're no bother! I meant, I still expect there's other stuff, but I don't expect I'll find it (all).

Yes, I think data: is a non-issue. Don't know why it's filtered in the first place.

I think your blocking the form attribute was a good idea, in case there are more submittable elements in the future.

sourcejedi commented 10 years ago

...having disclaimed that, I had a look at the ban on xmlns and found this

<a html:onclick="alert(1)" href=google.com>Innocent link</a>

which is being "cleaned" as

<a onclick="alert&#40;1&#41;" href="http://google.com">Innocent link</a>

and runs very nicely, thank you.

narfbg commented 10 years ago

Even after adf3bde5f8a196013acc615e5bfeedd0ef6417b8 ?

sourcejedi commented 10 years ago

...having read the other bug, I concede I was wrong, data: uris do need to be filtered out. Hopefully they found the easiest remaining bypasses there.

sourcejedi commented 10 years ago

Ah! adf3bde fixes that, yes.

I don't remember the effect working quite so aggressively, but I'm probably wrong.

<div sons=""> => <div A>

i.e. attributes with "on" in the middle get mangled, not just those with "on" at the start. Does that block anything we want to let through?

sourcejedi commented 10 years ago

Ok, grepping the table of HTML5 attributes, we're ok.

action content contenteditable contextmenu controls dropzone formaction icon readonly

narfbg commented 10 years ago

That should do it: b69103e8ab0c646d01f5e97ef6a255293de1e60e

narfbg commented 10 years ago

... or rather dbd999f33374f6541f167e3d77a3e80a991b301a