Base->clean() doesn't mitigate XSS/code injection attacks

Rayne commented 9 years ago

Base->clean() doesn't mitigate XSS/code injection attacks as it doesn't remove malicious tag attributes.

$f3->route('GET /test/clean', function(Base $f3) {
    echo $f3->clean('<p><a href="http://127.0.0.1" onclick="alert(\'SURPRISE\');">Link</a></p>', 'a');
});

Generates:

<a href="http://127.0.0.1" onclick="alert('SURPRISE');">Link</a>

For the sake of completeness: Base->scrub() is also affected.

KOTRET commented 9 years ago

avoiding xss is more complex than removing characters. for output always use the esc-function in templates.

There is another bug in this method: if you pass * as 2nd arg, the tags should be removed. They are not, because the method only strips tags if the 2nd arg is not *

Rayne commented 9 years ago

If the * feature would be implemented as mentioned by @KOTRET, it would be superfluous and redundant. The documentation (also on the website) states, that all tags will be removed except the enumerated ones.

It could be possible that the * feature was originally designed to allow all tags and remove only non-printable characters.

ikkez commented 9 years ago

Hi. So your request is to add cleaning for onload and all other HTML events to the clean method?

Rayne commented 9 years ago

No, that would be a little ridiculous. I have different ideas and I think the second one is the best. Please share your ideas, too.

Remove all tag attributes. I expected this behaviour while reading the documentation of Base->clean(). Neat idea but not really documented. Anyway, one should use a XML DOM parser for fine-grained control of allowed tags and attributes instead of a method which strips some tags and removes non-printable characters.
Change the method's documentation from

Remove HTML tags (except those enumerated) and non-printable characters to mitigate XSS/code injection attacks

to

Remove HTML tags (except those enumerated) and non-printable characters and keep the attributes of enumerated tags. This method doesn't mitigate XSS/code injection attacks.

For reference what the PHP documentation says about strip_tags():

Warning This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.

We also shouldn't forget to mention (or remove) the * feature.

F3 v4 could have three methods for cleaning values: one for removing non-enumerated tags and tag attributes, one for removing non-printable characters and one combining both (similar to Base->clean()).

Jiab77 commented 5 years ago

Hi all, first I'd like to thanks the devs for their great work, really impressed! Regarding this method, it should also encode characters to mitigate code passed using encoded strings...

This is what I'm using actually:

function sanitize($string, $html = false, $extra = false) {
    if (isset($string) && !empty($string)) {
        // Main clean
        if ($html === true) {
            $string = htmlentities(strip_tags(trim($string)), ENT_QUOTES, 'UTF-8');
        }
        else {
            $string = htmlspecialchars(strip_tags(trim($string)), ENT_QUOTES, 'UTF-8');
        }

        // Extra clean
        if ($extra === true) {
            // Potentialy dangerous items
            $danger = ["'", '"', '`', '../', '..\\', './', '.\\', 'javascript', ':', ';'];
            // Safe replacement
            $replace = '';
            // Cleaning
            $string = str_replace($danger, $replace, $string);
        }

        return $string;
    }
    else {
        return false;
    }
}

You can decide to encode the given string as html entities and / or remove other extra dangerous characters.

n0nag0n commented 4 years ago

I think this really only applies to fields where you need to keep html for some reason. If you need to keep html, then the clean() method is going to be very underwhelming. There is the HTMLPurifier library available if you need to have a lot of control over what html is used or not used. I've used it and highly recommend it although it is a big cumbersome.

A user (@Dabcorp) in the slack channel created their own xss class for cleaning data. See Repo Here. Still though, if you need fine grained control, HTML Purifier is the way to go.

bcosca / fatfree

Base->clean() doesn't mitigate XSS/code injection attacks #850