Open Rayne opened 9 years ago
Avoiding XSS is more complex than removing characters. For output, always use the esc function in templates.
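To illustrate the point about escaping on output rather than stripping on input, here is a minimal sketch in plain PHP. The helper name escape_html() is hypothetical and is not part of F3; the framework's own template escaping should be preferred where available.

```php
<?php
// Hypothetical helper (not an F3 API): escape a value at output time.
function escape_html(string $value): string
{
    // ENT_QUOTES encodes both single and double quotes;
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning ''.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The markup is rendered as inert text instead of being interpreted.
$userInput = '<img src=x onerror="alert(1)">';
echo '<p>' . escape_html($userInput) . '</p>', PHP_EOL;
```

The stored value stays untouched; only the representation written into the HTML page is encoded.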
There is another bug in this method: if you pass * as the 2nd argument, the tags should be removed. They are not, because the method only strips tags if the 2nd argument is not *.
If the * feature were implemented as mentioned by @KOTRET, it would be superfluous and redundant. The documentation (also on the website) states that all tags will be removed except the enumerated ones.
It is possible that the * feature was originally designed to allow all tags and remove only non-printable characters.
Hi. So your request is to add cleaning for onload and all other HTML event attributes to the clean method?
No, that would be a little ridiculous. I have different ideas and I think the second one is the best. Please share your ideas, too.
Base->clean(). Neat idea but not really documented. Anyway, one should use an XML DOM parser for fine-grained control of allowed tags and attributes instead of a method which strips some tags and removes non-printable characters.
Change the method's documentation from
Remove HTML tags (except those enumerated) and non-printable characters to mitigate XSS/code injection attacks
to
Remove HTML tags (except those enumerated) and non-printable characters and keep the attributes of enumerated tags. This method doesn't mitigate XSS/code injection attacks.
For reference, this is what the PHP documentation says about strip_tags():
Warning This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
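The warning quoted above can be reproduced with a short script. This is only a sketch using PHP's built-in strip_tags(); the input string is made up for illustration.

```php
<?php
// Made-up input: an allowed <a> tag carrying an event handler,
// followed by a <script> element that is not allowed.
$input = '<a href="/profile" onmouseover="alert(1)">profile</a>'
       . '<script>steal()</script>';

// Allow only <a>. The <script> tags are removed, but the attributes
// of the allowed <a> tag are left exactly as they were submitted.
$result = strip_tags($input, '<a>');

echo $result, PHP_EOL;
```

The onmouseover attribute survives on the allowed tag, which is why stripping tags alone does not mitigate XSS.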
We also shouldn't forget to mention (or remove) the * feature.
F3 v4 could have three methods for cleaning values: one for removing non-enumerated tags and tag attributes, one for removing non-printable characters, and one combining both (similar to Base->clean()).
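As a rough sketch of what the first of those methods might look like, the following uses PHP's DOM extension with an allowlist of tags and attributes. The function name filter_html() and the shape of the $allowed array are assumptions made for illustration, not an existing or planned F3 API. It does not validate attribute values (for example javascript: URLs in href), so it is not a complete sanitizer.

```php
<?php
/**
 * Hypothetical helper (not an F3 API): keep only allowlisted tags and
 * attributes. $allowed maps a tag name to its permitted attribute names.
 * Disallowed elements are dropped together with their content.
 */
function filter_html(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The meta tag hints UTF-8 to libxml; the div is a known wrapper element.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }

    // Snapshot the live node list before mutating the tree.
    $elements = iterator_to_array($root->getElementsByTagName('*'), false);
    foreach ($elements as $el) {
        $tag = strtolower($el->nodeName);
        if (!isset($allowed[$tag])) {
            if ($el->parentNode !== null) {
                $el->parentNode->removeChild($el);
            }
            continue;
        }
        // Remove every attribute that is not allowlisted for this tag.
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->nodeName), $allowed[$tag], true)) {
                $el->removeAttributeNode($attr);
            }
        }
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = filter_html(
    '<a href="/p" onmouseover="alert(1)">profile</a><script>steal()</script><b>bold</b>',
    ['a' => ['href'], 'b' => []]
);
echo $clean, PHP_EOL;
```

Unlike strip_tags(), this sketch removes disallowed elements including their text content, which is a design choice rather than a requirement.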
Hi all, first I'd like to thank the devs for their great work, really impressed! Regarding this method, it should also encode characters to mitigate code passed using encoded strings...
This is what I'm actually using:
function sanitize($string, $html = false, $extra = false) {
    if (isset($string) && !empty($string)) {
        // Main clean
        if ($html === true) {
            $string = htmlentities(strip_tags(trim($string)), ENT_QUOTES, 'UTF-8');
        }
        else {
            $string = htmlspecialchars(strip_tags(trim($string)), ENT_QUOTES, 'UTF-8');
        }
        // Extra clean
        if ($extra === true) {
            // Potentially dangerous items
            $danger = ["'", '"', '`', '../', '..\\', './', '.\\', 'javascript', ':', ';'];
            // Safe replacement
            $replace = '';
            // Cleaning
            $string = str_replace($danger, $replace, $string);
        }
        return $string;
    }
    else {
        return false;
    }
}
You can decide to encode the given string as html entities and / or remove other extra dangerous characters.
I think this really only applies to fields where you need to keep HTML for some reason. If you need to keep HTML, then the clean() method is going to be very underwhelming. There is the HTML Purifier library available if you need a lot of control over which HTML is allowed. I've used it and highly recommend it, although it is a bit cumbersome.
A user (@Dabcorp) in the Slack channel created their own XSS class for cleaning data. See Repo Here. Still, if you need fine-grained control, HTML Purifier is the way to go.
Base->clean() doesn't mitigate XSS/code injection attacks as it doesn't remove malicious tag attributes.
Generates:
For the sake of completeness: Base->scrub() is also affected.