Bad HTML filtering regexp
Matching HTML tags using regular expressions is hard to do right, and can easily lead to security issues.
This regular expression only recognizes --> and not --!> as an HTML comment end tag.
Commit SHA: 459cd0fa4ebaaec6cee08c99e42c2050d46521c4
Line Number: 5631
Tool Name: CodeQL
Mitigation: Bad HTML filtering regexp
It is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues.
Some of these mistakes arise because browsers have very forgiving HTML parsers and will often render invalid HTML that contains syntax errors. Regular expressions that attempt to match HTML should therefore also recognize tags containing such syntax errors.
Recommendation
Use a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation.
Example
The following example attempts to filter out all <script> tags.
    function filterScript(html) {
        var scriptRegex = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
        var match;
        while ((match = scriptRegex.exec(html)) !== null) {
            html = html.replace(match[0], match[1]);
        }
        return html;
    }
The above sanitizer does not filter out all <script> tags. Browsers will not only accept </script> as script end tags, but also tags such as </script foo="bar"> even though it is a parser error. This means that an attack string such as <script>alert(1)</script foo="bar"> will not be filtered by the function, and alert(1) will be executed by a browser if the string is rendered as HTML.
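The bypass described above can be reproduced with a short script. This is a minimal sketch that copies the filterScript function from the example unchanged; the benign and attack variable names are illustrative only.

```javascript
// Copy of the example filter, unchanged in behavior.
function filterScript(html) {
  var scriptRegex = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  var match;
  while ((match = scriptRegex.exec(html)) !== null) {
    html = html.replace(match[0], match[1]);
  }
  return html;
}

// A well-formed script element is unwrapped as intended.
var benign = '<script>alert(1)</script>';
console.log(filterScript(benign)); // prints "alert(1)"

// The end tag carries a bogus attribute. Browsers still treat it as
// </script>, but the regular expression requires a literal "</script>",
// so nothing matches and the input is returned untouched.
var attack = '<script>alert(1)</script foo="bar">';
console.log(filterScript(attack) === attack); // prints "true"
```

The second call returns the attack string unmodified, so the script element would still execute if the result were rendered as HTML.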
Other corner cases include that HTML comments can end with --!>, and that HTML tag names can contain upper case characters.
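The comment-end corner case can be illustrated the same way. The two regular expressions below are hypothetical and are not the expression flagged at line 5631; they only show how a pattern that requires --> disagrees with a browser about where a comment ends.

```javascript
// Hypothetical patterns for illustration only.
var naiveComment = /<!--[\s\S]*?-->/;      // accepts only "-->" as the end
var tolerantComment = /<!--[\s\S]*?--!?>/; // also accepts "--!>"

// A browser ends the comment at "--!>", so the <img> that follows is
// live markup rather than comment text.
var input = '<!-- note --!><img src=x onerror=alert(1)> -->';

var naiveMatch = input.match(naiveComment)[0];
var tolerantMatch = input.match(tolerantComment)[0];

// The naive pattern treats the whole string, <img> included, as one comment.
console.log(naiveMatch === input); // prints "true"

// The tolerant pattern stops where the browser does.
console.log(tolerantMatch); // prints "<!-- note --!>"
```

Code that skips or trusts whatever the naive pattern classifies as comment text would overlook the img element. Even the tolerant pattern covers only this one corner case, which is why the recommendation is a tested sanitization or parser library rather than a patched expression.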
File Path: public/public/vendor/lightbox2/dist/js/lightbox-plus-jquery.js:5631
Impact: See Description
Finding Id : 129139785
Tool Finding Id: 53