Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

Parser removes a single < (less-than) character #249

Closed touhidurabir closed 2 months ago

touhidurabir commented 2 months ago

The HTML parser removes a standalone < character from the given string. For example:

<?php

// Assumes html5-php was installed with Composer.
require 'vendor/autoload.php';

use Masterminds\HTML5;

$html = '<img src="invalid-url" onerror="alert(\'XSS Attack prefix\')" /> 2 > 1 & 3 < 5 and some more text';

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// Render it as HTML5:
print $html5->saveHTML($dom);

Printing $html5->saveHTML($dom) should output:

<!DOCTYPE html>
<html><img src="invalid-url" onerror="alert('XSS Attack prefix')"> 2 &gt; 1 &amp; 3 &lt; 5 and some more text</html>

but instead it outputs:

<!DOCTYPE html>
<html><img src="invalid-url" onerror="alert('XSS Attack prefix')"> 2 &gt; 1 &amp; 3  5 and some more text</html>

Note that the encoded &lt; for the < character is missing.
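For reference, the escaping expected for the text after the image tag can be reproduced with PHP's built-in htmlspecialchars(). This is only a sketch of what the serialized text should look like, not a description of how html5-php encodes text internally; the variable names $text and $expected are illustrative.

```php
<?php

// The text that follows the <img> tag in the example input.
$text = ' 2 > 1 & 3 < 5 and some more text';

// ENT_NOQUOTES escapes only &, < and >, leaving quotes untouched.
$expected = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

// Prints the text with all three special characters encoded,
// including the &lt; that is missing from the parser's output.
print $expected . PHP_EOL;
```

Comparing this string with the text inside the <html> element of the saveHTML() output makes the dropped character easy to spot.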

touhidurabir commented 2 months ago

Sorry for the duplicate post. On the first submission, GitHub showed "Something went wrong", so I thought it had not been saved. This issue can be removed or closed in favor of https://github.com/Masterminds/html5-php/issues/250