Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

Uninitialized string offset in src/HTML5/Parser/Scanner.php:108 #215

Status: Closed (leeN closed this issue 2 years ago)

leeN commented 2 years ago

Hello!

I'm playing around with some PHP sanitization libraries and found the following issue in your HTML parser:

The Scanner::peek() method attempts to read beyond the end of the string in some cases. Judging from the stack trace, the bounds check in that method looks wrong: I think it should use < instead of <=, because EOF == strlen($data) and the last valid offset is therefore EOF - 1. Changing the comparison operator to < also makes the warning go away.
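To illustrate the off-by-one described above, here is a minimal sketch. It is not the library's code: SketchScanner, seek(), peekWithLessOrEqual() and peekWithLessThan() are hypothetical names made up for this example, and the only assumption carried over from the report is that EOF equals strlen($data).

```php
<?php
// Hypothetical stand-in for a scanner. NOT the actual Masterminds\HTML5 code.
final class SketchScanner
{
    private $data;
    private $char = 0; // current offset into $data
    private $EOF;      // assumed to equal strlen($data), as in the report

    public function __construct(string $data)
    {
        $this->data = $data;
        $this->EOF = strlen($data);
    }

    public function seek(int $pos): void
    {
        $this->char = $pos;
    }

    // Suspected faulty form: <= lets the next offset equal EOF,
    // which is one past the last valid offset of the string.
    public function peekWithLessOrEqual()
    {
        if (($this->char + 1) <= $this->EOF) {
            return $this->data[$this->char + 1];
        }

        return false;
    }

    // Proposed form: < keeps the read inside the string.
    public function peekWithLessThan()
    {
        if (($this->char + 1) < $this->EOF) {
            return $this->data[$this->char + 1];
        }

        return false;
    }
}

// Record diagnostics instead of printing them, so the difference is visible.
$messages = [];
set_error_handler(function ($severity, $message) use (&$messages) {
    $messages[] = $message;
    return true;
});

$scanner = new SketchScanner("abc");
$scanner->seek(2);                 // positioned on the last character

$scanner->peekWithLessOrEqual();   // reads offset 3 and triggers a diagnostic
$afterFaulty = count($messages);

$scanner->peekWithLessThan();      // returns false without reading
$afterFixed = count($messages);

restore_error_handler();
```

With the scanner positioned on the last character, only the <= variant should record an "Uninitialized string offset" diagnostic (a notice or a warning, depending on the PHP version), so $afterFaulty and $afterFixed should both end up at 1.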

How to reproduce:

Install the current masterminds/html5 version via composer: composer require masterminds/html5

Run the following php script:

<?php
require "vendor/autoload.php";

use Masterminds\HTML5;
$html5 = new HTML5();
$html = "<form ></span><!--*/'><!--";
$dom = $html5->loadHTML($html);

print $html5->saveHTML($dom);

The warning seems to occur when the input contains malformed comments (i.e., trailing, unclosed XML comments). While this HTML fragment is obviously invalid, your parser is used by several sanitization libraries (e.g., the TYPO3 one), which have to handle broken HTML.

I do not think this causes any parsing issues, but it still seems to be a bug on your end.

Cheers!