I'm playing around with some PHP sanitization libraries and found the following issue in your HTML parser:
The Scanner::peek() method attempts to read beyond the end of the string in some cases. From looking at the stack trace, it seems like this check is wrong (I think it should be < instead of <=, as EOF == strlen($data)). Changing the comparison operator to < makes the warning go away as well.
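To illustrate the suspected off-by-one, here is a minimal sketch. TinyScanner and its methods are hypothetical and simplified for illustration. They are not the masterminds/html5 source. The only assumption carried over from the report is that EOF holds strlen($data), so the last valid offset is EOF - 1.

```php
<?php
// Hypothetical, simplified scanner for illustration only.
// It is NOT the masterminds/html5 implementation.
final class TinyScanner
{
    private string $data;
    private int $char = 0; // current offset into $data
    private int $EOF;      // assumed to be strlen($data), one past the last valid offset

    public function __construct(string $data)
    {
        $this->data = $data;
        $this->EOF = strlen($data);
    }

    // Move the cursor onto the last character of a non-empty string.
    public function seekToLast(): void
    {
        $this->char = $this->EOF - 1;
    }

    // Inclusive bound: lets offset EOF through, which is out of range.
    public function peekInclusive()
    {
        if (($this->char + 1) <= $this->EOF) {
            return $this->data[$this->char + 1];
        }
        return false;
    }

    // Exclusive bound: only offsets 0 .. EOF - 1 are read.
    public function peekExclusive()
    {
        if (($this->char + 1) < $this->EOF) {
            return $this->data[$this->char + 1];
        }
        return false;
    }
}

// Collect diagnostics instead of printing them, so both variants can be compared.
$messages = [];
set_error_handler(function (int $errno, string $errstr) use (&$messages): bool {
    $messages[] = $errstr;
    return true;
});

$scanner = new TinyScanner("abc");
$scanner->seekToLast();
$buggy = $scanner->peekInclusive(); // reads offset 3 of a 3-byte string
$fixed = $scanner->peekExclusive(); // stops before the out-of-range read

restore_error_handler();

var_dump($fixed);
echo count($messages), " diagnostic(s) captured\n";
```

With the cursor on the last character, the inclusive variant indexes one position past the end of the string and PHP reports an uninitialized string offset, while the exclusive variant returns false without touching the string.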
How to reproduce:
Install the current masterminds/html5 version via composer: composer require masterminds/html5
Run the following PHP script:
<?php
require "vendor/autoload.php";
use Masterminds\HTML5;
$html5 = new HTML5();
$html = "<form ></span><!--*/'><!--";
$dom = $html5->loadHTML($html);
print $html5->saveHTML($dom);
The warning seems to occur when the input contains malformed comments (i.e., trailing, unclosed XML comments). While this HTML fragment is obviously invalid, your parser is used by several sanitization libraries (e.g., the TYPO3 one), which have to handle broken HTML.
I do not think this causes any kind of parsing issues, but this still seems to be a bug on your end.
Cheers!