Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

Traverser::node() does not handle entity references #244

Open longwave opened 7 months ago

longwave commented 7 months ago

If I manually modify a DOM document and add an entity reference, HTML5::saveHTML() does not give the same result as DOMDocument::saveHTML():

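// Assumes the library is installed via Composer.
require 'vendor/autoload.php';

$html5 = new Masterminds\HTML5(['disable_html_ns' => TRUE]);
$dom = $html5->loadHTML('<body>');
$node = $dom->getElementsByTagName('body')->item(0);

// createElement() does not escape its value, so "&eacute;" becomes an entity reference node.
$node->appendChild($dom->createElement('span', 'Identit&eacute;'));

print $dom->saveHTML() . "\n";       // DOMDocument serializer
print $html5->saveHTML($dom) . "\n"; // html5-php serializer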

This outputs the following (DOMDocument::saveHTML() first, then HTML5::saveHTML()):

<!DOCTYPE html>
<html><body><span>Identit&eacute;</span></body></html>

<!DOCTYPE html>
<html><body><span>Identit</span></body></html>

This was reported in the Drupal project, which recently switched from DOMDocument to this library for parsing and serializing HTML: https://www.drupal.org/project/drupal/issues/3416204

The entity is dropped because Traverser::node() does not handle XML_ENTITY_REF_NODE, so the reference never reaches the output. Should the switch statement and rules class be extended to support this case?
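To make the missing case concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. The `serializeNode()` function is hypothetical and is not html5-php's Traverser or rules class. It only illustrates where an XML_ENTITY_REF_NODE branch could sit in a node-type switch, on the assumption that emitting `&name;` verbatim is the desired behavior.

```php
<?php
// Hypothetical mini-serializer for illustration only; NOT html5-php code.
// It handles just enough node types to show the entity reference case.
function serializeNode(DOMNode $node): string
{
    switch ($node->nodeType) {
        case XML_ELEMENT_NODE:
            $inner = '';
            foreach ($node->childNodes as $child) {
                $inner .= serializeNode($child);
            }
            return "<{$node->nodeName}>{$inner}</{$node->nodeName}>";

        case XML_TEXT_NODE:
            // Escape &, < and > in ordinary text.
            return htmlspecialchars($node->nodeValue, ENT_NOQUOTES);

        case XML_ENTITY_REF_NODE:
            // For an entity reference, nodeName is the entity name ("eacute").
            return '&' . $node->nodeName . ';';

        default:
            // Any node type without a case is dropped from the output.
            return '';
    }
}

$dom = new DOMDocument();
// The value is not escaped, so this yields a text node plus an entity reference.
$span = $dom->createElement('span', 'Identit&eacute;');
$dom->appendChild($span);

$result = serializeNode($span);
print $result . "\n";
```

Removing the XML_ENTITY_REF_NODE case from this sketch sends the reference to the default branch, which reproduces the truncated `Identit` output shown above.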

jcnventura commented 2 months ago

Any news on this?

goetas commented 1 month ago

Hm, interesting. Are you willing to provide a fix for this?