Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

DOMCdataSection vs DOMText #175

Closed: bytestream closed this issue 4 years ago

bytestream commented 4 years ago

I've come across a discrepancy between DOMDocument and this lib:

DOMDocument returns DOMCdataSection for the <style> content:

$dom = new \DOMDocument;
$dom->loadHTML('<!doctype html>
    <html lang="en">
      <head>
        <style type="text/css"><!--
div {}
--></style>
      </head>
      <body>
        <div id="foo" class="bar baz">foo bar baz</div>
      </body>
    </html>');

var_dump( $dom->getElementsByTagName('style')->item(0)->childNodes->item(0) ); die;

masterminds/html5 returns DOMText for the <style> content:

// Assumes $this->html5 is a \Masterminds\HTML5 instance, e.g. inside a test case.
$dom = $this->html5->loadHTML('<!doctype html>
    <html lang="en">
      <head>
        <style type="text/css"><!--
div {}
--></style>
      </head>
      <body>
        <div id="foo" class="bar baz">foo bar baz</div>
      </body>
    </html>');

var_dump( $dom->getElementsByTagName('style')->item(0)->childNodes->item(0) ); die;

DOMDocument treats this as a DOMCdataSection.

In HTML, the content of the script and style elements is treated as if it were CDATA, so that & and < are not special except when they occur as the end tag to close the element.

I think DOMText, and probably DOMCdataSection too, are incorrect, although mimicking DOMDocument is probably what's intended?

At the very least it should be tokenized as DOMComment.
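
For anyone who needs code that works with either parser in the meantime, here is a minimal sketch. It relies on the fact that DOMCdataSection extends DOMText in PHP's DOM extension, so a single instanceof check covers both node classes. The helper name firstStyleText is hypothetical and not part of html5-php or ext-dom. The sample uses only DOMDocument, and the exact class printed may depend on the libxml2 version.

```php
<?php
// Hypothetical helper (not part of html5-php or ext-dom): returns the raw
// text of the first <style> element, whichever node class the parser chose.
function firstStyleText(\DOMDocument $dom): ?string
{
    $style = $dom->getElementsByTagName('style')->item(0);
    if ($style === null) {
        return null; // no <style> element in the document
    }

    $text = '';
    foreach ($style->childNodes as $child) {
        // DOMCdataSection is a subclass of DOMText, so this accepts both.
        if ($child instanceof \DOMText) {
            $text .= $child->data;
        }
    }

    return $text;
}

$dom = new \DOMDocument();
$dom->loadHTML('<!doctype html><html><head><style><!--
div {}
--></style></head><body></body></html>');

$node = $dom->getElementsByTagName('style')->item(0)->firstChild;

// Show which concrete class this parser produced for the <style> content.
echo get_class($node), "\n";
echo firstStyleText($dom), "\n";
```

Checking against DOMText rather than a concrete class keeps calling code independent of which node type a given parser emits for raw text elements.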

However, when using the