Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

Incorrect parsing of self closing tr/td #239

Closed (stefanfisk closed 10 months ago)

stefanfisk commented 11 months ago

Parsing <tr> and <td> elements whose end tags are omitted ("self closing") results in the rows being nested instead of being siblings.

Here's a minimal test case:

<?php

use Masterminds\HTML5;
use PHPUnit\Framework\TestCase;

class SelfClosingTableTest extends TestCase
{
    public function testHtml5Spec(): void
    {
        $html5 = new HTML5();

        $html = '<table><tr><td>A<tr><td>B</table>';

        $doc = $html5->loadHTMLFragment($html);

        $this->assertSame(
            '<table><tr><td>A</td></tr><tr><td>B</td></tr></table>',
            $html5->saveHTML($doc),
        );
    }
}

The test fails with the following output:

Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'<table><tr><td>A</td></tr><tr><td>B</td></tr></table>'
+'<table><tr><td>A<tr><td>B</td></tr></td></tr></table>'

I first encountered this issue when parsing the HTML spec 😸 https://html.spec.whatwg.org/multipage/indices.html#attributes-1
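
For context, the expected output follows from the HTML spec's optional-tag rules: a new <tr> start tag ends any open cell and row, and a new <td> or <th> start tag ends any open cell. The block below is a minimal sketch of that rule only, not html5-php's implementation. The function name serializeWithImpliedEndTags and the token format are hypothetical, the input is a hand-written token list rather than real tokenizer output, and the implicit <tbody> that the full spec algorithm inserts is ignored so the result matches the expected string in the test above.

```php
<?php

/**
 * Hypothetical helper (not part of html5-php): serialize a flat token list,
 * closing table cells and rows implicitly.
 *
 * Each token is ['start', name], ['end', name], or ['text', data].
 */
function serializeWithImpliedEndTags(array $tokens): string
{
    // Start tag => open elements it implicitly closes (simplified from the spec).
    $closes = [
        'tr' => ['td', 'th', 'tr'],
        'td' => ['td', 'th'],
        'th' => ['td', 'th'],
    ];

    $stack = []; // names of the currently open elements, innermost last
    $out = '';

    foreach ($tokens as [$type, $value]) {
        if ($type === 'text') {
            $out .= $value;
        } elseif ($type === 'start') {
            // Pop open cells/rows that the new start tag implies are finished.
            while ($stack !== [] && in_array(end($stack), $closes[$value] ?? [], true)) {
                $out .= '</' . array_pop($stack) . '>';
            }
            $stack[] = $value;
            $out .= '<' . $value . '>';
        } elseif ($type === 'end' && in_array($value, $stack, true)) {
            // Close everything up to and including the named element.
            do {
                $name = array_pop($stack);
                $out .= '</' . $name . '>';
            } while ($name !== $value);
        }
    }

    return $out;
}

// Hand-written token stream for '<table><tr><td>A<tr><td>B</table>'.
$tokens = [
    ['start', 'table'], ['start', 'tr'], ['start', 'td'], ['text', 'A'],
    ['start', 'tr'], ['start', 'td'], ['text', 'B'], ['end', 'table'],
];

echo serializeWithImpliedEndTags($tokens), PHP_EOL;
```

With this input the sketch prints <table><tr><td>A</td></tr><tr><td>B</td></tr></table>, i.e. the two rows end up as siblings. It only inspects the innermost open element, so an unclosed inline element inside a cell would defeat it; the real tree-construction algorithm handles that case.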

goetas commented 11 months ago

:(

When I see HTML, I really hate it. XML was so good and consistent.

goetas commented 11 months ago

Could you please test https://github.com/Masterminds/html5-php/issues/239 ?

stefanfisk commented 10 months ago

@goetas AFAICT the document is properly parsed now. Thanks for the fix!