Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

multiple <html> <body> #246

Open ducktype opened 7 months ago

ducktype commented 7 months ago

Parsing HTML that contains multiple html and body tags produces a DOM tree that is inconsistent with the one HTML5 browsers build.

akne1234 commented 2 months ago

Same problem here...

My sample HTML: $str = '<h1>Hello Dompdf</h1><div><span>nice</span></div>';

Working test with a DOMDocument I created myself: $dom = new DOMDocument("1.0", $encoding); $dom->preserveWhiteSpace = true; $dom->loadHTML($str); echo htmlspecialchars($dom->saveHTML());

=> Correct result: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8"></head><body><h1>Hello Dompdf</h1><div><span>nice</span></div></body></html>

Problem with your HTML5 class: $dom = $html5->loadHTML($str); ... $doc->loadHTML($html5->saveHTML($dom), LIBXML_NOWARNING | LIBXML_NOERROR);

=> Wrong result with multiple html tags: <!DOCTYPE html> <html><meta http-equiv="Content-Type" content="text/html;charset=UTF-8"><html><h1>Hello Dompdf<html><div><html><span>nice</span></html></div></html></h1></html></html>

Maybe the HTML5 class's loadHTML creates a wrong DOMDocument object?
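
One way to narrow this down might be to count the html and body elements in each DOMDocument instead of comparing serialized strings by eye. The following is only a minimal sketch using PHP's built-in DOM extension; countTag is a hypothetical helper introduced here for illustration and is not part of html5-php.

```php
<?php
// Hypothetical helper (not part of html5-php): number of elements
// in the document that carry the given tag name.
function countTag(DOMDocument $dom, string $tag): int
{
    return $dom->getElementsByTagName($tag)->length;
}

// Sample markup from the comment above.
$str = '<h1>Hello Dompdf</h1><div><span>nice</span></div>';

// Baseline: parse with the built-in libxml HTML parser.
$dom = new DOMDocument();
$dom->loadHTML($str, LIBXML_NOWARNING | LIBXML_NOERROR);

// A well-formed tree should contain exactly one of each.
foreach (['html', 'body'] as $tag) {
    echo $tag, ': ', countTag($dom, $tag), PHP_EOL;
}
```

The same helper could be called on the DOMDocument returned by $html5->loadHTML($str), and again on the document produced by re-loading the serialized output, to see at which step the extra html elements first appear.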