Closed Frenzie closed 8 months ago
Hm, I was just reading more and realized this may not be ideal, though at first glance the output looked okay:
For an example:
<h1>Foo</h1><p>bar</p>
will end up as:
<h1>Foo<p>bar</p></h1>
Easiest workaround is adding root tag yourself and stripping it later:
$html->loadHTML('<html>' . $content .'</html>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$content = str_replace(array('<html>','</html>') , '' , $html->saveHTML());
I have not looked precisely, but a typical way is to saveHTML($node)
. Try to search our codebase for ->saveHTML(
for some examples
That's the first thing I tried, as $dom->getElementsByTagName('body')->item(0)->saveHTML()
.
But I see what you're referring to is actually a for loop appending to a string. I had considered that, but it felt hacky.
I think it makes a difference in behaviour to pass the node inside the saveHTML($node)
function
And no, without loop. See e.g.
Ah, in that case presumably it should simply be something like:
$output = $dom->getElementsByTagName('body')->item(0);
return $dom->saveHTML($output);
I'll test that tomorrow (if I find the time).
PS Here are the other results. I hadn't notice the third was slightly different, whoops. https://github.com/FreshRSS/FreshRSS/blob/da43fff43713daadbf13d05557a2ba14d5166b11/app/Models/Feed.php#L751-L756 https://github.com/FreshRSS/FreshRSS/blob/da43fff43713daadbf13d05557a2ba14d5166b11/app/Models/Entry.php#L727-L743
For reference, this is the idea I played with first yesterday:
$body = $doc->getElementsByTagName('body')->item(0);
$output = '';
foreach ($body->childNodes as $node) {
$output .= $doc->saveHTML($node);
}
return $output;
The sample uses regex to effect the desired result. I will do so as well to limit new string creation. https://github.com/FreshRSS/FreshRSS/blob/da43fff43713daadbf13d05557a2ba14d5166b11/lib/SimplePie/SimplePie/Sanitize.php#L470-L474
In reply to https://github.com/FreshRSS/Extensions/issues/206#issuecomment-1976649551.
With special thanks to this comment: https://www.php.net/manual/en/domdocument.savehtml.php#119767