Rct567 / DomQuery

PHP library for easy 'jQuery like' DOM traversing and manipulation.
MIT License

UTF-8 Encoding not applied if content contains inline svg graphics #46

Closed: heldchen closed this issue 5 months ago

heldchen commented 5 months ago

When scraping a website that contains inline SVG graphics, the loadContent() function fails to apply the correct encoding. The <?xml version="1.0" encoding="UTF-8"?> declaration of the inlined graphic prevents the encoding header from being added in https://github.com/Rct567/DomQuery/blob/663dba005225fbe18f5eff38a796801a4af79def/src/Rct567/DomQuery/DomQueryNodes.php#L358, so libxml falls back to its default encoding instead of UTF-8.

A quick fix would be to rely on the already-set $this->xml_print_pi property instead:

        if (!$this->xml_print_pi && $encoding) {
            $content = '<?xml encoding="'.$encoding.'">'.$content; // add pi node to make libxml use the correct encoding
            $xml_pi_node_added = true;
        }
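A minimal PHP sketch of the failure mode, assuming the existing check is a plain substring search for "<?xml" anywhere in the content (the exact code at the linked line may differ). The helpers hasXmlDeclarationAnywhere(), hasLeadingXmlDeclaration() and prependEncodingPi() are hypothetical and not part of DomQuery; the sketch only contrasts a document-wide search, which the inline SVG's declaration triggers, with a check restricted to the start of the content.

```php
<?php

/**
 * Hypothetical helper (not DomQuery's code): a naive check that treats any
 * XML declaration anywhere in the markup as the document's own, including
 * one that was pasted in together with an inline SVG.
 */
function hasXmlDeclarationAnywhere(string $content): bool
{
    return strpos($content, '<?xml') !== false;
}

/**
 * Hypothetical helper: only an XML declaration at the very start of the
 * markup (after an optional UTF-8 BOM and whitespace) counts.
 */
function hasLeadingXmlDeclaration(string $content): bool
{
    return preg_match('/^(\xEF\xBB\xBF)?\s*<\?xml\b/', $content) === 1;
}

/**
 * Hypothetical helper: prepend an encoding processing instruction unless the
 * markup already starts with its own XML declaration.
 */
function prependEncodingPi(string $content, string $encoding): string
{
    if ($encoding !== '' && !hasLeadingXmlDeclaration($content)) {
        // The PI tells libxml which encoding to use when parsing the markup.
        return '<?xml encoding="' . $encoding . '">' . $content;
    }
    return $content;
}

// An HTML page whose body contains an SVG pasted in with its XML declaration.
$page = '<html><body><p>Grüße</p>'
    . '<?xml version="1.0" encoding="UTF-8"?><svg><circle r="1"/></svg>'
    . '</body></html>';

var_dump(hasXmlDeclarationAnywhere($page)); // bool(true): the SVG's declaration is found
var_dump(hasLeadingXmlDeclaration($page));  // bool(false): the page itself has none
echo substr(prependEncodingPi($page, 'UTF-8'), 0, 23), PHP_EOL; // <?xml encoding="UTF-8">
```

The design point is that only a declaration belonging to the document itself should stop the encoding PI from being added; one embedded further down, next to an inline <svg>, should not.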
heldchen commented 5 months ago

Thanks a lot for the fix!