dkrnl / SimpleXMLReader

Wrapped XMLReader class, for simple SAX-reading of huge xml.

Solution for & symbol in nodes. #10

Closed arnisjuraga closed 6 years ago

arnisjuraga commented 7 years ago

The XML reader fails if a node contains an unencoded "&" symbol:

<item>
  <manufacturer>Villeroy & Boch</manufacturer>
</item>

SimpleXMLReader silently stops at the first malformed node, and no error is generated. If the node contains the encoded string &amp; instead, everything works well.

Is it possible to: 1) force the reader to read and encode nodes that contain an incorrect "&" character, or 2) at least generate an error when it reads an incorrectly formatted XML file?

dkrnl commented 7 years ago

Hi!

I get this warning:

Warning:  XMLReader::expand(): parser error : xmlParseEntityRef: no name in \SimpleXMLReader.php on line 245

Maybe warnings are disabled in your environment?

dkrnl commented 7 years ago

Or try this code: libxml_use_internal_errors(true); ....other code...
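As a rough illustration of that suggestion, the sketch below switches libxml to internal error collection, reads a document with PHP's built-in XMLReader, and returns whatever errors libxml recorded. The function name collectXmlErrors() and the sample input are hypothetical; this is not code from SimpleXMLReader itself.

```php
<?php
// Minimal sketch: buffer libxml errors instead of relying on PHP warnings.
// collectXmlErrors() is a hypothetical helper, not part of SimpleXMLReader.
function collectXmlErrors(string $xml): array
{
    // Store libxml errors internally; remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);

    // Walk the whole document. The @ hides any PHP-level warning that
    // read() itself may raise when the parser gives up.
    while (@$reader->read()) {
        // A real application would process nodes here.
    }
    $reader->close();

    // Turn each LibXMLError into a readable string.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous libxml error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Usage: a bare "&" should yield at least one recorded error.
$errors = collectXmlErrors('<item><manufacturer>Villeroy & Boch</manufacturer></item>');
foreach ($errors as $message) {
    echo $message, PHP_EOL;
}
```

Checking the returned array after reading lets the application log or throw explicitly, rather than depending on where PHP warnings happen to be written.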

arnisjuraga commented 7 years ago

Thanks, the warning was indeed logged in the application log... I had been looking in the Apache log file.

For now, I worked around it with a bulk search and replace. It works for me on a small XML file (~10 MB) without performance issues, but could be a problem on huge files.

$content = str_replace("&", "&amp;", $content);
$content = str_replace(["&amp;amp;","&amp;quot;"], ["&amp;","&quot;"], $content);

This, of course, is a dirty hack (it will break other & symbols as well). Is there a better workaround? An ampersand does not break XML validation, but is not supported by the XML reader?
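One possible refinement of the search-and-replace idea is a regular expression with a negative lookahead, which escapes only those "&" characters that do not already begin something shaped like an entity or character reference. The helper name escapeBareAmpersands() is hypothetical, and this remains a text-level workaround rather than a parser fix. It assumes that any "&name;" sequence is an intended reference, and it does not treat CDATA sections or comments specially.

```php
<?php
// Minimal sketch: escape only "bare" ampersands.
// escapeBareAmpersands() is a hypothetical helper name.
function escapeBareAmpersands(string $content): string
{
    // Match "&" unless it is followed by a named reference (&amp;),
    // a decimal reference (&#38;), or a hex reference (&#x26;),
    // each terminated by ";".
    $pattern = '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';

    // preg_replace() returns null on failure; fall back to the input.
    return preg_replace($pattern, '&amp;', $content) ?? $content;
}

// Usage on the snippet from this issue.
echo escapeBareAmpersands('<manufacturer>Villeroy & Boch</manufacturer>'), PHP_EOL;
```

Because a reference never spans a line break, the same function could in principle be applied line by line (for example with fgets()) while copying a huge file to a temporary one, which avoids loading the whole document into memory.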

dkrnl commented 7 years ago

The problem is inside libxml, in the XMLReader::expand method. I don't know how to fix it. :(