dkrnl / SimpleXMLReader

Wrapped XMLReader class, for simple SAX-reading of huge xml.

XPath doesn't appear to work correctly #12

Closed Synchro closed 5 years ago

Synchro commented 5 years ago

Thank you for this class. It does exactly what I'm looking for, but unfortunately it fails to match some very simple XPath patterns, in particular any pattern that matches more than a single element. The same patterns work as expected when I test them in other XPath environments. The docs say that this class matches "simple" XPaths, but they don't explain what the limitations are. A minimal reproduction follows.

Here is an XML file (saved as data.xml):

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<gmdata>
  <header>
    <goldmine_version>2017.1.0.377</goldmine_version>
    <gmdbdef>
      <gmdbfld flddbfname="NOTES" fldname="CREATEDDATE" fldtype="datetime" fldlen="8"></gmdbfld>
      <gmdbfld flddbfname="NOTES" fldname="USERID" fldtype="character" fldlen="8"></gmdbfld>
      <gmdbfld flddbfname="NOTES" fldname="MODIFIEDDATE" fldtype="datetime" fldlen="8"></gmdbfld>
    </gmdbdef>
  </header>
</gmdata>

Here is my script:

<?php
require 'vendor/autoload.php';

$reader = new SimpleXMLReader;
$reader->open(__DIR__.'/data.xml');
$gmversion = 'Unknown';
$fielddefs = [];
$reader->registerCallback(
    '/gmdata/header/goldmine_version',
    function ($reader) use (&$gmversion) {
        $element = $reader->expandSimpleXml();
        $gmversion = (string)$element;
    }
);
$reader->registerCallback(
    '/gmdata/header/gmdbdef/gmdbfld',
    function ($reader) use (&$fielddefs) {
        // gmdbfld elements are expected to carry the attributes
        // flddbfname, fldname, fldtype, fldlen
        $element = $reader->expandSimpleXml();
        $attrs = $element->attributes();
        $fielddef = [];
        foreach ($attrs as $name => $value) {
            // Cast to string: a SimpleXMLElement cannot be used as an array key
            $fielddef[$name] = (string)$value;
        }
        $fielddefs[$fielddef['flddbfname']][] = [
            'table'  => $fielddef['flddbfname'],
            'name'   => $fielddef['fldname'],
            'type'   => $fielddef['fldtype'],
            'length' => $fielddef['fldlen'],
        ];
    }
);
$reader->parse();
$reader->close();

echo 'Goldmine version ', $gmversion;
var_dump($fielddefs);

Only the first of these callbacks triggers: /gmdata/header/goldmine_version matches a single element and works as expected. The second pattern never matches. Similar patterns such as /gmdata/header/gmdbdef do not match either, yet they match in every other XPath environment I've tried.

Is this a bug? Or a limitation of using XPath with this class?

dkrnl commented 5 years ago

It's not a bug; it's a limitation of the XPath support in this class.
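
For comparison, the repeated-element case from the report can be handled with PHP's built-in XMLReader directly. What follows is only a minimal sketch under stated assumptions, not how SimpleXMLReader works internally: the function name collectAttributes() is hypothetical, the XML is a shortened copy of data.xml passed as a string, and the "path" is a plain slash-joined list of element names, not real XPath.

```php
<?php
// Hypothetical helper (not part of SimpleXMLReader): returns the attributes
// of every element whose absolute path equals $targetPath.
function collectAttributes(string $xml, string $targetPath): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $stack = [];   // names of the currently open elements
    $matches = []; // one attribute array per matching element

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            // Read this before moving the cursor onto attributes.
            $isEmpty = $reader->isEmptyElement;

            if ('/' . implode('/', $stack) === $targetPath) {
                $attrs = [];
                while ($reader->moveToNextAttribute()) {
                    $attrs[$reader->name] = $reader->value;
                }
                $reader->moveToElement();
                $matches[] = $attrs;
            }

            // Self-closing elements produce no END_ELEMENT node.
            if ($isEmpty) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }

    $reader->close();
    return $matches;
}

// Shortened copy of data.xml from the report.
$xml = <<<'XML'
<gmdata>
  <header>
    <goldmine_version>2017.1.0.377</goldmine_version>
    <gmdbdef>
      <gmdbfld flddbfname="NOTES" fldname="CREATEDDATE" fldtype="datetime" fldlen="8"></gmdbfld>
      <gmdbfld flddbfname="NOTES" fldname="USERID" fldtype="character" fldlen="8"></gmdbfld>
    </gmdbdef>
  </header>
</gmdata>
XML;

$fields = collectAttributes($xml, '/gmdata/header/gmdbdef/gmdbfld');
```

Because the path is rebuilt from a stack on every element, each sibling gmdbfld is compared independently, so all of them are collected. For a file on disk, `$reader->open($path)` could replace the `XML()` call.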