Open tholu opened 10 years ago
Thanks for reporting this. I'm actually not sure what the right behavior is -- it's probably up for debate -- but someone should look at this at some point.
Note: This issue breaks http://php-java-bridge.sourceforge.net/pjb/
PHP calls start_elem after it has read the whole element (e.g. <F></F> works with both HHVM and PHP), while HHVM calls start_elem immediately after reading <F>.
After some digging, it seems that HHVM's behavior is consistent with the underlying libexpat. See this example
#include <expat.h>
#include <stdio.h>
#include <string.h>

void start_element(void *data, const char *element, const char **attribute) {
    printf("<%s>\n", element);
}

void end_element(void *data, const char *element) {
    printf("</%s>\n", element);
}

int main(void) {
    XML_Parser parser = XML_ParserCreate(NULL);
    XML_SetElementHandler(parser, start_element, end_element);
    char buff[] = "<F>";
    XML_Parse(parser, buff, strlen(buff), XML_TRUE);
    XML_ParserFree(parser);
    return 0;
}
Python example:
import xml.parsers.expat

def start_element(name, attrs):
    print('<%s>' % name)

def end_element(name):
    print('</%s>' % name)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.Parse("<F>")
Both will output
<F>
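To make the timing visible, here is a minimal sketch that feeds the document to expat in two chunks and records which chunk triggered each handler. The helper name parse_in_chunks and the chunk boundaries are my own choices for illustration, not something from PJB or HHVM.

```python
import xml.parsers.expat


def parse_in_chunks(chunks):
    """Feed chunks to expat one by one and log which chunk fired each handler."""
    events = []
    current = {"chunk": None}  # index of the chunk currently being parsed

    def on_start(name, attrs):
        events.append(("start", name, current["chunk"]))

    def on_end(name):
        events.append(("end", name, current["chunk"]))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end

    for index, chunk in enumerate(chunks):
        current["chunk"] = index
        # Only the last chunk is flagged as final.
        parser.Parse(chunk, index == len(chunks) - 1)
    return events


events = parse_in_chunks(["<F>", "</F>"])
print(events)
```

If the start event is attributed to chunk 0, expat reported the start tag before it had seen the closing tag, which matches what HHVM does.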
This is likely to be a bug in PHP 5. The following was run under PHP 5:
<?php
function start_elem($parser, $name, $attribs) {
    echo "<$name>";
}
function end_elem($parser, $name) {
    echo "</$name>";
}
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, "start_elem", "end_elem");
$buf = '<F>';
echo xml_parse($parser, $buf, strlen($buf) == 0);
This will output only 1, with nothing before it. But
$buf = '<Foo>';
will output <Foo>1, just like in HHVM.
Can you give some details of how this breaks pjb? It seems that one should not rely on this PHP 5 behavior at all...
My php version
zeng-mbp:php-5.5.10 zeng$ php --version
PHP 5.4.24 (cli) (built: Jan 19 2014 21:32:15)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies
Thanks for digging deeper. I have the following test script with PJB (Java.inc):
<?php
$query = "test";
require_once('Java.inc');
$escapedQuery = java_values(java('org.apache.lucene.queryParser.QueryParser')->escape($query));
var_dump($escapedQuery);
This works with PHP, while calling with HHVM gives
HipHop Fatal error: protocol error: <O v="1" m="org.apache.lucene.queryParser.QueryParser" p="O" n="F"/>,no element found at col 3. Check the back end log for OutOfMemoryErrors. in <path>/Java.inc on line 879
It has nothing to do with OutOfMemory errors of course. I tracked this down to the different behaviour of xml_set_element_handler in PHP and HHVM. I don't know if there are more problems if this is fixed, though.
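For reference, the "no element found" wording is also the error string expat itself produces when an unclosed document is parsed as final. Below is a small sketch of that with Python's expat binding. I am assuming, without having checked Java.inc, that the text in the PJB message comes from the parser's error string; parse_as_final is a hypothetical helper.

```python
import xml.parsers.expat


def parse_as_final(data):
    """Parse data as a complete document; return (handler output, error text or None)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append("<%s>" % name)
    parser.EndElementHandler = lambda name: events.append("</%s>" % name)
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        # Translate the numeric error code into expat's message text.
        return events, xml.parsers.expat.ErrorString(exc.code)
    return events, None


print(parse_as_final("<F>"))
print(parse_as_final("<F></F>"))
```

With expat the start handler has already run by the time the error is raised, so a caller sees both the handler output and the error for the same input.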
I've found the same bug: the start and stop element functions registered with xml_set_element_handler( $parser, "startElement", "stopElement" ); never get called (I tried adding echo "TEST"; to confirm this).
SORRY can't figure out how to show code, so adding images instead, tried < code > & [ code ].
Guide for formatting: https://help.github.com/articles/github-flavored-markdown
Put your code in ``` tags
Oops, ignore that. It appears that the $element_name variables are all capitalised :)
Perhaps this comment on my corresponding thread from StackOverflow helps:
POSIX text files are expected to have a line ending. In your buffer that line ending is missing, which is why the element that is opened but never closed before reaching EOF (EOB) is cut from the input sequence: data is missing. You could also just append a space or some other character, which would shift the internal state of the parser by at least one character and make it aware that your string should be an element. Your input BTW is not XML. You probably would like to make it self-closing like <F/>, which is supported by that parser. – hakre
http://stackoverflow.com/questions/21389028/php-xml-parse-and-xml-set-element-handler
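On the self-closing suggestion: the sketch below only checks the expat side (via Python's binding), not PHP's libxml2-based parser, so it says nothing about whether PHP behaves the same way. The helper name element_events is hypothetical.

```python
import xml.parsers.expat


def element_events(data):
    """Return the start/end events expat reports for a complete document."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(data, True)  # the whole document in one final chunk
    return events


self_closing = element_events("<F/>")
explicit_pair = element_events("<F></F>")
print(self_closing, explicit_pair)
```

Under expat both forms are well-formed documents and should produce the same pair of events, so a sender that can emit the self-closing form avoids the unclosed-element case entirely.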
I think this boils down to HHVM using libexpat vs PHP using libxml2.
@tholu I believe you are correct in your final assessment. I am going to keep this item open as a wishlist item. If you would like to reimplement our xml libraries using libxml2, we would certainly consider a pull request :)
To reproduce, use the following script (e.g. xml_parse.php):
then compare the output under PHP:
1
with the output under HHVM:
<F>1
It seems that start_elem() is somehow not called in the Zend implementation (maybe a bug there or is this intended?).