I downloaded the Wikipedia XML dump as part of our experiment and tried to parse it with the LibExpat module.
Parsing fails with an error. (The file is a small part of the entire dump.)
To make sure the file was valid, I parsed it with Ruby (nokogiri) and Python (ElementTree), and both parsed it without errors.
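For reference, the Python side of that check can be sketched roughly as below. This is a minimal illustration, not the exact script I ran: the helper name `is_well_formed` and the inline sample are made up for the example, and the real check would read the dump fragment from disk instead.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return (True, None) if the bytes parse as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_bytes)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical stand-in for the dump fragment; it includes a non-ASCII character,
# as Wikipedia text often does.
sample = "<page><title>Caf\u00e9</title></page>".encode("utf-8")

print(is_well_formed(sample))       # complete document
print(is_well_formed(sample[:-3]))  # deliberately cut short, should be rejected
```

For the real file, pass `open(path, "rb").read()` (with `path` pointing at the fragment) instead of the inline sample.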
The errors I received were:
ERROR: "ErrorException(\"Error parsing document : 0\"), no element found, 0x0000396b, 17, 1924369"
in xp_parse at /Users/pulkitb/.julia/LibExpat/src/LibExpat.jl:274
(This is for a 15K-line XML file taken from the whole fragment.)
ERROR: "ErrorException(\"Error parsing document : 0\"), unclosed token, 0x00000093, 3, 9857"
in xp_parse at /Users/pulkitb/.julia/LibExpat/src/LibExpat.jl:274
(This is for a single element from the XML file above: the element that was triggering the error.)
Since GitHub doesn't allow attaching XML files, I have pasted the XML that triggers the second error at http://pastebin.com/A1puALyw
From what I can tell, the error is thrown directly by libexpat's parse function, so I'm not sure whether it can be fixed here.
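One way to narrow this down might be to feed the same bytes to libexpat through a different binding. The sketch below uses Python's `xml.parsers.expat`, which wraps libexpat directly. The helper name `expat_check` and the tiny inline inputs are hypothetical, and the truncated inputs are only illustrative, not a diagnosis of the Wikipedia file.

```python
import xml.parsers.expat as expat


def expat_check(xml_bytes):
    """Parse the bytes with expat; return None on success or (message, line, column)."""
    parser = expat.ParserCreate()
    try:
        # The second argument marks this as the final (and only) chunk of input.
        parser.Parse(xml_bytes, True)
        return None
    except expat.ExpatError as err:
        return (expat.ErrorString(err.code), err.lineno, err.offset)


# Illustrative inputs only: one complete document and two cut-off variants.
complete = b"<a><b>x</b></a>"
cut_inside_tag = b"<a><b>x</b></a"
missing_close = b"<a><b>x</b>"

for data in (complete, cut_inside_tag, missing_close):
    print(expat_check(data))
```

If the pastebin element parses cleanly here but fails in LibExpat, that would suggest the problem lies in how the Julia wrapper hands the data to libexpat and not in libexpat itself.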