I initially posted this at idio/json-wikipedia#43, but I should have started here, since this is the main repo :)
I'm trying to parse a single Wikipedia XML file, like the mercedes.xml in this repo's tests. Following the code in the test section, I tried something like:
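// Assumption: IOUtils is the hpc-utils helper used in this repo's tests
import it.cnr.isti.hpc.io.IOUtils
import it.cnr.isti.hpc.wikipedia.article.Article
import it.cnr.isti.hpc.wikipedia.parser.ArticleParser

val parser = new ArticleParser("en")
// Read the whole XML file into a single string
val testXml = IOUtils.getFileAsUTF8String("./mercedes.xml")
val testArticle = new Article()
// Fill testArticle from the file contents
parser.parse(testArticle, testXml)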
But the result is strange. Many properties are blank (title, wikiTitle, ...), and the paragraphs and clean text are also parsed incorrectly. I guess I'm doing something wrong ^^ If you could show how to use the API to process a single article in XML, that would be great :)
Thanks, Thomas