Closed gaurav closed 10 years ago
Gadget documentation: https://commons.wikimedia.org/wiki/Help:Gadget-ImageAnnotator
Example image: https://commons.wikimedia.org/wiki/File:Spelterini_Bl%C3%BCemlisalp.jpg
We can use a PageNodeExtractor to find the appropriate templates, then take the nodes in between them and convert those nodes to text.
We can provide both a StringParser (https://github.com/dbpedia/extraction-framework/blob/ce8339360355c9b3fe7c8f803e38ebb016fcd79b/core/src/main/scala/org/dbpedia/extraction/dataparser/StringParser.scala) representation and a raw WikiText representation; the latter could be run through Commons' MediaWiki API if somebody needs it translated into HTML.
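For the HTML case, here is a rough sketch of how a consumer might build a request against the MediaWiki `action=parse` endpoint on Commons. The object and method names (`ParseApiSketch`, `parseUrl`) are made up for illustration and are not part of the extraction framework. It only builds the URL and does not send anything; longer annotation texts would probably be better sent as a POST body.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of the DBpedia extraction framework.
object ParseApiSketch {
  val endpoint = "https://commons.wikimedia.org/w/api.php"

  // Builds a GET URL asking MediaWiki to render a WikiText snippet as HTML.
  def parseUrl(wikiText: String): String = {
    val params = Seq(
      "action"       -> "parse",
      "format"       -> "json",
      "contentmodel" -> "wikitext",
      "prop"         -> "text",
      "text"         -> wikiText
    )
    params
      .map { case (k, v) => k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8.name()) }
      .mkString(endpoint + "?", "&", "")
  }

  def main(args: Array[String]): Unit =
    println(parseUrl("''Summit'' of the [[Blüemlisalp]]"))
}
```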
We should work on the PageNode. A WikiPage could be used, but the regex would get very complicated once it has to handle whitespace variations and similar details. Walking the PageNode should be pretty straightforward.
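To make the node-walking idea concrete, here is a minimal sketch. It does not use the framework's real classes: `Node`, `TextNode` and `TemplateNode` below are simplified stand-ins defined just for this example, and `AnnotationSketch` is a hypothetical name. It also assumes the gadget wraps each note between an `ImageNote` and an `ImageNoteEnd` template, which should be checked against the example image before relying on it.

```scala
// Simplified stand-ins for the framework's node types (assumption, not the real API).
sealed trait Node
final case class TextNode(text: String) extends Node
final case class TemplateNode(title: String, params: Map[String, String]) extends Node

object AnnotationSketch {
  // One annotation: the opening template's parameters plus the text in between.
  final case class Annotation(params: Map[String, String], body: String)

  // Plain-text view of a node run: keep text nodes, ignore everything else.
  private def plainText(nodes: List[Node]): String =
    nodes.collect { case TextNode(t) => t }.mkString.trim

  // Walk the page's children; whenever an opening template is found,
  // collect the nodes up to the matching closing template.
  def extract(nodes: List[Node]): List[Annotation] = nodes match {
    case TemplateNode("ImageNote", params) :: rest =>
      val (body, after) = rest.span {
        case TemplateNode("ImageNoteEnd", _) => false
        case _                               => true
      }
      // drop(1) skips the closing template; if it is missing, the body runs to the end.
      Annotation(params, plainText(body)) :: extract(after.drop(1))
    case _ :: rest => extract(rest)
    case Nil       => Nil
  }

  def main(args: Array[String]): Unit = {
    val children = List(
      TextNode("Some description. "),
      TemplateNode("ImageNote", Map("id" -> "1", "x" -> "10", "y" -> "20")),
      TextNode(" A summit "),
      TemplateNode("ImageNoteEnd", Map("id" -> "1"))
    )
    extract(children).foreach(println)
  }
}
```

No whitespace-tolerant regex is needed here because the template boundaries are already parsed, which is the main reason to prefer the PageNode over the raw WikiPage source.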