Author: @benjamincoe
For some NLP research I'm currently doing, I was interested in parsing structured information from Wikipedia articles.
I did not want to use a full-featured MediaWiki parser:
WikiFetch crawls a Wikipedia article using Node.js and jQuery. It returns a structured JSON representation of the page:
{
  "title": "Foobar Article",
  "links": {
    "Link_to_another_article": {
      "text": "Another article.", // the text that was linked.
      "title": "Another_article.", // title attribute of the <a/> tag.
      "occurrences": 1 // number of times this article was linked.
    }
  },
  "sections": {
    "Section Heading": {
      "text": "text contents of section.",
      "images": ["http://foobar.jpg"] // images occurring within this section.
    }
  }
}
npm install wikifetch -g
wikifetch --article=Dog
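Once you have an article object in the shape shown above, it is easy to post-process in plain Node.js. The sketch below is only an illustration: `topLinks` and `allImages` are hypothetical helpers, not part of WikiFetch, and the sample object (including the `Yet_another_article` entry) is made-up data that follows the documented structure.

```javascript
// Hypothetical helper (not part of WikiFetch): ranks an article's outgoing
// links by how many times each one was linked, most frequent first.
function topLinks(article, limit = 5) {
  return Object.entries(article.links || {})
    .map(([target, link]) => ({
      target,
      text: link.text,
      occurrences: link.occurrences,
    }))
    .sort((a, b) => b.occurrences - a.occurrences)
    .slice(0, limit);
}

// Hypothetical helper: collects every image URL across all sections.
function allImages(article) {
  return Object.values(article.sections || {}).flatMap(
    (section) => section.images || []
  );
}

// Made-up sample data that follows the structure documented above.
const article = {
  title: "Foobar Article",
  links: {
    Link_to_another_article: {
      text: "Another article.",
      title: "Another_article.",
      occurrences: 1,
    },
    Yet_another_article: {
      text: "Yet another article.",
      title: "Yet_another_article.",
      occurrences: 3,
    },
  },
  sections: {
    "Section Heading": {
      text: "text contents of section.",
      images: ["http://foobar.jpg"],
    },
  },
};

console.log(topLinks(article)); // links sorted by occurrence count
console.log(allImages(article)); // flat list of image URLs
```

Because both helpers only read the `links` and `sections` keys, they should work on any object with this structure, however you obtain it.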