bcoe / wikifetch

Uses jQuery to return a structured JSON representation of a Wikipedia article.

WikiFetch

Author: @benjamincoe

Problem

For some NLP research I'm currently doing, I was interested in parsing structured information from Wikipedia articles.

I did not want to use a full-featured MediaWiki parser:

The Solution

WikiFetch crawls a Wikipedia article using Node.js and jQuery. It returns a structured JSON representation of the page:

    {
        "title": "Foobar Article",
        "links": {
            "Link_to_another_article": {
                "text": "Another article.", // the text that was linked.
                "title": "Another_article.", // title attribute of the <a/> tag.
                "occurrences": 1 // number of times this article was linked.
            }
        },
        "sections": {
            "Section Heading": {
                "text": "text contents of section.",
                "images": ["http://foobar.jpg"] // images occurring within this section.
            }
        }
    }
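As a rough illustration of how this structure might be consumed, the sketch below ranks links by `occurrences` and gathers the image URLs from every section. It does not call WikiFetch itself: the `article` object is a hand-written sample shaped like the output above, and the helpers `topLinks` and `allImages` are hypothetical names introduced only for this example.

```javascript
// Hand-written sample mirroring the documented shape; not real WikiFetch output.
const article = {
  title: 'Foobar Article',
  links: {
    Link_to_another_article: {
      text: 'Another article.',
      title: 'Another_article.',
      occurrences: 1
    },
    Some_other_article: {
      text: 'Other.',
      title: 'Some_other_article',
      occurrences: 3
    }
  },
  sections: {
    'Section Heading': {
      text: 'text contents of section.',
      images: ['http://foobar.jpg']
    }
  }
};

// Hypothetical helper: link keys sorted by how often each was linked,
// most frequent first, truncated to `limit` entries.
function topLinks(article, limit) {
  return Object.entries(article.links)
    .sort(([, a], [, b]) => b.occurrences - a.occurrences)
    .slice(0, limit)
    .map(([name, link]) => ({ name, occurrences: link.occurrences }));
}

// Hypothetical helper: every image URL across all sections, in section order.
function allImages(article) {
  return Object.values(article.sections).flatMap((s) => s.images || []);
}

console.log(topLinks(article, 1));
console.log(allImages(article));
```

Because `links` and `sections` are keyed objects rather than arrays, `Object.entries` and `Object.values` are a convenient way to iterate over them.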

Usage

    npm install wikifetch -g
    wikifetch --article=Dog