dnmilne / wikipediaminer

An open source toolkit for mining Wikipedia

Extraction of page summaries #4

Closed dnmilne closed 10 years ago

dnmilne commented 10 years ago

Page links, category links, redirects, etc. can all be handled without big memory requirements by treating extraction as a graph resolution problem. Each map call deals with one node in the graph (a page): it emits that node again, plus any information that needs to be communicated to adjacent nodes. Each reduce call then collapses all of the partial information about a node into a complete picture of that node.
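
A minimal sketch of the map/reduce idea, in plain Python rather than the toolkit's actual Hadoop code. The record layout (`id`, `title`, `links_out`), the message tags (`"node"`, `"link_in"`), and the function names are all hypothetical, chosen only for illustration. The in-memory `grouped` dictionary stands in for the shuffle step that a real MapReduce framework would perform, and only page links are covered; redirects and category links would presumably need further message types.

```python
from collections import defaultdict


def map_page(page):
    """Handle one node: re-emit it, plus a message for each adjacent node."""
    # Pass the node itself through so the reducer sees its own data.
    yield page["id"], ("node", page)
    # Tell every link target that this page links to it.
    for target in page["links_out"]:
        yield target, ("link_in", page["id"])


def reduce_page(page_id, values):
    """Collapse all partial information about one node into a full summary."""
    summary = {"id": page_id, "title": None, "links_out": [], "links_in": []}
    for kind, payload in values:
        if kind == "node":
            summary["title"] = payload["title"]
            summary["links_out"] = sorted(payload["links_out"])
        elif kind == "link_in":
            summary["links_in"].append(payload)
    summary["links_in"].sort()
    return summary


def run(pages):
    """Simulate map -> shuffle (group by key) -> reduce in memory."""
    grouped = defaultdict(list)
    for page in pages:
        for key, value in map_page(page):
            grouped[key].append(value)
    return {key: reduce_page(key, values) for key, values in grouped.items()}


# Hypothetical toy graph of three pages.
pages = [
    {"id": 1, "title": "Kiwi", "links_out": [2, 3]},
    {"id": 2, "title": "Bird", "links_out": [3]},
    {"id": 3, "title": "New Zealand", "links_out": [1]},
]
summaries = run(pages)
```

The mapper only ever holds one page at a time and the reducer only the messages for one page, which is where the low memory footprint would come from in a distributed run.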