danielepantaleone / hadoop-pagerank

Hadoop PageRank

A PageRank algorithm implementation that makes use of the Apache Hadoop framework.

Execute the program

Install Hadoop on your machine [OSX], [Linux]
Pick a dataset from the Stanford web graphs collection
Place the dataset in your Hadoop FS
Create the directory that will contain the output
Build a JAR from this source code and name it pagerank.jar
Launch the program with Hadoop: hadoop jar pagerank.jar --input <in> --output <out>
Browse the PageRank results, which are stored in the Hadoop FS
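
Before uploading a dataset it can help to check what the graph looks like. The Stanford web graph files are plain-text edge lists: header lines start with `#`, and every other line holds a source node ID and a target node ID separated by whitespace. The following is a minimal sketch that reads such lines into an adjacency list. The class name `EdgeListSketch` and the method `parse` are hypothetical and are not part of this repository; how the Hadoop job itself parses its input is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of hadoop-pagerank.
public class EdgeListSketch {

    /** Turns "from<whitespace>to" lines into an adjacency list; '#' lines are skipped. */
    public static Map<Integer, List<Integer>> parse(List<String> lines) {
        Map<Integer, List<Integer>> graph = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // header comment or blank line
            }
            String[] parts = trimmed.split("\\s+");
            int from = Integer.parseInt(parts[0]);
            int to = Integer.parseInt(parts[1]);
            graph.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
            // Register the target too, so nodes without out-links still appear as keys.
            graph.computeIfAbsent(to, k -> new ArrayList<>());
        }
        return graph;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "# FromNodeId\tToNodeId",
            "1\t2",
            "1\t3",
            "2\t3");
        System.out.println(parse(sample)); // prints {1=[2, 3], 2=[3], 3=[]}
    }
}
```

The sketch assumes well-formed lines with two integer fields; a malformed line raises an exception instead of being skipped.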

Usage reference

--help (-h): display the help text
--damping (-d): the damping factor [OPTIONAL] [DEFAULT = 0.85]
--count (-c): the number of iterations [OPTIONAL] [DEFAULT = 2]
--input (-i): the directory of the input graph [REQUIRED]
--output (-o): the directory of the output result [REQUIRED]
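
To illustrate what the damping factor and the iteration count control, here is a minimal single-machine sketch of the textbook PageRank update, not the MapReduce code in this repository. It assumes that every node appears as a key in the adjacency map, that ranks start at 1/N, and that the teleport term is (1 - d)/N; the repository may normalize differently. The class `PageRankSketch` and the method `rank` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration, not part of hadoop-pagerank.
public class PageRankSketch {

    /**
     * Runs `count` PageRank iterations with damping factor `damping`.
     * Assumes every node, including link targets, is a key of `graph`.
     */
    public static Map<Integer, Double> rank(Map<Integer, List<Integer>> graph,
                                            double damping, int count) {
        int n = graph.size();
        Map<Integer, Double> ranks = new HashMap<>();
        for (Integer node : graph.keySet()) {
            ranks.put(node, 1.0 / n); // uniform starting rank
        }
        for (int i = 0; i < count; i++) {
            Map<Integer, Double> next = new HashMap<>();
            for (Integer node : graph.keySet()) {
                next.put(node, (1.0 - damping) / n); // teleport term
            }
            for (Map.Entry<Integer, List<Integer>> entry : graph.entrySet()) {
                List<Integer> outLinks = entry.getValue();
                if (outLinks.isEmpty()) {
                    continue; // dangling node: its rank mass is dropped in this sketch
                }
                // Each out-link receives an equal share of the damped rank.
                double share = damping * ranks.get(entry.getKey()) / outLinks.size();
                for (Integer target : outLinks) {
                    next.merge(target, share, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, List.of(2));
        graph.put(2, List.of(1));
        graph.put(3, List.of(1));
        // Same values as the documented defaults: damping 0.85, 2 iterations.
        System.out.println(rank(graph, 0.85, 2));
    }
}
```

A higher damping factor gives more weight to the link structure and less to the uniform teleport term, and more iterations bring the ranks closer to their fixed point at the cost of extra passes over the graph.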