Of course, before everything else, you have to install Hadoop 0.20.2 properly.
Download Starfish:
wget http://www.cs.duke.edu/starfish/files/starfish-0.3.0.tar.gz
Unpack it and go to the bin directory:
tar -zxvf starfish-0.3.0.tar.gz
cd starfish-0.3.0/bin
Then edit config.sh according to https://github.com/Hearen/Starfish/blob/master/starfish/docs/profile.readme. Here I set the three values as follows:
SLAVES_BTRACE_DIR=/home/hadoop/starfish-0.3.0/starfish_test/btrace_dir
CLUSTER_NAME=starfish_test
PROFILER_OUTPUT_DIR=/home/hadoop/starfish-0.3.0/starfish_test/profile_output_dir
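I am not sure whether the Starfish scripts create these two directories by themselves, so to be safe I create them up front (the paths simply match the values above):
# assumption: config.sh/profile do not create these directories automatically
mkdir -p /home/hadoop/starfish-0.3.0/starfish_test/btrace_dir
mkdir -p /home/hadoop/starfish-0.3.0/starfish_test/profile_output_dir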
Install btrace on all the machines by running:
./install_btrace.sh slaves
slaves here refers to a file containing all the hosts of the cluster you plan to monitor. An example:
133.133.135.34
133.133.135.37
133.133.131.18
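Putting the two steps together, a minimal sketch that writes the slaves file (one host per line, using the example IPs above) and then pushes btrace to every node:
# write the slaves file, one host per line
cat > slaves <<'EOF'
133.133.135.34
133.133.135.37
133.133.131.18
EOF
# distribute btrace to all listed hosts
./install_btrace.sh slaves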
In Hadoop 0.20.2, all the benchmarks we can use come from two built-in jars: the examples jar and the test jar.

From hadoop-0.20.2-examples.jar:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that counts the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets.
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using the monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort.
terasort: Run the terasort.
teravalidate: Check the results of the terasort.
wordcount: A map/reduce program that counts the words in the input files.

From hadoop-0.20.2-test.jar:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures.
TestDFSIO: Distributed i/o benchmark.
dfsthroughput: Measure hdfs throughput.
filebench: Benchmark SequenceFile(Input|Output)Format (block, record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed).
loadgen: Generic map/reduce load generator.
mapredtest: A map/reduce test check.
mrbench: A map/reduce benchmark that can create many small jobs.
nnbench: A benchmark that stresses the namenode.
testarrayfile: A test for flat files of binary key/value pairs.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce.
testfilesystem: A test for FileSystem read/write.
testipc: A test for ipc.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testrpc: A test for rpc.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testsetfile: A test for flat files of binary key/value pairs.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill.
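As a side note, these two lists are simply what the jars print when invoked without a program name, so you can reproduce them on your own cluster (assuming the test jar sits next to the examples jar; adjust the paths to your installation):
# prints the list of example programs
hadoop jar /home/hadoop/hadoop/hadoop-0.20.2-examples.jar
# prints the list of test/benchmark programs
hadoop jar /home/hadoop/hadoop/hadoop-0.20.2-test.jar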
Now we can profile a job with the profile script, e.g. the pi example:
./profile hadoop jar /home/hadoop/hadoop/hadoop-0.20.2-examples.jar pi 16 10000
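After the job finishes, the collected profiles should land in the PROFILER_OUTPUT_DIR configured above, so a quick check is:
# job profiles should appear here once profiling succeeded
ls /home/hadoop/starfish-0.3.0/starfish_test/profile_output_dir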
To use the Starfish Visualizer and take advantage of its GUI, we have to install a desktop environment first. Here I install GNOME:
yum -y groups install "GNOME Desktop"
Start the new desktop environment with:
startx
To enable it permanently (and avoid running startx each time we want the desktop), execute:
systemctl set-default graphical.target
If HDFS is stuck in safe mode, leave it before running jobs:
hadoop dfsadmin -safemode leave
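You can also check the current state first instead of forcing it out blindly:
hadoop dfsadmin -safemode get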
PiEst:
hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 10 200
Download input text for wordcount:
wget http://www.gutenberg.org/files/4300/4300-0.txt
Wordcount:
hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar wordcount /user/hadoop-test/input /user/hadoop-test/wordcount/output
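Note that wordcount expects its input to already be in HDFS; the notes above do not show that step, so as a sketch (assuming /user/hadoop-test/input is the intended target, as in the wordcount command):
# stage the downloaded text into HDFS (target path taken from the wordcount command above)
hadoop fs -put 4300-0.txt /user/hadoop-test/input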
Teragen:
hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar teragen 10000 /user/hadoop-test/tera/input
Terasort:
hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar terasort /user/hadoop-test/tera/input /user/hadoop-test/tera/output
Teravalidate:
hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar teravalidate /user/hadoop-test/tera/output /user/hadoop-test/tera/validate
Grep:
hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar grep /user/hadoop-test/input /user/hadoop-test/grep/output 'ab?'
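To sanity-check any of these runs, list the output directory and peek at the part files; for wordcount, for example:
hadoop fs -ls /user/hadoop-test/wordcount/output
# print the first few counted words
hadoop fs -cat /user/hadoop-test/wordcount/output/part-* | head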
If the cluster needs a clean slate, stop it, reformat the namenode (note: this wipes everything stored in HDFS), and start it again:
stop-all.sh
hadoop namenode -format
start-all.sh
The steps and the examples above should be detailed enough to reproduce the whole process.