Hearen / HadoopInitializer

Using pure shell to configure hadoop 2.7.1 environment in CentOS 7.1 cluster
GNU General Public License v3.0

Utilize Starfish to optimize the Hadoop configuration #25

Closed Hearen closed 7 years ago

Hearen commented 7 years ago

The steps and the examples should be provided in detail.

Hearen commented 7 years ago

Of course, before everything else, you have to install Hadoop 0.20.2 properly first.
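
A quick sanity check that the installation works (a sketch; once the daemons are started, jps should list them):

    hadoop version    # should report 0.20.2
    jps               # should show NameNode, DataNode, JobTracker, TaskTracker on the relevant hosts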

  1. Download Starfish from: http://www.cs.duke.edu/starfish/files/starfish-0.3.0.tar.gz

  2. tar -zxvf starfish-0.3.0.tar.gz

  3. Go to starfish-0.3.0/bin and edit config.sh according to https://github.com/Hearen/Starfish/blob/master/starfish/docs/profile.readme; here I set the three values as follows:

    SLAVES_BTRACE_DIR=/home/hadoop/starfish-0.3.0/starfish_test/btrace_dir
    CLUSTER_NAME=starfish_test
    PROFILER_OUTPUT_DIR=/home/hadoop/starfish-0.3.0/starfish_test/profile_output_dir

  4. Install BTrace on all the machines by running ./install_btrace.sh slaves. Here slaves refers to a file containing all the hosts of the cluster you plan to monitor, one per line. An example:

    133.133.135.34
    133.133.135.37
    133.133.131.18

  5. In Hadoop 0.20.2, all the benchmarks we can use come from two built-in jars, hadoop-0.20.2-examples.jar and hadoop-0.20.2-test.jar (the test jar's TestDFSIO, nnbench, and mrbench are covered by the benchmarking reference linked in a later comment):

    • hadoop jar hadoop-0.20.2-examples.jar (running the jar without a program name prints the list of available examples):

      aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      dbcount: An example job that count the pageview counts from a database.
      grep: A map/reduce program that counts the matches of a regex in the input.
      join: A job that effects a join over sorted, equally partitioned datasets.
      multifilewc: A job that counts words from several files.
      pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
      pi: A map/reduce program that estimates Pi using monte-carlo method.
      randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
      randomwriter: A map/reduce program that writes 10GB of random data per node.
      secondarysort: An example defining a secondary sort to the reduce.
      sleep: A job that sleeps at each map and reduce task.
      sort: A map/reduce program that sorts the data written by the random writer.
      sudoku: A sudoku solver.
      teragen: Generate data for the terasort.
      terasort: Run the terasort.
      teravalidate: Checking results of terasort.
      wordcount: A map/reduce program that counts the words in the input files.

  6. Try profiling now: ./profile hadoop jar /home/hadoop/hadoop/hadoop-0.20.2-examples.jar pi 16 10000
  7. Once the job has been profiled, we can utilize optimize to tune it; for more details see: https://github.com/Hearen/Starfish/blob/master/starfish/docs/optimize.readme (a consolidated sketch of all the steps follows this list).
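
Putting the steps above together, here is a consolidated sketch as one shell session; the paths and the cluster name are the example values from this setup, so adjust them for your own environment.

    # Fetch and unpack Starfish (steps 1-2)
    cd /home/hadoop
    wget http://www.cs.duke.edu/starfish/files/starfish-0.3.0.tar.gz
    tar -zxvf starfish-0.3.0.tar.gz
    cd starfish-0.3.0/bin

    # Step 3: edit config.sh so it contains the three values shown above:
    #   SLAVES_BTRACE_DIR=/home/hadoop/starfish-0.3.0/starfish_test/btrace_dir
    #   CLUSTER_NAME=starfish_test
    #   PROFILER_OUTPUT_DIR=/home/hadoop/starfish-0.3.0/starfish_test/profile_output_dir

    # Step 4: install BTrace on every host listed in the slaves file
    ./install_btrace.sh slaves

    # Step 6: profile a sample job
    ./profile hadoop jar /home/hadoop/hadoop/hadoop-0.20.2-examples.jar pi 16 10000
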
Hearen commented 7 years ago

To utilize its Visualizer and take advantage of its GUI, we have to install a GUI desktop first. Here I install one with yum -y groups install "GNOME Desktop", start the new desktop environment with startx, and then, to enable it permanently, execute systemctl set-default graphical.target so that startx is no longer needed each time we want to use the desktop.
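
For reference, those commands in order:

    yum -y groups install "GNOME Desktop"      # install the GNOME desktop environment
    startx                                     # start the desktop for the current session
    systemctl set-default graphical.target    # boot into the graphical target by default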

Hearen commented 7 years ago

References for the Hadoop 0.20.2 installation:

  1. http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
  2. https://venuktan.wordpress.com/2012/11/14/setting-up-hadoop-0-20-2-single-node-cluster-on-ubuntu/
Hearen commented 7 years ago

To force the namenode out of safe mode:

    hadoop dfsadmin -safemode leave
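
A minimal sketch of the related dfsadmin subcommands, handy for checking the state before forcing it:

    hadoop dfsadmin -safemode get     # report whether safe mode is on
    hadoop dfsadmin -safemode wait    # block until the namenode leaves safe mode on its own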

Hearen commented 7 years ago

http://yut.hatenablog.com/entry/20120510/1336606109

Hearen commented 7 years ago

PiEst:

    hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 10 200

Wordcount:

    wget http://www.gutenberg.org/files/4300/4300-0.txt
    hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar wordcount /user/hadoop-test/input /user/hadoop-test/wordcount/output
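
Note that wordcount reads its input from HDFS, so the downloaded text has to be uploaded before the job runs; a minimal sketch, assuming the /user/hadoop-test/input path used above:

    hadoop fs -mkdir /user/hadoop-test/input
    hadoop fs -put 4300-0.txt /user/hadoop-test/input/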

Teragen:

    hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar teragen 10000 /user/hadoop-test/tera/input

Terasort:

    hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar terasort /user/hadoop-test/tera/input /user/hadoop-test/tera/output

Teravalidate:

    hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar teravalidate /user/hadoop-test/tera/output /user/hadoop-test/tera/validate

Grep:

    hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar grep /user/hadoop-test/input /user/hadoop-test/grep/output 'ab?'
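
To tie these benchmarks back to Starfish, each of them can be run under the profiler from starfish-0.3.0/bin; a minimal sketch for the TeraSort pair, assuming the Starfish setup from the earlier comment:

    ./profile hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar teragen 10000 /user/hadoop-test/tera/input
    ./profile hadoop jar /home/hadoop-test/hadoop/hadoop-0.20.2-examples.jar terasort /user/hadoop-test/tera/input /user/hadoop-test/tera/output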

Hearen commented 7 years ago

To reinitialize HDFS from scratch (note that hadoop namenode -format erases all HDFS data):

    stop-all.sh
    hadoop namenode -format
    start-all.sh