Open DonFlat opened 2 years ago
Is it possible to kill a node while a job is running? Is it possible to limit the memory for each node?
Find out how to run a pi calculation on Spark and Hadoop.
Benchmark for pi: how long does it take to reach a given number of correct decimal digits?
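A local baseline for the "time to reach a digit" question can be sketched in plain Python before involving the cluster. This is only a sketch under assumptions: it uses a quasi-Monte Carlo estimator over a Halton sequence (the same general idea as Hadoop's bundled pi example, not a copy of it), and the function names, the doubling schedule and the `max_samples` cap are all made up for illustration.

```python
import math
import time


def halton(index, base):
    """Radical inverse of `index` in `base` (one coordinate of a Halton point)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result


def estimate_pi(num_samples):
    """Estimate pi from the share of Halton points inside a circle of radius 0.5."""
    inside = 0
    for i in range(1, num_samples + 1):
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    return 4.0 * inside / num_samples


def correct_decimals(estimate):
    """Largest d such that |estimate - pi| < 0.5 * 10**-d (capped at 15)."""
    error = abs(estimate - math.pi)
    if error == 0.0:
        return 15
    return max(0, min(15, math.floor(-math.log10(2.0 * error))))


def time_to_decimals(target, start=1000, max_samples=200_000):
    """Double the sample count until `target` decimals are reached.

    Returns (samples, seconds, estimate), or None if max_samples is exceeded.
    The cap is an arbitrary choice to keep a laptop run short.
    """
    samples = start
    began = time.perf_counter()
    while samples <= max_samples:
        estimate = estimate_pi(samples)
        if correct_decimals(estimate) >= target:
            return samples, time.perf_counter() - began, estimate
        samples *= 2
    return None


if __name__ == "__main__":
    print(time_to_decimals(2))
```

The same `correct_decimals` check could be applied to the value printed by a Hadoop or Spark pi job, so that the cluster runs and the local run are compared on the same accuracy target.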
Examples of running the Hadoop example applications: https://blog.csdn.net/carefree2005/article/details/121834803
Hadoop cluster setup: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html https://www.youtube.com/watch?v=_iP2Em-5Abw https://www.linode.com/docs/guides/how-to-install-and-set-up-hadoop-cluster/
Hadoop MapReduce example: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Overview
Hadoop configuration file reference: https://hadoop.apache.org/docs/r3.3.4/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
It seems that we only have word count as an example for both Spark and Hadoop?
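Besides word count, both distributions bundle a pi example: Hadoop ships it in its MapReduce examples jar (`pi <nMaps> <nSamples>`) and Spark ships `org.apache.spark.examples.SparkPi`. The sketch below only builds the two command lines so they can be inspected before running. Assumptions: the Hadoop home is the one used elsewhere in these notes, while the Spark home, the Spark examples jar name and the master URL are placeholders that have to be replaced with the real values on the cluster.

```python
import shlex

# Path used elsewhere in these notes for the Hadoop 3.3.4 installation.
HADOOP_HOME = "/var/scratch/ddps2206/hadoop-3.3.4"


def hadoop_pi_command(num_maps, samples_per_map, hadoop_home=HADOOP_HOME, version="3.3.4"):
    """Command line for the pi example in the bundled MapReduce examples jar."""
    jar = f"{hadoop_home}/share/hadoop/mapreduce/hadoop-mapreduce-examples-{version}.jar"
    return [f"{hadoop_home}/bin/hadoop", "jar", jar, "pi",
            str(num_maps), str(samples_per_map)]


def spark_pi_command(spark_home, examples_jar, master_url, partitions):
    """Command line for SparkPi; all four arguments are site-specific."""
    return [f"{spark_home}/bin/spark-submit",
            "--class", "org.apache.spark.examples.SparkPi",
            "--master", master_url,
            examples_jar, str(partitions)]


if __name__ == "__main__":
    print(shlex.join(hadoop_pi_command(16, 1000)))
    # Hypothetical values: adjust the Spark home, jar name and master URL.
    print(shlex.join(spark_pi_command(
        "/path/to/spark",
        "/path/to/spark/examples/jars/spark-examples.jar",
        "spark://node102:7077",
        100,
    )))
```

To actually submit, the returned list can be passed to `subprocess.run(cmd, check=True)` on the master node.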
Spark cluster mode: https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types
Two files need to be updated with the latest node names:
modify worker
Default master node: node102, worker: node103
hdfs-site.xml contains the replication factor (dfs.replication)
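To check which replication factor a given hdfs-site.xml sets, the property can be read with the standard library. A minimal sketch: `read_property` is a helper name made up here, and the sample XML with the value 2 is illustrative, not the content of the actual file on the cluster.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the real file lives under etc/hadoop/hdfs-site.xml.
SAMPLE_HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
"""


def read_property(root, name):
    """Return the <value> of the Hadoop <property> called `name`, or None."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_HDFS_SITE)
    print(read_property(root, "dfs.replication"))
```

For the real file, `ET.parse(path).getroot()` gives the same kind of root element. If the property is absent the helper returns None, in which case Hadoop falls back to its built-in default.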
In HiBench, the following workloads have a Hadoop version:
ml/Kmeans, ml/bayes, websearch/pagerank, sql/aggregation, sql/join, sql/scan
micro/dfsioe, micro/sleep, micro/sort, micro/terasort, micro/wordcount
The Hadoop submission command (here generating the Kmeans input data):
/var/scratch/ddps2206/hadoop-3.3.4/bin/hadoop --config /var/scratch/ddps2206/hadoop-3.3.4/etc/hadoop jar /var/scratch/ddps2206/HiBench/autogen/target/autogen-8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.mahout.clustering.kmeans.GenKMeansDataset -D hadoop.job.history.user.location=hdfs://node108:9000/HiBench/Kmeans/Input/samples -sampleDir hdfs://node108:9000/HiBench/Kmeans/Input/samples -clusterDir hdfs://node108:9000/HiBench/Kmeans/Input/cluster -numClusters 5 -numSamples 30000 -samplesPerFile 6000 -sampleDimension 3
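Since the NameNode host changes with each node reservation, the command above can be rebuilt from a few parameters instead of being edited by hand. A sketch only: the function name and its defaults are made up here, with the defaults chosen so that the output reproduces the command above exactly.

```python
BASE = "/var/scratch/ddps2206"


def kmeans_datagen_command(namenode="node108:9000", num_clusters=5,
                           num_samples=30000, samples_per_file=6000,
                           dimension=3, base=BASE):
    """Argument list for HiBench's GenKMeansDataset job, as used in these notes."""
    hadoop_home = f"{base}/hadoop-3.3.4"
    input_dir = f"hdfs://{namenode}/HiBench/Kmeans/Input"
    return [
        f"{hadoop_home}/bin/hadoop", "--config", f"{hadoop_home}/etc/hadoop",
        "jar",
        f"{base}/HiBench/autogen/target/autogen-8.0-SNAPSHOT-jar-with-dependencies.jar",
        "org.apache.mahout.clustering.kmeans.GenKMeansDataset",
        "-D", f"hadoop.job.history.user.location={input_dir}/samples",
        "-sampleDir", f"{input_dir}/samples",
        "-clusterDir", f"{input_dir}/cluster",
        "-numClusters", str(num_clusters),
        "-numSamples", str(num_samples),
        "-samplesPerFile", str(samples_per_file),
        "-sampleDimension", str(dimension),
    ]


if __name__ == "__main__":
    # Example: same job, but against a NameNode on node102.
    print(" ".join(kmeans_datagen_command(namenode="node102:9000")))
```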
/var/scratch/ddps2206/HiBench/conf/hibench.conf to adjust the size of the input:
tiny/ 1.4M
small/ 636M
large/ 4.0G
huge/ 18.5G
/var/scratch/ddps2206/HiBench/conf/spark.conf to adjust memory and cores; the default is 4G for both
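Switching the input size between runs can be scripted rather than edited by hand. The sketch below rewrites one key in the text of a HiBench-style conf file, where each line is a key followed by whitespace and a value. Assumptions: the helper name is made up, and the key names `hibench.scale.profile` and `spark.executor.memory` are what I expect to find in hibench.conf and spark.conf, so they should be checked against the actual files first.

```python
def set_conf_value(conf_text, key, value):
    """Return conf_text with `key` set to `value`.

    Comment lines (starting with '#') are left untouched. If the key does
    not occur, a new "key value" line is appended at the end.
    """
    lines, replaced = [], False
    for line in conf_text.splitlines():
        fields = line.split()
        if fields and not line.lstrip().startswith("#") and fields[0] == key:
            lines.append(f"{key} {value}")
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative content, not the real hibench.conf.
    sample = "# data scale profile\nhibench.scale.profile    tiny\n"
    print(set_conf_value(sample, "hibench.scale.profile", "large"), end="")
```

Working on the text rather than the file keeps the helper easy to test; reading and writing the file around it is a two-line addition with `pathlib.Path.read_text` and `write_text`.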
hadoop fs -get /HiBench/Kmeans/Input/samples/*