Intel-bigdata / HiBench

HiBench is a big data benchmark suite.

Number of mappers is always 10 per host (the default) for wordcount #69

Open brahmareddybattula opened 9 years ago

brahmareddybattula commented 9 years ago

Let me introduce my scenario first.

I want to run wordcount on 350 GB with 1400 mappers, so I configured NUM_MAPS=1400 and set DataSize to 350 GB (in bytes), with a 256 MB block size.

However, the prepare job runs with only 70 maps, because I have 7 nodes in the cluster.

This happens because the randomtextwriter job uses 10 maps per host by default:

    int numMapsPerHost = conf.getInt("mapreduce.randomtextwriter.mapsperhost", 10);
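Here is a rough model of how the map count comes out. It follows our reading of `RandomTextWriter.run()` in the Hadoop 2.x examples:

- `totalbytes` defaults to `mapsperhost × bytespermap × hosts`, with `bytespermap` defaulting to 1 GiB.
- The job then runs `totalbytes / bytespermap` maps.

The class, the method and the `hosts` parameter are hypothetical. `hosts` stands in for the node count that Hadoop queries from the cluster. Check the logic against the Hadoop version you actually run.

```java
// Hypothetical model of how RandomTextWriter (Hadoop 2.x examples) sizes its job.
// This is not the Hadoop class: the cluster's node count is passed in as `hosts`.
final class RandomTextWriterMaps {
    static final long DEFAULT_BYTES_PER_MAP = 1L * 1024 * 1024 * 1024; // 1 GiB
    static final int DEFAULT_MAPS_PER_HOST = 10;

    /**
     * @param hosts       number of worker nodes in the cluster
     * @param mapsPerHost mapreduce.randomtextwriter.mapsperhost, or null if unset
     * @param bytesPerMap mapreduce.randomtextwriter.bytespermap, or null if unset
     * @param totalBytes  mapreduce.randomtextwriter.totalbytes, or null if unset
     * @return the number of map tasks the prepare job would run
     */
    static int numMaps(int hosts, Integer mapsPerHost, Long bytesPerMap, Long totalBytes) {
        int perHost = (mapsPerHost != null) ? mapsPerHost : DEFAULT_MAPS_PER_HOST;
        long perMap = (bytesPerMap != null) ? bytesPerMap : DEFAULT_BYTES_PER_MAP;
        if (perMap == 0) {
            // Hadoop prints an error and exits for 0; this model throws instead.
            throw new IllegalArgumentException("bytespermap must not be 0");
        }
        // Without totalbytes, the data size is derived from maps-per-host and host count.
        long total = (totalBytes != null) ? totalBytes : perHost * perMap * hosts;
        int maps = (int) (total / perMap);
        // A non-empty job always gets at least one map.
        if (maps == 0 && total > 0) {
            maps = 1;
        }
        return maps;
    }

    public static void main(String[] args) {
        long mib256 = 256L * 1024 * 1024;
        System.out.println(numMaps(7, null, null, null));             // 70: 10 maps/host x 7 hosts
        System.out.println(numMaps(7, 200, mib256, null));            // 1400: 200 maps/host x 7 hosts
        System.out.println(numMaps(7, null, mib256, 1400L * mib256)); // 1400: totalbytes / bytespermap
    }
}
```

Under this reading, NUM_MAPS does not affect the prepare job's map count unless it is turned into `bytespermap`, `mapsperhost` or `totalbytes`.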

For now, I worked around it as follows and went ahead:

    $HADOOP_EXECUTABLE jar $HADOOP_EXAMPLES_JAR randomtextwriter \
        $COMPRESS_OPT \
        -D mapreduce.randomtextwriter.bytespermap=268435456 \
        -D mapreduce.randomtextwriter.mapsperhost=200 \
        $INPUT_HDFS
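The two values in this workaround can be derived from the inputs in this issue: DataSize = 350 GiB, NUM_MAPS = 1400 and 7 hosts. The formulas are:

- `bytespermap = DataSize / NUM_MAPS` (here 268435456, i.e. 256 MiB)
- `mapsperhost = ceil(NUM_MAPS / hosts)` (here 200)

The sketch below is a hypothetical helper. Its class and method names are illustrative and not part of HiBench or Hadoop.

```java
// Hypothetical helper: derive randomtextwriter settings from a target data size and map count.
final class PrepareSizing {
    /** Bytes each map writes so that dataSize is split into numMaps pieces. */
    static long bytesPerMap(long dataSize, int numMaps) {
        return dataSize / numMaps;
    }

    /** Maps per host, rounded up so that hosts * mapsPerHost >= numMaps. */
    static int mapsPerHost(int numMaps, int hosts) {
        return (numMaps + hosts - 1) / hosts;
    }

    public static void main(String[] args) {
        long dataSize = 350L * 1024 * 1024 * 1024; // 350 GiB
        int numMaps = 1400;
        int hosts = 7;
        System.out.println("-D mapreduce.randomtextwriter.bytespermap=" + bytesPerMap(dataSize, numMaps));
        System.out.println("-D mapreduce.randomtextwriter.mapsperhost=" + mapsPerHost(numMaps, hosts));
    }
}
```

When NUM_MAPS is not a multiple of the host count, rounding `mapsperhost` up gives slightly more maps than requested, and therefore slightly more data.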

Can we fix this?

qiansl127 commented 9 years ago

The variable NUM_MAPS is configured for workload running, not preparing.

brahmareddybattula commented 9 years ago

Thanks for the reply. Then how should I configure HiBench to run my scenario?

qiansl127 commented 9 years ago

Sorry, having looked at the code, my last comment seems to be wrong.

lvsoft commented 9 years ago

@brahmareddybattula Can you paste the printed message, like this one:

Submit MapReduce Job: /home/lv/intel/cluster/hadoop/hadoop-2.5.0-cdh5.3.2/bin/hadoop --config /home/lv/intel/cluster/hadoop/hadoop-2.5.0-cdh5.3.2/etc/hadoop jar /home/lv/intel/cluster/hadoop/hadoop-2.5.0-cdh5.3.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=32000000000 -D mapreduce.job.maps=12 -D mapreduce.job.reduces=6 -D mapreduce.output.fileoutputformat.compress=false hdfs://lv-dev:54310/HiBench/Wordcount/Input

Or check report/wordcount/prepare/conf/wordcount.conf for NUM_MAPS, and report/wordcount/prepare/conf/sparkbench/sparkbench.conf for hibench.default.map.parallelism.

The number of mappers should follow the configuration you defined. report/wordcount/prepare/conf/wordcount.conf shows the value of NUM_MAPS and which file defined it. Take mine as an example:

# Source: /home/lv/intel/HiBench/conf/99-user_defined_properties.conf
HADOOP_HOME=/home/lv/intel/cluster/hadoop/hadoop-2.5.0-cdh5.3.2
HDFS_MASTER=hdfs://lv-dev:54310
NUM_MAPS=12
NUM_REDS=6
SPARK_HOME=/home/lv/intel/cluster/spark/spark-1.3.0-bin-hadoop2.4
SPARK_MASTER=yarn-client
YARN_EXECUTOR_CORES=4
YARN_NUM_EXECUTORS=4
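To find NUM_MAPS and the file that set it without reading the whole generated conf, you could use a small hypothetical Java helper. It assumes the layout shown above: `KEY=VALUE` lines grouped under `# Source:` comments. It remembers the most recent `# Source:` line for each key it sees. If a key appears more than once, it keeps the last occurrence. That choice assumes later files override earlier ones, which is not confirmed in this thread. The class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: report a key's value and the "# Source:" file it appears under.
final class ConfSource {
    static Optional<String> lookup(List<String> lines, String key) {
        String source = "unknown";
        String found = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("# Source:")) {
                // Remember which file the following KEY=VALUE lines came from.
                source = line.substring("# Source:".length()).trim();
            } else if (line.startsWith(key + "=")) {
                // Keep the last occurrence (assumed to be the effective one).
                found = line.substring(key.length() + 1) + " (from " + source + ")";
            }
        }
        return Optional.ofNullable(found);
    }

    public static void main(String[] args) throws IOException {
        Path conf = Paths.get(args.length > 0 ? args[0] : "report/wordcount/prepare/conf/wordcount.conf");
        if (!Files.exists(conf)) {
            System.out.println("not found: " + conf);
            return;
        }
        System.out.println(lookup(Files.readAllLines(conf), "NUM_MAPS").orElse("NUM_MAPS not set"));
    }
}
```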