Intel-bigdata / HiBench

HiBench is a big data benchmark suite.

HiveBench Data Loaded into HDFS #87

Open KofDossou opened 9 years ago

KofDossou commented 9 years ago

Hello, I have some questions about HiveBench that I need clarified. Your help is greatly appreciated.

1. What specific data is loaded into HDFS in HiveBench after you run ./prepare.sh? Is it the 600,000 HTML files described in the SIGMOD 09 paper? What is the size of these files (how many gigabytes or kilobytes per HTML file, or the average size across the 600,000 HTML files)?

2. When you execute run-aggregation.sh, does the "Time taken" value depict the aggregated throughput? If not, please help me understand what it measures. The output was:

    rm: Cannot remove directory "hdfs://master:8020/user/hive/warehouse/uservisits_aggre", use -rmr instead
    WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
    Logging initialized using configuration in jar:file:/home/ubuntu/hive/lib/hive-common-0.9.0.jar!/hive-log4j.properties
    Hive history file=/tmp/ubuntu/hive_job_log_ubuntu_201502200212_1471946535.txt
    OK
    Time taken: 4.318 seconds
    OK
    Time taken: 0.519 seconds
    OK
    Time taken: 0.023 seconds
    OK
    Time taken: 0.731 seconds
    OK
    Time taken: 0.033 seconds

3. Can you explain the number that follows the timestamp (103, 135, ...)? What is its significance?

    2015-02-20 02:12:45,103 Stage-1 map = 0%, reduce = 0%
    2015-02-20 02:12:48,135 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
    2015-02-20 02:12:49,140 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
    2015-02-20 02:12:50,149 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec

I appreciate your help. Thank you.

adrian-wang commented 9 years ago

@AllanY