Hello,
I have some questions regarding hivebench that I need clarity on. Your help is greatly appreciated.
1. What specific data is loaded into HDFS in hivebench after you run ./prepare.sh? Is it the 600,000 HTML files described in the SIGMOD 09 paper? What is the size of that data (how many gigabytes or kilobytes per HTML file, or the average size of these 600,000 HTML files)?
2. When you run run-aggregation.sh, does the "Time taken" value depict the aggregated throughput? If not, please help me understand what it measures. This is the output I get:
rm: Cannot remove directory "hdfs://master:8020/user/hive/warehouse/uservisits_aggre", use -rmr instead
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/ubuntu/hive/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/ubuntu/hive_job_log_ubuntu_201502200212_1471946535.txt
OK
Time taken: 4.318 seconds
OK
Time taken: 0.519 seconds
OK
Time taken: 0.023 seconds
OK
Time taken: 0.731 seconds
OK
Time taken: 0.033 seconds
3. Can you explain the number after the timestamp (103, 135, ...)? What is its significance?
2015-02-20 02:12:45,103 Stage-1 map = 0%, reduce = 0%
2015-02-20 02:12:48,135 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
2015-02-20 02:12:49,140 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
2015-02-20 02:12:50,149 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
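To frame question 3, here is a minimal sketch of how I would read these lines if the digits after the comma are milliseconds, as in the usual log4j-style timestamp. That is an assumption on my part and not something I have confirmed for Hive. The function name `parse_progress` and the regular expression are my own and are not part of hivebench.

```python
import re
from datetime import datetime

# Assumed layout: "<date> <time>,<millis> Stage-<n> map = <p>%, reduce = <q>%"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\s+"
    r"Stage-(\d+)\s+map = (\d+)%,\s+reduce = (\d+)%"
)

def parse_progress(line):
    """Return (timestamp, stage, map_pct, reduce_pct), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, millis, stage, map_pct, reduce_pct = m.groups()
    # Treat the three digits after the comma as milliseconds (assumption).
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(
        microsecond=int(millis) * 1000
    )
    return ts, int(stage), int(map_pct), int(reduce_pct)

lines = [
    "2015-02-20 02:12:45,103 Stage-1 map = 0%, reduce = 0%",
    "2015-02-20 02:12:48,135 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec",
    "2015-02-20 02:12:49,140 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec",
    "2015-02-20 02:12:50,149 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec",
]

first = parse_progress(lines[0])
last = parse_progress(lines[-1])
# Wall-clock seconds between the first and last progress line.
elapsed = (last[0] - first[0]).total_seconds()
print(f"elapsed between first and last progress line: {elapsed:.3f} s")
```

If that reading is right, 103 and 135 would just be the sub-second part of each timestamp rather than a separate counter.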
Appreciate your help.
Thank you