Intel-bigdata / HiBench

HiBench is a big data benchmark suite.

Even though the bayes workload failed to complete, it is still logged as success (duration etc.) in hibench.report #439

Open · nareshgundla opened this issue 7 years ago

nareshgundla commented 7 years ago

hibench.report:

HadoopBayes 2017-03-21 10:42:20 375779530 482.549 778738 259579
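(For reference, assuming the standard hibench.report columns of workload, date, time, input bytes, duration in seconds, throughput in bytes/s, and throughput per node: these numbers are internally consistent, since 375779530 / 482.549 ≈ 778738 bytes/s and 778738 / 259579 ≈ 3 nodes. The point of this issue is that the entry was written even though the run ended in an OutOfMemoryError.)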

bayes bench.log:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /ngundla/hadoop-2.7.3/bin/hadoop and HADOOP_CONF_DIR=/ngundla/hadoop-2.7.3/etc/hadoop
MAHOUT-JOB: /ngundla/HiBench/hadoopbench/mahout/target/apache-mahout-distribution-0.11.0/mahout-examples-0.11.0-job.jar
17/03/21 10:41:22 WARN MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
17/03/21 10:41:22 INFO AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --input=[hdfs://c4n52:8020/HiBench/Bayes/Output/vectors/tfidf-vectors], --labelIndex=[hdfs://c4n52:8020/HiBench/Bayes/Output/labelindex], --output=[hdfs://c4n52:8020/HiBench/Bayes/Output/model], --overwrite=null, --startPhase=[0], --tempDir=[hdfs://c4n52:8020/HiBench/Bayes/Output/temp]}
17/03/21 10:41:25 INFO deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
17/03/21 10:41:25 INFO deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
17/03/21 10:41:25 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
17/03/21 10:41:25 INFO RMProxy: Connecting to ResourceManager at c4n52/192.168.1.112:8032
17/03/21 10:41:26 INFO FileInputFormat: Total input paths to process : 8
17/03/21 10:41:26 INFO JobSubmitter: number of splits:8
17/03/21 10:41:26 INFO JobSubmitter: Submitting tokens for job: job_1490040420327_0132
17/03/21 10:41:26 INFO YarnClientImpl: Submitted application application_1490040420327_0132
17/03/21 10:41:26 INFO Job: The url to track the job: http://c4n52:8088/proxy/application_1490040420327_0132/
17/03/21 10:41:26 INFO Job: Running job: job_1490040420327_0132
17/03/21 10:41:32 INFO Job: Job job_1490040420327_0132 running in uber mode : false
17/03/21 10:41:32 INFO Job: map 0% reduce 0%
17/03/21 10:41:39 INFO Job: map 50% reduce 0%
17/03/21 10:41:40 INFO Job: map 88% reduce 0%
17/03/21 10:41:41 INFO Job: map 100% reduce 0%
17/03/21 10:41:49 INFO Job: map 100% reduce 71%
17/03/21 10:41:52 INFO Job: map 100% reduce 100%
17/03/21 10:41:53 INFO Job: Job job_1490040420327_0132 completed successfully
17/03/21 10:41:53 INFO Job: Counters: 50
  File System Counters
    FILE: Number of bytes read=58192899
    FILE: Number of bytes written=117453020
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=134502042
    HDFS: Number of bytes written=134499965
    HDFS: Number of read operations=35
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Killed map tasks=1
    Launched map tasks=8
    Launched reduce tasks=1
    Data-local map tasks=8
    Total time spent by all maps in occupied slots (ms)=140384
    Total time spent by all reduces in occupied slots (ms)=44712
    Total time spent by all map tasks (ms)=35096
    Total time spent by all reduce tasks (ms)=11178
    Total vcore-milliseconds taken by all map tasks=35096
    Total vcore-milliseconds taken by all reduce tasks=11178
    Total megabyte-milliseconds taken by all map tasks=143753216
    Total megabyte-milliseconds taken by all reduce tasks=45785088
  Map-Reduce Framework
    Map input records=100
    Map output records=100
    Map output bytes=134497088
    Map output materialized bytes=58185965
    Input split bytes=1104
    Combine input records=100
    Combine output records=100
    Reduce input groups=100
    Reduce shuffle bytes=58185965
    Reduce input records=100
    Reduce output records=100
    Spilled Records=200
    Shuffled Maps =8
    Failed Shuffles=0
    Merged Map outputs=8
    GC time elapsed (ms)=1786
    CPU time spent (ms)=40680
    Physical memory (bytes) snapshot=4622409728
    Virtual memory (bytes) snapshot=49508855808
    Total committed heap usage (bytes)=30483152896
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=134500938
  File Output Format Counters
    Bytes Written=134499965
17/03/21 10:41:53 INFO RMProxy: Connecting to ResourceManager at c4n52/192.168.1.112:8032
17/03/21 10:41:53 INFO FileInputFormat: Total input paths to process : 1
17/03/21 10:41:53 INFO JobSubmitter: number of splits:1
17/03/21 10:41:53 INFO JobSubmitter: Submitting tokens for job: job_1490040420327_0133
17/03/21 10:41:53 INFO YarnClientImpl: Submitted application application_1490040420327_0133
17/03/21 10:41:53 INFO Job: The url to track the job: http://c4n52:8088/proxy/application_1490040420327_0133/
17/03/21 10:41:53 INFO Job: Running job: job_1490040420327_0133
17/03/21 10:41:58 INFO Job: Job job_1490040420327_0133 running in uber mode : false
17/03/21 10:41:58 INFO Job: map 0% reduce 0%
17/03/21 10:42:09 INFO Job: map 100% reduce 0%
17/03/21 10:42:16 INFO Job: map 100% reduce 100%
17/03/21 10:42:16 INFO Job: Job job_1490040420327_0133 completed successfully
17/03/21 10:42:17 INFO Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=6897000
    FILE: Number of bytes written=14036671
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=134500105
    HDFS: Number of bytes written=16453801
    HDFS: Number of read operations=7
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=34596
    Total time spent by all reduces in occupied slots (ms)=18576
    Total time spent by all map tasks (ms)=8649
    Total time spent by all reduce tasks (ms)=4644
    Total vcore-milliseconds taken by all map tasks=8649
    Total vcore-milliseconds taken by all reduce tasks=4644
    Total megabyte-milliseconds taken by all map tasks=35426304
    Total megabyte-milliseconds taken by all reduce tasks=19021824
  Map-Reduce Framework
    Map input records=100
    Map output records=2
    Map output bytes=16453675
    Map output materialized bytes=6896992
    Input split bytes=140
    Combine input records=2
    Combine output records=2
    Reduce input groups=2
    Reduce shuffle bytes=6896992
    Reduce input records=2
    Reduce output records=2
    Spilled Records=4
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=468
    CPU time spent (ms)=12770
    Physical memory (bytes) snapshot=1161129984
    Virtual memory (bytes) snapshot=11007291392
    Total committed heap usage (bytes)=6774849536
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=134499965
  File Output Format Counters
    Bytes Written=16453801
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:489)
  at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:442)
  at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:128)
  at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:125)
  at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:89)
  at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2256)
  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2384)
  at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:101)
  at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:40)
  at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
  at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
  at com.google.common.collect.Iterators$5.hasNext(Iterators.java:543)
  at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:43)
  at org.apache.mahout.classifier.naivebayes.BayesUtils.readModelFromDir(BayesUtils.java:78)
  at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:160)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.main(TrainNaiveBayesJob.java:59)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
  at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

VinceShieh commented 7 years ago

We tested on our end and did not see this issue. Please check your environment settings; this looks like a memory configuration problem, since the log reports an OutOfMemoryError.

nareshgundla commented 7 years ago

Yes, this was a memory issue; it was fixed after increasing the HADOOP_CLIENT_OPTS heap size.
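For anyone hitting the same error: the stack trace above shows the OOM occurring in the Mahout driver (BayesUtils.readModelFromDir), which runs in the client JVM rather than in a YARN container, so HADOOP_CLIENT_OPTS is the setting that governs it. A minimal sketch of the change, assuming a 4 GB client heap is enough for your data scale:

```bash
# Raise the heap of the client-side JVM that runs the Mahout driver.
# Add this to $HADOOP_CONF_DIR/hadoop-env.sh, or export it in the shell
# before launching the workload. The -Xmx value is only an example;
# size it to the model your data scale produces.
export HADOOP_CLIENT_OPTS="-Xmx4g ${HADOOP_CLIENT_OPTS}"
```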

If a workload fails with an error, shouldn't HiBench skip writing that workload's metrics to hibench.report?
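For illustration, the behavior being asked for would look roughly like the sketch below: check the workload's exit status and only append a line to hibench.report on success. The script path and report fields here are hypothetical, not HiBench's actual code.

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- not HiBench's actual reporting logic.
# Append metrics to hibench.report only when the workload exits cleanly.
WORKLOAD="workloads/bayes/mapreduce/bin/run.sh"   # hypothetical path
START=$(date +%s)
if "$WORKLOAD"; then
    END=$(date +%s)
    printf 'HadoopBayes %s %ds\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$((END - START))" >> hibench.report
else
    echo "HadoopBayes failed (exit $?); not writing to hibench.report" >&2
    exit 1
fi
```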

I have one more question: how many map-reduce jobs are launched by this workload? Is it always 2 jobs, or does it depend on the data set size?

Were you able to successfully complete Hadoop Bayes at the bigdata scale factor? How long does the Hadoop Bayes workload take to complete at that scale?