Intel-bigdata / HiBench

HiBench is a big data benchmark suite.

HiBench hivebench external table field delimiter ',' problem #3

Open jerryshao opened 12 years ago

jerryshao commented 12 years ago

The hivebench external table uses a comma (',') as its field delimiter, but the raw data for the 'useragent' field itself contains commas, e.g. "Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/xxx". Hive therefore splits the row in the wrong places: the useragent value spills over into the following field, and all of the remaining fields end up with wrong data. For example:

48.230.80.233 wyfctppjxtyhbcbngouswjzsekwdzqiaaapmomt 1976-04-18 0.44219965 Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML like Gecko) Chrome/xxx LBY LBY-AR NULL
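To make the parsing problem concrete, here is a minimal sketch of the kind of comma-delimited external table involved. The column names are taken from the row dumps later in this thread; the table name, column types, and location path are assumptions for illustration, not copied from the HiBench sources.

CREATE EXTERNAL TABLE uservisits (       -- hypothetical table name
  sourceip     STRING,
  desturl      STRING,
  visitdate    STRING,
  adrevenue    DOUBLE,
  useragent    STRING,                   -- raw values can contain ','
  countrycode  STRING,
  languagecode STRING,
  searchword   STRING,
  duration     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','   -- same character that appears inside useragent
STORED AS TEXTFILE
LOCATION '/path/to/uservisits';                 -- hypothetical path

With a plain delimited text layout like this, every comma inside useragent is treated as a field boundary, so the values shift into the wrong columns. Generating the data with a delimiter that cannot occur in the fields (or escaping/quoting the useragent value) would avoid the mis-split.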

jintaoguan commented 10 years ago

I am using CDH4.4. I encountered a problem in the Hive join test.

This is the log.

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"sourceip":"182.163.112.4","desturl":"nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi","visitdate":"1978-10-17","adrevenue":0.332717,"useragent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)","countrycode":"BGR","languagecode":"BGR-BG","searchword":"rtzspywqgfplrlt","duration":10}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

This happens in the first MapReduce job of the Hive join. The input data was generated by HiBench. I don't know if it is the same problem as the one @jerryshao described.

Could someone tell me what to do about this error?

jintaoguan commented 10 years ago

The full error trace is below; it says "Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable".

2014-09-12 11:43:45,550 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"sourceip":"182.163.112.4","desturl":"nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi","visitdate":"1978-10-17","adrevenue":0.332717,"useragent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)","countrycode":"BGR","languagecode":"BGR-BG","searchword":"rtzspywqgfplrlt","duration":10}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.get(LazyIntObjectInspector.java:38)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:317)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:255)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:202)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:236)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
    ... 9 more

jintaoguan commented 10 years ago

Sorry, I have found the cause of this error. CDH 4.4 ships hive-0.10.0, so there is a version mismatch: the error shows up when I use the Hive bundled with HiBench, and it is gone after switching to the Hive that comes with CDH 4.4.