Closed xiaoyi78 closed 9 years ago
I edited your post to add minor formatting; please do so yourself in the future.
@xiaoyi78 It looks like there's a problem with your Hadoop install.
The NPE stack trace is triggered by the Hadoop classes, in particular ReflectionUtils.setJobConf, which tries to do some reflection:
//If JobConf and JobConfigurable are in classpath, AND
//theObject is of type JobConfigurable AND
//conf is of type JobConf then
//invoke configure on theObject
This happens way before the connector kicks in and suggests that you might have an incomplete Hadoop/Hive configuration or classpath, or that you are using a mixture of Hadoop 1 and 2. Try using a different Hadoop install just for testing, or upgrade your CDH install to the latest stable branch (even on 4.5.x).
Thanks for your quick response and for pointing out that this is not related to the connector. We will investigate our configuration further. Cheers.
Sorry Costin, the error message I provided in my first post is from the 'syslog' tab, but when I looked at the 'stderr' tab, it has an error message related to Elasticsearch. Please see below. Sorry, I need to learn how to format the code block below. Please bear with me one more time. It seems I have a classpath issue, but I still don't understand what is wrong with my classpath: I can create an external table pointing to ES and read data from it, and that would fail if my classpath were not set up properly. I'd appreciate it if you could shed some light...
Sorry, I just reformatted the log block from my previous post:
Continuing ...
java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveOutputFormat
Continuing ...
java.lang.NullPointerException
Continuing ...
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:315)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:436)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:392)
at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.LimitOperator.initializeOp(LimitOperator.java:41)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:436)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:392)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
@xiaoyi78 the es-hadoop jar is not found in the classpath, hence the CNFE (ClassNotFoundException). Hive is notorious for making the classpath hard to set up; it's best to double-check their documentation (here is one link).
One reliable configuration for the latest versions of Hive is setting HIVE_AUX_JARS_PATH, as apparently setting the hive.xml is ignored by HiveServer2.
Note that this is not an es-hadoop bug: its classes are simply not found and thus cannot be loaded by Hive.
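For anyone hitting the same CNFE, one way to see whether a given lib directory actually carries the missing class is to look inside its jars. The following is only a minimal sketch, not part of es-hadoop: the helper name `jars_containing` is made up for illustration, and the directories to scan are whatever your own install uses.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, directories):
    """Return the jars directly under `directories` that contain `class_name`.

    Hypothetical helper for illustration; it only inspects jar contents and
    says nothing about which classpath Hive or Hadoop actually uses at runtime.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for directory in directories:
        for jar in sorted(Path(directory).glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        matches.append(jar)
            except (zipfile.BadZipFile, OSError):
                # Skip corrupt or unreadable archives instead of aborting the scan
                continue
    return matches
```

Calling `jars_containing("org.elasticsearch.hadoop.hive.EsHiveOutputFormat", ["/hive/lib"])` and getting an empty list back would mean that directory does not provide the class; a non-empty list only shows the jar is present there, not that the failing task sees it.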
@xiaoyi78 By the way, make sure you use only one version of es-hadoop. If you are using multiple versions (1.3 and 2.0 and 2.1), the classes are likely to trip over each other, as the runtime will pick classes from each jar and these are not compatible.
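A quick way to check for mixed versions is to list the es-hadoop jars across the lib directories and group them by version. This is a rough sketch under one assumption: jars follow the `elasticsearch-hadoop-<version>.jar` naming seen in this thread (other artifact names are not covered). The function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: elasticsearch-hadoop-<version>.jar, where the
# version starts with a digit (e.g. 2.0.2 or 2.1.0.Beta4).
JAR_PATTERN = re.compile(r"^elasticsearch-hadoop-(?P<version>\d.*)\.jar$")


def es_hadoop_versions(directories):
    """Map each es-hadoop version found to the jar paths that carry it."""
    found = defaultdict(list)
    for directory in directories:
        for jar in sorted(Path(directory).glob("elasticsearch-hadoop-*.jar")):
            match = JAR_PATTERN.match(jar.name)
            if match:
                found[match.group("version")].append(jar)
    return dict(found)


def has_version_conflict(directories):
    """True when more than one distinct es-hadoop version is present."""
    return len(es_hadoop_versions(directories)) > 1
```

If `has_version_conflict([...])` returns True for the directories on your classpath, removing all but one version is the first thing to try.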
Thanks Costin. I finally figured out the issue. As you mentioned in another post, most issues of this type are caused by the classpath not being set up correctly. In my case, I had to add the es-hadoop jar file to another location, "CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-0.20-mapreduce/lib", which is used for MapReduce jobs. This makes sense because the exception was thrown during the reduce task, which could not find the es-hadoop jar file.
So for people who experience a similar issue, please note that you need to set up not only the Hive classpath but also the MapReduce job classpath.
Thanks again for the support.
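The fix above amounts to making sure the jar is present in every lib directory involved, not just Hive's. A minimal sketch of that check follows; the helper name `missing_from` and the default glob are assumptions for illustration, and the directories to pass in depend on your own install.

```python
from pathlib import Path


def missing_from(directories, jar_glob="elasticsearch-hadoop-*.jar"):
    """Return the directories that contain no jar matching `jar_glob`.

    Hypothetical helper: it checks file presence only, not whether the
    directory is actually on the Hive or MapReduce classpath.
    """
    return [d for d in directories if not any(Path(d).glob(jar_glob))]
```

For the setup described in this thread, you would pass both the Hive lib folder and the MapReduce lib folder; any directory returned is one where the jar still needs to be copied.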
I would appreciate it if you could look into this issue.
I'm trying to insert data from Hive into ES using elasticsearch-hadoop-2.1.0.Beta4.jar, but without success. I was previously using elasticsearch-hadoop-2.0.2.jar, which didn't work either; that's why I downloaded elasticsearch-hadoop-2.1.0.Beta4.jar to try. I can create external tables and read data from ES without any issues, so I think the jar is in the right directory (in my case, I put the jar file in the /hive/lib folder).
My environment is:
Hadoop 2.0.0-cdh4.5.0
hive-common-0.10.0
My script is:
Error message from the failed reduce task log:
Please let me know if you need more information. Thanks.