elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

elasticsearch Hive issue #336

Closed alphaCoder closed 9 years ago

alphaCoder commented 9 years ago

Hi,

I am using the Hortonworks Hadoop 2.1 sandbox. I downloaded the ES-Hadoop 2.0.2 jar files and added them to my Hive query.

My query looks like this.

CREATE EXTERNAL TABLE Tweet (
    user STRING,
    message STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
    'es.resource' = 'twitter/tweet',
    'es.index.auto.create' = 'false',
    'es-nodes' = 'http://alpha-es'
)

I am trying to connect to a remote Elasticsearch cluster. I also tried replacing the value with http://alpha-es:9200 or alpha-es, but nothing works. I am getting the error below.

java.io.IOException: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]

More detailed log:

14/12/04 15:03:11 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO parse.ParseDriver: Parsing command: use default
14/12/04 15:03:11 INFO parse.ParseDriver: Parse Completed
14/12/04 15:03:11 INFO log.PerfLogger: </PERFLOG method=parse start=1417734191820 end=1417734191822 duration=2 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO ql.Driver: Semantic Analysis Completed
14/12/04 15:03:11 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1417734191823 end=1417734191824 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:11 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO log.PerfLogger: </PERFLOG method=doAuthorization start=1417734191825 end=1417734191826 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:11 INFO log.PerfLogger: </PERFLOG method=compile start=1417734191816 end=1417734191826 duration=10 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:11 INFO log.PerfLogger:
14/12/04 15:03:11 INFO ql.Driver: Starting command: use default
14/12/04 15:03:12 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
14/12/04 15:03:12 INFO hooks.ATSHook: Created ATS Hook
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=PreHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1417734192336 end=1417734192338 duration=2 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1417734191814 end=1417734192338 duration=524 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=runTasks start=1417734192338 end=1417734192413 duration=75 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
14/12/04 15:03:12 INFO hooks.ATSHook: Created ATS Hook
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1417734192961 end=1417734192962 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1417734191826 end=1417734192964 duration=1138 from=org.apache.hadoop.hive.ql.Driver>
OK
14/12/04 15:03:12 INFO ql.Driver: OK
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1417734192964 end=1417734192964 duration=0 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=Driver.run start=1417734191814 end=1417734192964 duration=1150 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO parse.ParseDriver: Parsing command: SELECT * FROM default.tweet
14/12/04 15:03:12 INFO parse.ParseDriver: Parse Completed
14/12/04 15:03:12 INFO log.PerfLogger: </PERFLOG method=parse start=1417734192968 end=1417734192972 duration=4 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:12 INFO log.PerfLogger:
14/12/04 15:03:12 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/12/04 15:03:12 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
14/12/04 15:03:12 INFO parse.SemanticAnalyzer: Get metadata for source tables
14/12/04 15:03:13 INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/12/04 15:03:13 INFO parse.SemanticAnalyzer: Get metadata for destination tables
14/12/04 15:03:13 INFO ql.Context: New scratch dir is hdfs://sandbox.hortonworks.com:8020/tmp/hive-beeswax-hue/hive_2014-12-04_15-03-12_967_1966756796962932357-1
14/12/04 15:03:13 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
14/12/04 15:03:13 INFO parse.SemanticAnalyzer: Set stats collection dir : hdfs://sandbox.hortonworks.com:8020/tmp/hive-beeswax-hue/hive_2014-12-04_15-03-12_967_1966756796962932357-1/-ext-10002
14/12/04 15:03:13 INFO ppd.OpProcFactory: Processing for FS(137)
14/12/04 15:03:13 INFO ppd.OpProcFactory: Processing for SEL(136)
14/12/04 15:03:13 INFO ppd.OpProcFactory: Processing for TS(135)
14/12/04 15:03:13 INFO parse.SemanticAnalyzer: Completed plan generation
14/12/04 15:03:13 INFO ql.Driver: Semantic Analysis Completed
14/12/04 15:03:13 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1417734192973 end=1417734193164 duration=191 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:13 INFO exec.TableScanOperator: Initializing Self 135 TS
14/12/04 15:03:13 INFO exec.TableScanOperator: Operator 135 TS initialized
14/12/04 15:03:13 INFO exec.TableScanOperator: Initializing children of 135 TS
14/12/04 15:03:13 INFO exec.SelectOperator: Initializing child 136 SEL
14/12/04 15:03:13 INFO exec.SelectOperator: Initializing Self 136 SEL
14/12/04 15:03:13 INFO exec.SelectOperator: SELECT structuser:string,message:string
14/12/04 15:03:13 INFO exec.SelectOperator: Operator 136 SEL initialized
14/12/04 15:03:13 INFO exec.SelectOperator: Initializing children of 136 SEL
14/12/04 15:03:13 INFO exec.ListSinkOperator: Initializing child 138 OP
14/12/04 15:03:13 INFO exec.ListSinkOperator: Initializing Self 138 OP
14/12/04 15:03:13 INFO exec.ListSinkOperator: Operator 138 OP initialized
14/12/04 15:03:13 INFO exec.ListSinkOperator: Initialization Done 138 OP
14/12/04 15:03:13 INFO exec.SelectOperator: Initialization Done 136 SEL
14/12/04 15:03:13 INFO exec.TableScanOperator: Initialization Done 135 TS
14/12/04 15:03:13 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:default.tweet.user, type:string, comment:null), FieldSchema(name:default.tweet.message, type:string, comment:null)], properties:null)
14/12/04 15:03:13 INFO log.PerfLogger:
14/12/04 15:03:13 INFO log.PerfLogger: </PERFLOG method=doAuthorization start=1417734193169 end=1417734193481 duration=312 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:13 INFO log.PerfLogger: </PERFLOG method=compile start=1417734192965 end=1417734193481 duration=516 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
14/12/04 15:03:13 INFO hive.metastore: Connected to metastore.
14/12/04 15:03:13 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
14/12/04 15:03:13 INFO log.PerfLogger:
14/12/04 15:03:13 INFO ql.Driver: Starting command: SELECT * FROM default.tweet
14/12/04 15:03:14 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
14/12/04 15:03:14 INFO hooks.ATSHook: Created ATS Hook
14/12/04 15:03:14 INFO log.PerfLogger:
14/12/04 15:03:14 INFO log.PerfLogger: </PERFLOG method=PreHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1417734194312 end=1417734194314 duration=2 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:14 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1417734191814 end=1417734194316 duration=2502 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:14 INFO log.PerfLogger:
14/12/04 15:03:14 INFO log.PerfLogger: </PERFLOG method=runTasks start=1417734194316 end=1417734194316 duration=0 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:14 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
14/12/04 15:03:14 INFO hooks.ATSHook: Created ATS Hook
14/12/04 15:03:14 INFO log.PerfLogger:
14/12/04 15:03:14 INFO log.PerfLogger: </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1417734194628 end=1417734194628 duration=0 from=org.apache.hadoop.hive.ql.Driver>
14/12/04 15:03:14 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1417734193528 end=1417734194629 duration=1101 from=org.apache.hadoop.hive.ql.Driver>
OK
14/12/04 15:03:14 INFO ql.Driver: OK
14/12/04 15:03:16 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
14/12/04 15:03:16 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
14/12/04 15:03:16 INFO httpclient.HttpMethodDirector: Retrying request
14/12/04 15:03:16 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
14/12/04 15:03:16 INFO httpclient.HttpMethodDirector: Retrying request
14/12/04 15:03:16 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
14/12/04 15:03:16 INFO httpclient.HttpMethodDirector: Retrying request
14/12/04 15:03:16 ERROR rest.NetworkClient: Node [Connection refused] failed (localhost:9200); no other nodes left - aborting...

Appreciate any help.

costin commented 9 years ago

You are not setting the Elasticsearch node correctly: the property should be es.nodes, not es-nodes. That is why the task executes against the default location (localhost), where Elasticsearch is not running.
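
For illustration, the table definition from the question with the property name corrected might look like the sketch below. It reuses the table, resource, and host name (alpha-es) from the original post. The 9200 port and the host:port form of the value are assumptions based on the URL the poster tried and the default port shown in the error message, so adjust them to match the actual cluster.

```sql
-- Sketch only: same table as in the question, with 'es-nodes' renamed to 'es.nodes'.
-- The host 'alpha-es' comes from the original post; the port 9200 is an assumption.
DROP TABLE IF EXISTS Tweet;

CREATE EXTERNAL TABLE Tweet (
    user STRING,
    message STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
    'es.resource' = 'twitter/tweet',
    'es.index.auto.create' = 'false',
    'es.nodes' = 'alpha-es:9200'   -- dot, not hyphen, in the property name
);
```

Because the table is EXTERNAL, dropping and recreating it only changes the Hive metadata and leaves the data in Elasticsearch untouched. If the connection works, the "tried [[localhost:9200]]" part of the error should no longer appear, since the connector would then be pointed at the configured host instead of the default.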

alphaCoder commented 9 years ago

Thank you, Costin. My bad, I missed that.