elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

Issue while joining two tables stored on Elasticsearch using HiveQL #266

Closed: code-rider closed this issue 10 years ago.

code-rider commented 10 years ago

I tried to join two tables, but it doesn't work. This issue was raised before (#180), but that issue is closed and I think the changes were not merged into the current release. My issue is the same as #180, but the error is different. I am using Hive 0.13.1. I have tried the latest snapshots from https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/ and also the jars downloaded from http://www.elasticsearch.org/overview/hadoop/download/ (2.0.1 and 2.1.0.Beta1), but none of them work, and different snapshots give different errors. With both the stable release 2.0.1 and the beta release 2.1.0.Beta1, I get the same error whenever I send a join query against Elasticsearch tables. Plain select queries work fine, and indexing into Elasticsearch (ELS) also works. The join query error is:

    java.io.IOException: Cannot run program "$HADOOP_HOME/bin/hadoop" (in directory "/usr/local/hive-src/packaging/target/apache-hive-0.13.1-bin/apache-hive-0.13.1-bin"): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
        at java.lang.Runtime.exec(Runtime.java:617)
        at java.lang.Runtime.exec(Runtime.java:450)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:258)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
        ... 21 more
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Please help me out! Thanks.

costin commented 10 years ago

By the looks of it, you have some basic Hive issues. Have you managed to run any type of Hive query?

java.io.IOException: Cannot run program "$HADOOP_HOME/bin/hadoop" (in directory "/usr/local/hive-src/packaging/target/apache-hive-0.13.1-bin/apache-hive-0.13.1-bin"): error=2, No such file or directory

Basically, your Hive install cannot find Hadoop. Try reading the Hive docs or asking on the Hive mailing list, or give one of the Hive VMs a spin.
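One quick way to check what Hive will actually try to execute is to print the relevant values from the same Hive CLI session. This is a minimal sketch; it relies on the Hive CLI's `env:` variable namespace and on the `hadoop.bin.path` property, which, as far as I can tell from the Hive 0.13 source, is the command `MapredLocalTask` launches.

```sql
-- Print the HADOOP_HOME environment variable as the Hive CLI process sees it
-- (Hive prints "env:HADOOP_HOME is undefined" if it is not set).
set env:HADOOP_HOME;

-- Print the Hadoop launcher Hive uses when it forks a child process,
-- such as the local task that appears in the stack trace.
set hadoop.bin.path;
```

If `hadoop.bin.path` prints the literal text `$HADOOP_HOME/bin/hadoop` instead of an absolute path, the launch fails exactly as in the stack trace: `Runtime.exec` does not go through a shell, so `$VARIABLES` in the command are never expanded.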

code-rider commented 10 years ago

All other queries work fine; this error only shows up for the join query. $HADOOP_HOME = /usr/local/hive-src/packaging/target/apache-hive-0.13.1-bin/apache-hive-0.13.1-bin

The path in the error is the correct one for Hadoop.
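Note that the directory in parentheses in the error is the working directory of the child process, not the location of the `hadoop` binary; the program Hive tried to run is the literal string `$HADOOP_HOME/bin/hadoop`. The same literal string appears in the `Kill Command = $HADOOP_HOME/bin/hadoop job -kill ...` lines of the successful jobs, which Hive appears to print from its configured Hadoop binary path. Assuming that is the cause (and that `$HADOOP_HOME` in the log is not just a placeholder pasted into the issue), a per-session override is a cheap test. The path `/usr/local/hadoop` below is hypothetical and must be replaced with the real Hadoop install.

```sql
-- Hypothetical location: point this at the real Hadoop launcher script.
set hadoop.bin.path=/usr/local/hadoop/bin/hadoop;

-- Print the value back to confirm the override is active in this session.
set hadoop.bin.path;
```

After this, re-running the failing Elasticsearch join in the same session shows whether the path was the problem. A lasting fix would be to export a real, absolute `HADOOP_HOME` before starting Hive, rather than a value that still contains the literal `$HADOOP_HOME`.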

costin commented 10 years ago

The error is not caused by es-hadoop, which is why es-hadoop doesn't appear anywhere in the stack trace. It's a configuration issue, and you can most likely reproduce it with other JOINs outside Elasticsearch and Elasticsearch Hadoop.

I'm sorry, but there's nothing we can do, since this is outside the scope of es-hadoop.

code-rider commented 10 years ago

I know the error doesn't look like it comes from es-hadoop, but join queries on plain Hive tables work fine. Example query:

    select t.user_id ,t.text from hive_table1 t join ( select user_id from hive_table2) s on t.user_id = s.user_id limit 50;

Result:

    Total jobs = 1
    Stage-1 is selected by condition resolver.
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1410428427800_0001, Tracking URL = http://localhost:8088/proxy/application_1410428427800_0001/
    Kill Command = $HADOOP_HOME/bin/hadoop job -kill job_1410428427800_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2014-09-11 14:46:54,923 Stage-1 map = 0%, reduce = 0%
    2014-09-11 14:47:11,735 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 9.61 sec
    2014-09-11 14:47:15,263 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 12.72 sec
    2014-09-11 14:47:17,630 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 15.84 sec
    2014-09-11 14:47:21,231 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 18.94 sec
    2014-09-11 14:47:27,477 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 24.95 sec
    2014-09-11 14:47:30,945 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 28.52 sec
    2014-09-11 14:47:46,103 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 32.65 sec
    MapReduce Total cumulative CPU time: 32 seconds 650 msec
    Ended Job = job_1410428427800_0001
    MapReduce Jobs Launched:
    Job 0: Map: 1 Reduce: 1 Cumulative CPU: 32.65 sec HDFS Read: 59891953 HDFS Write: 4717 SUCCESS
    Total MapReduce CPU Time Spent: 32 seconds 650 msec
    OK
    t1.user_id t1.text
    -2147382294 #nina @kenzzbenzzz http://t.co/Q01JtfKGHd
    -2147360056 Vou começar a beber. Apareço por aqui se houver tiros e tbm pra comentar o jogo.
    -2147355590 Let the 2014 FIFA World Cup begin!
    -2147353199 RT @Gelo_23: Сайт все для андроида http://t.co/DCJYv23mCX
    -2147284816 @bekkiasquith that ain't Portuguese tho is it
    -2147246313 https://t.co/yxsKlH1zom
    NULL NULL
    NULL NULL
    -2147231254 RT @2014WC_Brazil: This is the state of the pitch in Manaus. England will face Italy here on Saturday. http://t.co/c9qlljvauK
    -2147132982 RT @thereal2keyz: “@bitchinchargex: NO CHILL nigga watching porn in class https://t.co/e0mXI87Gwf”
    Time taken: 81.974 seconds, Fetched: 50 row(s)
    hive>

(I pasted only a portion of the result.)

Here is an example join against Elasticsearch tables; users and user_tweets are both Elasticsearch tables:

    hive> select t.user_id ,t.user_screen_name from users t join ( select user_screen_name from user_tweets) s on t.user_screen_name = s.user_screen_name limit 5;
    Total jobs = 1
    java.io.IOException: Cannot run program "$HADOOP_HOME/bin/hadoop" (in directory "/usr/local/hive-src/packaging/target/apache-hive-0.13.1-bin/apache-hive-0.13.1-bin"): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
        at java.lang.Runtime.exec(Runtime.java:617)
        at java.lang.Runtime.exec(Runtime.java:450)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:258)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
        ... 21 more
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

That means this is caused by es-hadoop. Other queries against Elasticsearch tables do work, for example (users is an Elasticsearch table):

    hive> select count(*) from users;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1410428427800_0002, Tracking URL = http://localhost:8088/proxy/application_1410428427800_0002/
    Kill Command = $HADOOP_HOME/bin/hadoop job -kill job_1410428427800_0002
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
    2014-09-11 15:20:13,128 Stage-1 map = 0%, reduce = 0%
    2014-09-11 15:20:56,697 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 19.28 sec
    2014-09-11 15:20:58,212 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 21.99 sec
    2014-09-11 15:20:59,686 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 22.92 sec
    2014-09-11 15:21:01,212 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 26.98 sec
    2014-09-11 15:21:02,814 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 27.61 sec
    2014-09-11 15:21:04,315 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 31.84 sec
    2014-09-11 15:21:07,528 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 34.53 sec
    2014-09-11 15:21:10,480 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 35.9 sec
    2014-09-11 15:21:11,975 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 37.34 sec
    2014-09-11 15:21:13,287 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 38.01 sec
    2014-09-11 15:21:14,787 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 39.96 sec
    2014-09-11 15:21:16,247 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 40.62 sec
    2014-09-11 15:21:17,647 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 41.21 sec
    2014-09-11 15:21:19,121 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 42.6 sec
    2014-09-11 15:21:20,735 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 43.95 sec
    2014-09-11 15:21:22,080 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 45.27 sec
    2014-09-11 15:21:23,530 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 45.85 sec
    2014-09-11 15:21:25,156 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 47.91 sec
    2014-09-11 15:21:26,592 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 48.49 sec
    2014-09-11 15:21:27,981 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 49.09 sec
    2014-09-11 15:21:29,428 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 50.38 sec
    2014-09-11 15:21:30,873 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 51.7 sec
    2014-09-11 15:21:32,130 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 52.99 sec
    2014-09-11 15:21:33,590 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 53.63 sec
    2014-09-11 15:21:35,030 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 54.3 sec
    2014-09-11 15:21:37,250 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 56.86 sec
    2014-09-11 15:21:38,464 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 57.53 sec
    2014-09-11 15:21:39,668 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 59.23 sec
    2014-09-11 15:21:55,516 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 62.01 sec
    MapReduce Total cumulative CPU time: 1 minutes 2 seconds 10 msec
    Ended Job = job_1410428427800_0002
    MapReduce Jobs Launched:
    Job 0: Map: 4 Reduce: 1 Cumulative CPU: 62.01 sec HDFS Read: 203068 HDFS Write: 6 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 2 seconds 10 msec
    OK
    _c0
    58861
    Time taken: 125.642 seconds, Fetched: 1 row(s)
    hive>

Other queries work fine against both Elasticsearch and Hive tables, and join queries on Hive tables work fine too. But when I send a join query against Elasticsearch tables, this error happens, so in my opinion it is related to es-hadoop. You are the expert and know much more than me, so you know better. Any help is appreciated, thanks!
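The stack trace fails inside `MapredLocalTask`, which, as far as I know, is the local task Hive runs to build the in-memory hash table for a map join, launched as a separate `hadoop` process. The working Hive-table join above, by contrast, logged "Stage-1 is selected by condition resolver." and appears to have run as an ordinary MapReduce join with no local task. A plausible but unconfirmed explanation for why only the Elasticsearch join fails is therefore that Hive converted it into a map join, which is the only path that needs to fork `hadoop`. One way to test this, without touching any es-hadoop setting, is to disable automatic map-join conversion for the session and re-run the same query:

```sql
-- Keep Hive from converting joins into map joins, so no local task
-- (and no separate hadoop process) is launched for this query.
set hive.auto.convert.join=false;

-- The Elasticsearch join that previously failed in MapredLocalTask.
select t.user_id, t.user_screen_name
from users t
join (select user_screen_name from user_tweets) s
  on t.user_screen_name = s.user_screen_name
limit 5;
```

If the join succeeds this way, that points at how Hive launches its local task rather than at es-hadoop. The Hadoop binary path would still need fixing, since any other map join, including one over plain Hive tables, could fail the same way.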