Esri / spatial-framework-for-hadoop

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Apache License 2.0

SemanticException #64

Closed parker20121 closed 9 years ago

parker20121 commented 10 years ago

TWIMC:

I'm trying to run the geospatial classes in HDP 2.1 using a simple example, but I keep getting the following error:

SemanticException: Line 1:103 Wrong arguments 'latitude': The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path

Seems like it doesn't recognize the (double, double) method on ST_Point.

I have added the jar files (esri-geometry-api.jar and spatial- to HDFS and included the paths to each through the Hue configuration panel. I checked the jar files, and the class ST_Point is in there.

I created a simple table with three columns to test their functionality:

create external table geotest ( id INT, latitude DOUBLE, longitude DOUBLE) ..........

added some data

1,2,3
2,1,1
3,1,0
4,6,10

and ran the following query:

select "1", count(*) from geotest where ST_Contains( ST_Polygon("polygon((0 0, 0 3, 3 3, 3 0, 0 0))"), ST_Point(geotest.longitude,geotest.latitude))
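
For reference, the expected result of this query can be worked out by hand. The sketch below is a plain-Java check, independent of Hive and of the Esri geometry API, that counts which of the four test rows fall strictly inside the polygon. It assumes the rows are (id, latitude, longitude) as in the table definition, that ST_Point takes x = longitude and y = latitude as in the query, and that ST_Contains follows the usual OGC convention of excluding points on the boundary. The class and method names are illustrative only.

```java
final class ContainsSketch {
    // True when (px, py) lies exactly on the segment (ax, ay)-(bx, by).
    static boolean onSegment(double ax, double ay, double bx, double by,
                             double px, double py) {
        double cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax);
        if (cross != 0.0) {
            return false;
        }
        return Math.min(ax, bx) <= px && px <= Math.max(ax, bx)
            && Math.min(ay, by) <= py && py <= Math.max(ay, by);
    }

    // Strict containment (assumption: boundary points do not count as contained).
    static boolean strictlyInside(double[][] ring, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            if (onSegment(xi, yi, xj, yj, px, py)) {
                return false;
            }
            // Ray casting: toggle on each edge crossed by a ray going in +x.
            if ((yi > py) != (yj > py)
                    && px < (xj - xi) * (py - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    // Rows are {id, latitude, longitude}; the point is (x = longitude, y = latitude).
    static int countContained(double[][] ring, double[][] rows) {
        int count = 0;
        for (double[] row : rows) {
            if (strictlyInside(ring, row[2], row[1])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] ring = {{0, 0}, {0, 3}, {3, 3}, {3, 0}, {0, 0}};
        double[][] rows = {{1, 2, 3}, {2, 1, 1}, {3, 1, 0}, {4, 6, 10}};
        System.out.println(countContained(ring, rows)); // prints 1
    }
}
```

Under those assumptions only row 2 (the point (1, 1)) is counted: rows 1 and 3 sit on the polygon's edge and row 4 lies outside, so a working setup would be expected to return a count of 1.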

I would appreciate any new ideas on how to solve this.

TIA,

M.

Here is the job log:

14/08/19 07:08:40 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO parse.ParseDriver: Parsing command: use default
14/08/19 07:08:40 INFO parse.ParseDriver: Parse Completed
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=parse start=1408457320720 end=1408457320721 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO ql.Driver: Semantic Analysis Completed
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1408457320721 end=1408457320721 duration=0 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=compile start=1408457320720 end=1408457320721 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO ql.Driver: Starting command: use default
14/08/19 07:08:40 INFO impl.TimelineClientImpl: Timeline service address: http://ner-faa02.icenet.local:8188/ws/v1/timeline/
14/08/19 07:08:40 INFO hooks.ATSHook: Created ATS Hook
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=PreHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=PreHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1408457320804 end=1408457320805 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1408457320720 end=1408457320805 duration=85 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=runTasks start=1408457320805 end=1408457320813 duration=8 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO impl.TimelineClientImpl: Timeline service address: http://ner-faa02.icenet.local:8188/ws/v1/timeline/
14/08/19 07:08:40 INFO hooks.ATSHook: Created ATS Hook
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1408457320888 end=1408457320889 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1408457320721 end=1408457320889 duration=168 from=org.apache.hadoop.hive.ql.Driver>
OK
14/08/19 07:08:40 INFO ql.Driver: OK
14/08/19 07:08:40 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1408457320889 end=1408457320889 duration=0 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:40 INFO log.PerfLogger: </PERFLOG method=Driver.run start=1408457320720 end=1408457320889 duration=169 from=org.apache.hadoop.hive.ql.Driver>
converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/esri-geometry-api.jar
14/08/19 07:08:40 INFO SessionState: converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/esri-geometry-api.jar
Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar to class path
14/08/19 07:08:40 INFO SessionState: Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar to class path
Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar
14/08/19 07:08:40 INFO SessionState: Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar
converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/spatial-sdk-hadoop.jar
14/08/19 07:08:41 INFO SessionState: converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/spatial-sdk-hadoop.jar
Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar to class path
14/08/19 07:08:41 INFO SessionState: Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar to class path
Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar
14/08/19 07:08:41 INFO SessionState: Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar
14/08/19 07:08:41 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:41 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:41 INFO parse.ParseDriver: Parsing command: select "1", count(*) from geotest where ST_Contains( ST_Polygon("polygon((0 0, 0 3, 3 3, 3 0, 0 0))"), ST_Point(geotest.longitude,geotest.latitude))
14/08/19 07:08:41 INFO parse.ParseDriver: Parse Completed
14/08/19 07:08:41 INFO log.PerfLogger: </PERFLOG method=parse start=1408457321029 end=1408457321031 duration=2 from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:41 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
14/08/19 07:08:41 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/08/19 07:08:41 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
14/08/19 07:08:41 INFO parse.SemanticAnalyzer: Get metadata for source tables
14/08/19 07:08:41 INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/08/19 07:08:41 INFO parse.SemanticAnalyzer: Get metadata for destination tables
14/08/19 07:08:41 INFO ql.Context: New scratch dir is hdfs://ner-faa01.icenet.local:8020/tmp/hive-beeswax-hdfs/hive_2014-08-19_07-08-41_029_8161824702539494979-10
14/08/19 07:08:41 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
FAILED: SemanticException [Error 10014]: Line 1:103 Wrong arguments 'latitude': The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path
14/08/19 07:08:41 ERROR ql.Driver: FAILED: SemanticException [Error 10014]: Line 1:103 Wrong arguments 'latitude': The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:103 Wrong arguments 'latitude': The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1136)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:184)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:9702)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9658)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9629)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2349)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2330)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8142)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9001)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9267)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:426)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.checkedCompile(BeeswaxServiceImpl.java:252)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.compile(BeeswaxServiceImpl.java:202)
    at com.cloudera.beeswax.BeeswaxServiceImpl$2.run(BeeswaxServiceImpl.java:835)
    at com.cloudera.beeswax.BeeswaxServiceImpl$2.run(BeeswaxServiceImpl.java:828)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at com.cloudera.beeswax.BeeswaxServiceImpl.doWithState(BeeswaxServiceImpl.java:777)
    at com.cloudera.beeswax.BeeswaxServiceImpl.query(BeeswaxServiceImpl.java:827)
    at com.cloudera.beeswax.api.BeeswaxService$Processor$query.getResult(BeeswaxService.java:915)
    at com.cloudera.beeswax.api.BeeswaxService$Processor$query.getResult(BeeswaxService.java:899)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

14/08/19 07:08:41 INFO log.PerfLogger: </PERFLOG method=compile start=1408457321029 end=1408457321061 duration=32 from=org.apache.hadoop.hive.ql.Driver>

randallwhitman commented 10 years ago

This reminds me of #46.

climbage commented 10 years ago

Did you add the framework jar files through the Hive command line?

parker20121 commented 10 years ago

I've tried adding them to

/usr/lib/hadoop/lib

/usr/lib/hive/lib

I tried setting hive --auxpath

Nothing works.

parker20121 commented 10 years ago

The ST_Point class is public:

public class ST_Point extends ST_Geometry {
    static final Log LOG = LogFactory.getLog(ST_Point.class.getName());

    // Number-pair constructor - 2D
    public BytesWritable evaluate(DoubleWritable x, DoubleWritable y) {
        return evaluate(x, y, null, null);
    }
    // ... remainder of the class omitted
}

climbage commented 10 years ago

@parker20121

Did you also add them using the ADD JAR command in Hive though? Not having that will have it fail with a semantic exception.

parker20121 commented 10 years ago

Yes. I tried "add jar [HDFS location];" (as the hive, hdfs, and root users), both on the left-hand side of the Hue query interface and as HQL.

parker20121 commented 10 years ago

Don't know if it matters, but HDP is using hive 0.13.1 and hadoop 2.4.0.x.

parker20121 commented 10 years ago

It appears in the log that it finds and adds the jar files to the classpath.

converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/esri-geometry-api.jar
14/08/19 07:08:40 INFO SessionState: converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/esri-geometry-api.jar
Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar to class path
14/08/19 07:08:40 INFO SessionState: Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar to class path
Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar
14/08/19 07:08:40 INFO SessionState: Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/esri-geometry-api.jar
converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/spatial-sdk-hadoop.jar
14/08/19 07:08:41 INFO SessionState: converting to local hdfs://ner-faa01.icenet.local:8020/user/hive/spatial-sdk-hadoop.jar
Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar to class path
14/08/19 07:08:41 INFO SessionState: Added /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar to class path
Added resource: /tmp/b9bb03cb-ec47-4ce8-abe2-537bee7deafd_resources/spatial-sdk-hadoop.jar

climbage commented 10 years ago

Did you run the create temporary function commands in HUE?

Specifically,

create temporary function ST_Point as 'com.esri.hadoop.hive.ST_Point';
create temporary function ST_Contains as 'com.esri.hadoop.hive.ST_Contains';

parker20121 commented 10 years ago

Yes.

[Image: geotest]

climbage commented 10 years ago

What if you add the UDFs using the side bar? If that doesn't work, we'll have to get ourselves a VM with same setup and try to repro the issue.

[Image]

parker20121 commented 10 years ago

No dice.

[Image: no_dice]

parker20121 commented 10 years ago

Here is the final configuration

[Image: geotest_final_configuration]

ddkaiser commented 10 years ago

HDP 2.1 is using Hue 2.3.x, I think? Hue 2.5 or newer will leverage the connection to HiveServer2, which changes some things. Essentially, Hue 2.3.x uses the Beeswax system rather than the Beeline/HiveServer2 path. Unofficially, there is a build of Hue 2.5 or newer that will be in the HDP repositories in a short time, but I don't know when. (And certainly HDP 2.2 will include a Hue refresh to a major version.)

For now, you may be able to try creating $HIVE_HOME/auxlib and adding that path to the "hive.aux.jars.path" property in hive-site.xml. If using Ambari, create an entry for hive.aux.jars.path with the value file:///usr/lib/hive/auxlib. If not using Ambari, the hive-site.xml will contain something like this:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/lib/hive/auxlib</value>
</property>

I think the Hive CLI automatically picks up the auxlib, and the hive-site.xml is for the metastore and HiveServer2 process startup scripts (or was it the other way around?)

Place the auxiliary jars, esri-geometry-api.jar, etc. in the $HIVE_HOME/auxlib path.

At that point, you will not need "add jar" statements for those jars. (I think) Probably needs some testing. Certainly I think it will work this way with Hue 2.5 or greater.

parker20121 commented 10 years ago

Looks like the Hue version is 2.3.2-471. I tried adding the auxpath via the CLI, and that got me the same result.

[Image: geotest-auxpath]

ddkaiser commented 10 years ago

So, not a Hue issue at all. I would take a good hard look at the jar files, unzip them to a temp directory, make sure classes are structured correctly, etc. We did have to fix ST_Point not being public, perhaps there is something else not being built correctly?
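
One way to script that jar inspection, rather than unzipping by hand, is sketched below. It uses only the standard java.util.jar API to test whether a jar holds the entry a given fully qualified class name would map to. The class name JarClassCheck and its methods are made up for illustration and are not part of the framework. Note that this only confirms the entry exists at the expected path; it says nothing about whether the class can actually be loaded.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

final class JarClassCheck {
    // Maps a fully qualified class name to its entry path inside a jar,
    // e.g. com.esri.hadoop.hive.ST_Point -> com/esri/hadoop/hive/ST_Point.class
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True when the jar contains an entry at the path expected for className.
    static boolean jarContainsClass(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java JarClassCheck.java <jar path> <class name>
        Path jar = Paths.get(args[0]);
        String className = args[1];
        System.out.println(className + " present in " + jar + ": "
                + jarContainsClass(jar, className));
    }
}
```

A misspelled package (for example ersi instead of esri) or a class nested under an unexpected directory would show up here as false.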

parker20121 commented 10 years ago

I built both jars from scratch using the provided pom files. Updates to hive and hadoop version numbers were the only modifications I made. The class is public (see above). And ST_Point is in the jar file.

[Image: geotest-jar-contents]

climbage commented 10 years ago

@parker20121 Can you show your table schema?

parker20121 commented 10 years ago

[Image: geotest-schema]

parker20121 commented 10 years ago

I have a few rows of data in there for testing.

[Image: geotest-data]

climbage commented 10 years ago

How about...

describe function st_point;

?

climbage commented 10 years ago

You used ersi instead of esri in the last test.

parker20121 commented 10 years ago

[Image: geotest-function-query]

[Image: geotest-function-results]

climbage commented 10 years ago

Well... it seems like this issue might be outside of our control, unfortunately. We'll try to reproduce it here.

parker20121 commented 10 years ago

Thanks for looking it over. Your help is greatly appreciated.

parker20121 commented 10 years ago

I tried creating my own UDF that does a point in polygon test. I put it in a jar, and the query ran to completion. I traded some emails on the HortonWorks site, and another person noted that:

Unfortunately prior to HIVE-6995, Hive was not logging the actual cause of the error here and just giving a generic “not in class path” message, that would be helpful here to help troubleshoot.

So upgrading to hive 0.14 might help.
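
Until the real cause is logged by Hive itself, one possible way to look for it is to load the class outside Hive and print whatever the JVM reports. The sketch below is not what Hive does internally; it is a standalone probe using Class.forName, and the name UdfLoadProbe is invented for illustration. It assumes it is run with the same jars on the classpath that the Hive session would see (the two framework jars plus the Hive and Hadoop jars they depend on).

```java
final class UdfLoadProbe {
    // Tries to load and initialize className, returning a short description
    // of the outcome instead of a generic "not in class path" message.
    static String probe(String className, ClassLoader loader) {
        try {
            Class<?> loaded = Class.forName(className, true, loader);
            return "OK: " + loaded.getName();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError (a missing dependency),
            // UnsupportedClassVersionError and ExceptionInInitializerError.
            Throwable root = e;
            while (root.getCause() != null) {
                root = root.getCause();
            }
            return "FAILED: " + e.getClass().getSimpleName()
                    + " (root cause: " + root + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message above.
        String className = args.length > 0 ? args[0] : "com.esri.hadoop.hive.ST_Point";
        System.out.println(probe(className, ClassLoader.getSystemClassLoader()));
    }
}
```

If the class itself is present but something it references is not, the probe would be expected to report a NoClassDefFoundError naming the missing dependency, which is more specific than the SemanticException text.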

AlexandreLINTE commented 10 years ago

Hello,

Any update on this problem? I compiled the master branch and I still have the same issue.

hive (default)> add jar /my/path/spatial-sdk-json-1.0.3-SNAPSHOT.jar;
Added /my/path/spatial-sdk-json-1.0.3-SNAPSHOT.jar to class path
Added resource: /my/path/spatial-sdk-json-1.0.3-SNAPSHOT.jar

hive (default)> create temporary function ST_Point as 'com.esri.hadoop.hive.ST_Point';
OK
Time taken: 0.023 seconds
hive (default)> select ST_Point(0,0);
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path

climbage commented 10 years ago

@parker20121 Have you had any success with this?

parker20121 commented 10 years ago

I haven't had time to track it down.

climbage commented 10 years ago

@AlexandreLINTE Can you pull the latest and try again? I can't seem to make it fail on my machine, but I might have fixed it.

prongs commented 9 years ago

@AlexandreLINTE Did you have any success with this?

climbage commented 9 years ago

@prongs Are you having the same issue?

prongs commented 9 years ago

I had this issue a while back, but after a while it started working fine. I didn't investigate it much. It might be that some specific config causes this issue. I don't know any other details.

climbage commented 9 years ago

Gotcha. Well if you think of anything you might have changed (like Hive or spatial-framework versions), let us know.

parker20121 commented 9 years ago

This works for me now on HDP 2.1.

AlexandreLINTE commented 9 years ago

@climbage I am sorry, I have had no time to test yet, but I will compile and test it again in a few weeks.

randallwhitman commented 9 years ago

As mentioned last September but without hyperlink, the issue may have been resolved by commit https://github.com/Esri/spatial-framework-for-hadoop/commit/c9cb6cf9998ef882fd3ddc41f13d93432d6c25e6.