Esri / spatial-framework-for-hadoop

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Apache License 2.0
363 stars 160 forks source link

How to extract geometry in Json format using spatial-sdk-hive api in Java. #94

Closed amitabh74 closed 8 years ago

amitabh74 commented 8 years ago

Following is the syntax followed for creating the table - CREATE TABLE agg_samp(area binary, count double) ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'

STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

how can i extract the data in JSON format specially the binary information that stores the geometry? I have tried using java api for hive2 but it returns junk characters.

randallwhitman commented 8 years ago

Yes, the JsonSerde very intentionally parses JSON to a binary geometry.

If you want to convert Geometry to JSON in Java, you can use GeometryEngine.geometryToJson (or geometryToGeoJson).

However, if you want the raw JSON while using Hive, you may wish to evaluate the suitability of the more general JSON SerDe for the purpose.

amitabh74 commented 8 years ago

Thanks for your reply. But I guess I cannot use GeometryEngine.geometryToJson (or geometryToGeoJson), since hive resultset allows me to retrieve binary geometry as string which shows junk character in console on print(please refer the following snippet).

Connection con = DriverManager.getConnection("jdbc:hive2://HOST_NAME:10000", "", ""); Statement stmt = con.createStatement(); String sql = "Select count, area from taxi_agg LIMIT 5"; System.out.print(res.getString(1) + " " + res.getString(2));

However I can see proper output on hive prompt if I run query like this - hive> create temporary function ST_AsJSON as 'com.esri.hadoop.hive.ST_AsJson'; hive>Select count, ST_AsJson(area) from taxi_agg LIMIT 2;

The other option that you suggested "Hive-Json-SerDe", seems to work for CDH4, CDH5 and HDP 2.3. I am currently using HDP 2.1 and hive 0.13.1, so not sure whether it will be compatible.

randallwhitman commented 8 years ago

ST_AsJson - good observation. Go ahead and try adapting the source code from ST_AsJson.

amitabh74 commented 8 years ago

Yes I have seen the source of the class ST_AsJson. To convert my binary data to Json I need to invoke evaluate(BytesWritable geomref). But the hive resultset as mentioned in my previous comment returns binary geometry as string which I cannot pass as parameter due to type mismatch. Moreover when I print the string result it shows me junk characters. Not sure how I can get around this problem.

randallwhitman commented 8 years ago

Rather than trying to invoke St_AsJson.evaluate itself, it might be more effective to take pieces of the AsJson source code to adapt into your custom code.

amitabh74 commented 8 years ago

Thanks Randall, will try to do the same. Could you please point me to the source code location

randallwhitman commented 8 years ago

https://github.com/Esri/spatial-framework-for-hadoop/blob/master/hive/src/main/java/com/esri/hadoop/hive/ST_AsJson.java

amitabh74 commented 8 years ago

I did try by implementing the pieces from AsJson source in the following manner -

Connection con = DriverManager.getConnection("jdbc:hive2://RHDCN05.rmsi.com:10000", "", ""); Statement stmt = con.createStatement();

String sql = "Select * from taxi_agg LIMIT 2"; ResultSet res = stmt.executeQuery(sql); while (res.next()) { try{ if(res.getString(1) != null){ byte[] b = res.getString(1).getBytes(); BytesWritable writable = new BytesWritable(b); OGCGeometry ogcGeometry = GeometryUtils.geometryFromEsriShape(writable); if (ogcGeometry == null){ System.out.println("Gemoetry is null"); return; } Geometry esriGeom = ogcGeometry.getEsriGeometry(); int wkid = GeometryUtils.getWKID(writable); Text text = new Text(GeometryEngine.geometryToJson(wkid, esriGeom)); System.out.println(text); }else{ System.out.println("value is null"); } }catch(Exception e){ e.printStackTrace(); } }

However the output value that is being returned is {"rings":[]} which is incorrect. I believe this is due to the fact that I am trying to convert geometry String returned by the resultset to BytesWritable: byte[] b = res.getString(1).getBytes(); BytesWritable writable = new BytesWritable(b);

Any suggestion regarding what can be done to get around this problem?

randallwhitman commented 8 years ago

First, getString does not look right; I would expect getObject to be needed for binary column.

Avoiding conversion to/from BytesWritable could be done by going inside GeometryUtils.geometryFromEsriShape and doing something similar. Roughly, something like:

byte[] geomBytes = res.getObject(...);
int offset = SIZE_WKID + SIZE_TYPE;
ByteBuffer shapeBuffer = ByteBuffer.wrap(geomBytes, offset, geomBytes.length - offset).slice().order(ByteOrder.LITTLE_ENDIAN);
// or GeometryEngine.geometryFromEsriShape(...)
Geometry esriGeom = OperatorImportFromESRIShape.local().execute(0, Geometry.Type.Unknown, shapeBuffer);