Esri / gis-tools-for-hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
http://esri.github.io/gis-tools-for-hadoop/
Apache License 2.0

ST_Bin does not accept decimal numbers as first argument #82

Open doublebyte1 opened 5 years ago

doublebyte1 commented 5 years ago

I am trying to run the tutorial on this page. However, this instruction throws an error:

FROM (SELECT ST_Bin(0.001, ST_Point(dropoff_longitude, dropoff_latitude)) bin_id, * FROM taxi_demo) bins
SELECT ST_BinEnvelope(0.001, bin_id) shape, COUNT(*) count
GROUP BY bin_id;

FAILED: SemanticException [Error 10014]: Line 1:13 Wrong arguments 'dropoff_latitude': Argument 0 must be a number

It seems the error is thrown when the object is initialized, but I cannot understand why.

I also tried replacing "0.001" with ".001", and got the same error. Using an integer ("1"), or the result of a division ("1/1000"), works, but it gives me unexpected results. For instance, when I use:

FROM (SELECT ST_Bin(1/1000, ST_Point(dropoff_longitude, dropoff_latitude)) bin_id, * FROM taxi_demo) bins
SELECT ST_BinEnvelope(1/1000, bin_id) shape, COUNT(*) count
GROUP BY bin_id;

The coordinates of the resulting shapes are not valid lat/long values (EPSG:4326):

{"type":"Polygon","coordinates":[[[-0.20249999989755452,3210.3374999998377],[-0.20149999989755452,3210.3374999998377],[-0.20149999989755452,3210.338499999838],[-0.20249999989755452,3210.338499999838],[-0.20249999989755452,3210.3374999998377]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-40.75249999994412,1727.0164999998417],[-40.75149999994412,1727.0164999998417],[-40.75149999994412,1727.0174999998417],[-40.75249999994412,1727.0174999998417],[-40.75249999994412,1727.0164999998417]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-40.719499999890104,1651.5524999999218],[-40.718499999890106,1651.5524999999218],[-40.718499999890106,1651.5534999999218],[-40.719499999890104,1651.5534999999218],[-40.719499999890104,1651.5524999999218]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-73.98149999999441,400.7774999997821],[-73.98049999999441,400.7774999997821],[-73.98049999999441,400.77849999978207],[-73.98149999999441,400.77849999978207],[-73.98149999999441,400.7774999997821]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,74.0134999998156],[0.0005000001508742571,74.0134999998156],[0.0005000001508742571,74.0144999998156],[-0.0004999998491257429,74.0144999998156],[-0.0004999998491257429,74.0134999998156]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,73.98749999997764],[0.0005000001508742571,73.98749999997764],[0.0005000001508742571,73.98849999997765],[-0.0004999998491257429,73.98849999997765],[-0.0004999998491257429,73.98749999997764]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.10349999996833503,73.98249999985657],[-0.10249999996833503,73.98249999985657],[-0.10249999996833503,73.98349999985658],[-0.10349999996833503,73.98349999985658],[-0.10349999996833503,73.98249999985657]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,73.97849999989941],[0.0005000001508742571,73.97849999989941],[0.0005000001508742571,73.97949999989942],[-0.0004999998491257429,73.97949999989942],[-0.0004999998491257429,73.97849999989941]]],"crs":null} 3.0 {"type":"Polygon","coordinates":[[[-0.4684999999590218,73.97449999994225],[-0.4674999999590218,73.97449999994225],[-0.4674999999590218,73.97549999994226],[-0.4684999999590218,73.97549999994226],[-0.4684999999590218,73.97449999994225]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,73.96049999997578],[0.0005000001508742571,73.96049999997578],[0.0005000001508742571,73.96149999997579],[-0.0004999998491257429,73.96149999997579],[-0.0004999998491257429,73.96049999997578]]],"crs":null} 3.0 Version 2.0.0

randallwhitman commented 5 years ago

Thanks for reporting this. I assume "version 2.0.0" refers to Spatial Framework for Hadoop. Please let us know the versions of Hive and Hadoop.

doublebyte1 commented 5 years ago

@randallwhitman Hadoop 2.8.5, Hive 2.3.4

randallwhitman commented 5 years ago

Thanks for the details. We do not have Hive-2.3.4 (nor Hadoop-2.8.5) installed, and unfortunately the testing framework has fallen behind - for which I went ahead and filed Esri/spatial-framework-for-hadoop#163. Maybe it will reproduce with another version of Hive or with Spark SQL.

doublebyte1 commented 5 years ago

OK. If it does not reproduce, let me know the highest versions of Hadoop/Hive you have installed!

randallwhitman commented 5 years ago

Under Spark-SQL-2.2.0, SELECT ST_Bin(0.001, ST_Point(0,0)); output a large integer without error.

randallwhitman commented 5 years ago

I wonder if the exception has anything to do with data in a header row. Note dropoff_latitude is an argument to ST_Point, not directly to ST_Bin.

randallwhitman commented 5 years ago

@doublebyte1 Are you able to try out the query on a copy of trip_data_1.csv that omits the header row (first line), and see what happens?

doublebyte1 commented 5 years ago

@randallwhitman yes, I confirm that the error reproduces on a copy of trip_data_1.csv without the header row.

doublebyte1 commented 5 years ago

> Under Spark-SQL-2.2.0, SELECT ST_Bin(0.001, ST_Point(0,0)); output a large integer without error.

The output of this query on Hive 2.3.4 is:

hive> SELECT ST_Bin(0.001, ST_Point(0,0));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': Argument 0 must be a number

randallwhitman commented 5 years ago

Thanks for the additional details. That it reproduces with a minimal query could be very helpful - it might be possible to reproduce without a full installation.
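
A minimal standalone repro sketch along those lines (hypothetical scaffolding, not from the thread: it assumes ST_Bin extends Hive's GenericUDF, that initialize() is where the argument-type check lives, and that the geometry argument is passed as Hive BINARY, as ST_Point produces):

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import com.esri.hadoop.hive.ST_Bin;

public class STBinDecimalRepro {
  public static void main(String[] args) {
    // Argument 0 typed as DECIMAL, which is how Hive 2.x types a literal like 0.001
    ObjectInspector binSizeOI =
        PrimitiveObjectInspectorFactory.writableHiveDecimalObjectInspector;
    // Argument 1 as BINARY, matching the serialized geometry from ST_Point (assumption)
    ObjectInspector geometryOI =
        PrimitiveObjectInspectorFactory.writableBinaryObjectInspector;
    try {
      new ST_Bin().initialize(new ObjectInspector[] { binSizeOI, geometryOI });
      System.out.println("DECIMAL bin size accepted");
    } catch (UDFArgumentException e) {
      // Expected on Hive 2.3 if DECIMAL is missing from the accepted categories
      System.out.println("DECIMAL bin size rejected: " + e.getMessage());
    }
  }
}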

randallwhitman commented 5 years ago

I'm curious what Hive-2.3 will do on the following:

SELECT ST_Bin(1e-3, ST_Point(0,0));
SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));

Hive-0.14/HDP-2.2.6:

hive> SELECT ST_Bin(1e-3, ST_Point(0,0));
4611686015463124500
hive> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
4611686015463124500
hive> SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));
4611686015463124500

Spark-SQL-2.2:

spark-sql> SELECT ST_Bin(1e-3, ST_Point(0,0));
4611686015463124500
Time taken: 0.097 seconds, Fetched 1 row(s)
spark-sql> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
Time taken: 0.296 seconds, Fetched 1 row(s)
spark-sql> SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));
4611686015463124500
Time taken: 0.085 seconds, Fetched 1 row(s)

doublebyte1 commented 5 years ago

Hive 2.3:

hive> SELECT ST_Bin(1e-3, ST_Point(0,0));
4611686015463124500
hive> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': Argument 0 must be a number
hive> SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '"Point(0 0)"': Argument 0 must be a number
hive> 

randallwhitman commented 5 years ago

The difference between 0.001 and 1e-3 is really strange and makes me wonder if there's an issue with Hive's parser.

randallwhitman commented 5 years ago

And/or maybe some new ObjectInspector types have been introduced in the Hive API and are missing from https://github.com/Esri/spatial-framework-for-hadoop/blob/master/hive/src/main/java/com/esri/hadoop/hive/ST_Bin.java#L37..L44

randallwhitman commented 5 years ago

Maybe PrimitiveCategory.DECIMAL? Per skimming https://hive.apache.org/javadocs/r2.3.5/api/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.PrimitiveCategory.html and https://hive.apache.org/javadocs/r2.3.5/api/org/apache/hadoop/hive/serde2/objectinspector/primitive/package-summary.html
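
For illustration, a sketch of what handling DECIMAL could look like. The helper below is hypothetical, not the actual ST_Bin.java code, though the inspector classes and methods are from the Hive 2.3 javadocs linked above:

import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.HiveDecimalObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;

public class BinSizeArg {
  // Hypothetical helper: read the bin-size argument as a double whether Hive
  // typed the literal as DOUBLE, FLOAT, an integer type, or DECIMAL
  // (Hive 2.x types a literal like 0.001 as DECIMAL, while 1e-3 stays DOUBLE).
  static double binSizeAsDouble(PrimitiveObjectInspector oi, Object arg) {
    switch (oi.getPrimitiveCategory()) {
      case DECIMAL:
        // The DECIMAL inspector yields a HiveDecimal; narrow it to double
        return ((HiveDecimalObjectInspector) oi)
            .getPrimitiveJavaObject(arg).doubleValue();
      case DOUBLE:
      case FLOAT:
      case BYTE:
      case SHORT:
      case INT:
      case LONG:
        // The plain numeric categories already convert cleanly
        return PrimitiveObjectInspectorUtils.getDouble(arg, oi);
      default:
        throw new IllegalArgumentException("Argument 0 must be a number");
    }
  }
}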

randallwhitman commented 5 years ago

@doublebyte1 Are you able to test the linked branch Esri/spatial-framework-for-hadoop#164 ?

doublebyte1 commented 5 years ago

@randallwhitman unfortunately I am no longer working with Hadoop, so I am not able to build this branch.

randallwhitman commented 5 years ago

Today I got Hive-2.3.5 to run, and reproduced:

hive> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': Argument 0 must be a number