Esri / gis-tools-for-hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
http://esri.github.io/gis-tools-for-hadoop/
Apache License 2.0

Error with ST_LineString when running query below #49

Open mpharding opened 8 years ago

mpharding commented 8 years ago

In the Hadoop YARN log for a container I am seeing these errors:

2016-07-12 20:10:55,516 [ERROR] [TezChild] |hive.ST_LineString|: Internal error - ST_LineString: java.lang.NullPointerException.
2016-07-12 20:10:55,517 [ERROR] [TezChild] |hive.ST_SetSRID|: Invalid arguments - one or more arguments are null.
2016-07-12 20:10:55,517 [ERROR] [TezChild] |hive.ST_GeodesicLengthWGS84|: Invalid arguments - one or more arguments are null.

The query I'm running is:

select PreQuery.name,
       sum(case when PreQuery.Geode < 10.0 then 1 else 0 end) 10mCount,
       sum(case when PreQuery.Geode < 50.0 then 1 else 0 end) 50mCount,
       sum(case when PreQuery.Geode < 1000.0 then 1 else 0 end) 1000mCount
from (
    select a.name,
           ST_GeodesicLengthWGS84(
               ST_SetSRID(ST_LineString(a.lat, a.lon, b.lat, b.lon), 4326)) as Geode
    from a, b
) PreQuery
GROUP BY PreQuery.name
ORDER BY 1000mCount desc

When I run this on a few thousand records it works fine, but when I run it on over 54k records I see these errors.

Any ideas why?

climbage commented 8 years ago

It looks like ST_LineString is returning a null and ST_GeodesicLengthWGS84 is logging the error because the geometry is null. My guess is that one or more of your records in the larger dataset has invalid/null values for lat and lon, which is causing ST_LineString to return null.
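One way to test that guess is to count the rows whose coordinates are null or out of range before running the full query. The sketch below is only an illustration: it uses Python's built-in sqlite3 with a made-up in-memory table standing in for the real Hive table `a`, and the helper name `count_bad_coords` and the sample rows are hypothetical. The `lat`/`lon` column names are taken from the query in the issue, and the range bounds assume plain WGS84 degrees. The `WHERE` clause is generic SQL and should carry over to Hive largely unchanged, though that has not been verified here.

```python
import sqlite3

# Generic SQL check; the table name is substituted in below.
# Column names lat/lon follow the query in the issue.
BAD_COORDS_SQL = """
SELECT COUNT(*) FROM {table}
WHERE lat IS NULL OR lon IS NULL
   OR lat NOT BETWEEN -90 AND 90
   OR lon NOT BETWEEN -180 AND 180
"""


def count_bad_coords(conn, table):
    """Return the number of rows in `table` with null or out-of-range lat/lon."""
    # Only the two table names from the issue are accepted, so the
    # string formatting below cannot inject arbitrary SQL.
    if table not in ("a", "b"):
        raise ValueError("unexpected table name: %r" % (table,))
    return conn.execute(BAD_COORDS_SQL.format(table=table)).fetchone()[0]


# Tiny in-memory stand-in for the real table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        ("ok", 51.5, -0.12),         # valid coordinates
        ("null_lat", None, -0.12),   # null latitude
        ("bad_lon", 51.5, 512.0),    # longitude outside [-180, 180]
    ],
)

bad_in_a = count_bad_coords(conn, "a")
print(bad_in_a)
```

A non-zero count for either `a` or `b` would be consistent with the explanation above, since the query joins every row of `a` with every row of `b` and a single bad row on either side produces many null line strings.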

randallwhitman commented 8 years ago

Hmm, the log entry above makes it look like an ST_Geometry function is throwing an NPE when it should instead log an invalid (null) argument error.

GISDev01 commented 8 years ago

@hardboy111 Were you able to double check your data to see if any of your records in the larger dataset have invalid or null values for your lat and lon?