Closed: mkeller3 closed this issue 8 years ago
Is that in Hive? For better performance of point-in-polygon aggregation, try a custom map-reduce application with a quadtree index on the polygons, as in the sample.
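For reference, the idea behind the sample (which is Java, built on the Esri Geometry API) can be sketched in pure Python. Everything below is illustrative, not the sample's actual code: index each polygon's bounding box in a quadtree, then for each point run the exact point-in-polygon test only against the candidates the tree returns, instead of against every polygon.

```python
# Illustrative quadtree index for point-in-polygon aggregation.
# Polygon bounding boxes are indexed; per point we run the exact
# ray-casting test only on the candidates the tree returns.

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

def _contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

class QuadTree:
    def __init__(self, bounds, depth=0, max_depth=8, max_items=4):
        self.bounds = bounds           # (xmin, ymin, xmax, ymax)
        self.items = []                # (bbox, payload) pairs stored here
        self.children = None
        self.depth = depth
        self.max_depth = max_depth
        self.max_items = max_items

    def insert(self, bbox, payload):
        # Push the box into a child quadrant if it fits entirely;
        # otherwise keep it at this node (each item is stored once).
        if self.children is not None:
            for child in self.children:
                if _contains(child.bounds, bbox):
                    child.insert(bbox, payload)
                    return
        self.items.append((bbox, payload))
        if (self.children is None and len(self.items) > self.max_items
                and self.depth < self.max_depth):
            self._split()

    def _split(self):
        xmin, ymin, xmax, ymax = self.bounds
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        quads = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                 (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
        self.children = [QuadTree(q, self.depth + 1, self.max_depth,
                                  self.max_items) for q in quads]
        pending, self.items = self.items, []
        for bbox, payload in pending:
            self.insert(bbox, payload)

    def query(self, pt):
        """Yield payloads whose bounding boxes contain pt."""
        x, y = pt
        for bbox, payload in self.items:
            if bbox[0] <= x <= bbox[2] and bbox[1] <= y <= bbox[3]:
                yield payload
        if self.children is not None:
            for child in self.children:
                b = child.bounds
                if b[0] <= x <= b[2] and b[1] <= y <= b[3]:
                    yield from child.query(pt)

# Toy "zip codes": two axis-aligned squares (hypothetical data).
polygons = {"A": [(0, 0), (4, 0), (4, 4), (0, 4)],
            "B": [(5, 5), (9, 5), (9, 9), (5, 9)]}
tree = QuadTree((0, 0, 10, 10))
tree.insert((0, 0, 4, 4), "A")
tree.insert((5, 5, 9, 9), "B")

pt = (1, 1)
matches = [name for name in tree.query(pt)
           if point_in_polygon(pt, polygons[name])]
```

In the MapReduce version, the tree is built once per task over the polygon set, and each mapped point does a tree lookup plus a handful of exact tests rather than testing every polygon.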
It is in Hive.
Right - if the performance of the Hive query is not adequate, try custom MapReduce.
Randall,
Running the custom MapReduce does not work either. I run into the error:
Error: GC overhead limit exceeded
Can you give the JVM more memory?
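On a Hadoop 2.x / YARN cluster, the heap for map and reduce tasks can typically be raised per job with generic `-D` options. The jar and driver names below are placeholders, and the values are examples to adjust for your cluster (on Hadoop 1.x the single property `mapred.child.java.opts` covers both roles):

```shell
# Raise container sizes and JVM heaps for one job run (values are examples).
hadoop jar my-aggregation-job.jar MyDriver \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.map.java.opts=-Xmx3276m \
  -D mapreduce.reduce.memory.mb=4096 \
  -D mapreduce.reduce.java.opts=-Xmx3276m \
  input/points input/polygons output/counts
```

Note that generic `-D` options are only picked up when the driver parses them via `ToolRunner`/`GenericOptionsParser`.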
That did not help.
Could this be due to the fact that only one mapper and one reducer are being used on such a large polygon dataset?
Yes, a single mapper and reducer would certainly cause performance and/or scalability issues.
Is there any way to force more mappers and reducers to be used?
In custom MapReduce you can set the number of reducers, yes.
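Concretely (property names are for Hadoop 2.x and the jar/driver names are placeholders): the reducer count can be set per job, either with `job.setNumReduceTasks(n)` in the Java driver or on the command line. The mapper count is driven by the input splits, so it is usually influenced indirectly, for example by shrinking the maximum split size:

```shell
# Request 16 reducers, and shrink the max split size (64 MB here)
# so large inputs are divided across more mappers. Example values.
hadoop jar my-aggregation-job.jar MyDriver \
  -D mapreduce.job.reduces=16 \
  -D mapreduce.input.fileinputformat.split.maxsize=67108864 \
  input/points input/polygons output/counts
```

As above, the `-D` options take effect only if the driver uses `ToolRunner`/`GenericOptionsParser`.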
I am trying to count how many points fall within each zip code in the United States, and the job takes over 21 hours to complete.
If I use a very basic JSON such as state outlines, it runs within 3 minutes, but whenever I use polygons that are far more complex, the job takes substantially longer.
Is there a reason for this?
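One likely factor (an assumption, not a confirmed diagnosis for this job): a naive point-in-polygon test walks every edge of the polygon, so its cost is linear in the vertex count. State outlines have comparatively few vertices, while tens of thousands of detailed zip-code boundaries mean orders of magnitude more edge tests per point. That is exactly the work a quadtree index, plus more mappers and reducers, is meant to cut down. A small sketch that counts edge evaluations to show the linear growth:

```python
# Show that a ray-casting point-in-polygon test examines every edge,
# so cost grows linearly with vertex count (regular n-gons here).
import math

def edges_tested(pt, poly):
    """Ray-casting test; returns (inside, number of edges examined)."""
    x, y = pt
    inside, tested = False, 0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        tested += 1
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside, tested

def ngon(n, r=1.0):
    """Regular n-gon centered at the origin, a stand-in for a detailed boundary."""
    return [(r * math.cos(2 * math.pi * k / n),
             r * math.sin(2 * math.pi * k / n)) for k in range(n)]

for n in (4, 64, 4096):
    inside, tested = edges_tested((0.0, 0.0), ngon(n))
    print(n, inside, tested)   # edge count tracks the vertex count
```

A 4,096-vertex boundary costs roughly a thousand times more per point than a 4-vertex one, and that factor multiplies across every point-polygon pair the job evaluates.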