algattik / databricks-geomesa

Geospatial analytics with GeoMesa on Databricks
Apache License 2.0
7 stars 1 forks source link

Ingesting data into GeoMesa data store when using databricks? #2

Open chen7572 opened 4 years ago

chen7572 commented 4 years ago

I am using Databricks with data stored as delta tables on Amazon s3. I wonder if I need to set up the GeoMesa datastore and store my data in the geomesa datastore, in order to scale up the spatial join with several millions to billions of data points.

I have this question because I am not able to scale up the spatial join for several millions of data points. I was able to follow the airport example notebook to perform spatial join Points with lat/long coordinates with Polygons. The performance was good for comparing up to a million points with ~8,000 polygons. But when I am dealing with 5 - 10 million points with the 8,000 polygons, the performance reduces dramatically (each partition's executor computing time increased by more than 100 times), which made me wonder if the data itself needs to be ingested into GeoMesa datastore first.

Thanks for any input!