elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Geo_Polygon Performance Issues #1486

Closed: geodawg closed this issue 11 years ago

geodawg commented 12 years ago

I have been doing some rigorous testing to see whether I can make ElasticSearch perform better than a relational database. Specifically, I would like to run a query that returns all the points within a complex polygon. To compare the two, I loaded the entire geonames allcountries.txt file into PostgreSQL 9.0 (with the latest PostGIS) on my MacBook Pro. I am also running the latest release of ES on the same machine.

There are 7,836,496 point geometries in the geonames data set. Below are my stats for 3 queries, one per country, with the countries chosen by size: small, medium, and large. Each query was run four times. As expected, ES is slower, but what concerns me is that the two systems return very different numbers of results.

Country   Engine  Time per run (ms)              Results per run
Bahrain   PG      19, 12, 18, 16                 210, 210, 210, 210
Bahrain   ES      5923, 1302, 1186, 1138         219, 219, 219, 219
Egypt     PG      5339, 2812, 3024, 3312         23203, 23203, 23203, 23203
Egypt     ES      14754, 14353, 14380, 14399     10759, 10759, 10759, 10759
China     PG      57617, 57520, 61826, 58494     252723, 252723, 252723, 252723
China     ES      90999, 170864, 282611, 289245  150230, 150230, 150230, 150230
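To put the timings and counts side by side, here is a small sketch that takes the median of the four runs for each engine and compares them. The numbers are copied from the runs above; using the median rather than the mean is my own choice, to damp outliers such as the slow first ES run for Bahrain.

```python
from statistics import median

# Per-run timings (ms) and result counts, copied from the measurements above.
runs = {
    "Bahrain": {"pg_ms": [19, 12, 18, 16], "es_ms": [5923, 1302, 1186, 1138],
                "pg_hits": 210, "es_hits": 219},
    "Egypt": {"pg_ms": [5339, 2812, 3024, 3312], "es_ms": [14754, 14353, 14380, 14399],
              "pg_hits": 23203, "es_hits": 10759},
    "China": {"pg_ms": [57617, 57520, 61826, 58494], "es_ms": [90999, 170864, 282611, 289245],
              "pg_hits": 252723, "es_hits": 150230},
}

def summarize(r):
    """Return (ES/PG median slowdown, ES hits as a fraction of PG hits)."""
    slowdown = median(r["es_ms"]) / median(r["pg_ms"])
    hit_ratio = r["es_hits"] / r["pg_hits"]
    return slowdown, hit_ratio

for country, r in runs.items():
    slowdown, hit_ratio = summarize(r)
    print(f"{country}: ES {slowdown:.1f}x slower, returned {hit_ratio:.0%} of PG's hits")
```

This prints a roughly 73x median slowdown for Bahrain but under 5x for Egypt and China, while ES returns 104%, 46%, and 59% of PostGIS's hits respectively, so the count gap grows with the size of the country.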

The ES query I used is at https://gist.github.com/1385785. Because of gist's size limits I had to truncate the polygon there; the full polygon data is at https://ogr2elasticsearch.googlecode.com/files/polygons.zip

The PostgreSQL query is at https://gist.github.com/1385796 if anyone would like to replicate the test. I also noticed that ES returns points from neighboring countries, which means it is not applying the same "within" condition that PostGIS uses.
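For checking disputed points by hand: a point is "within" a polygon when it lies inside the polygon's ring. Below is a minimal sketch of that test using the even-odd (ray-casting) rule on planar lon/lat coordinates. It is not how PostGIS or ES implement containment; it ignores holes, points exactly on the boundary, and polygons that cross the antimeridian, and the function name is illustrative.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd rule: cast a ray from the point toward +lon and count edge crossings.

    ring is a list of (lon, lat) vertices; repeating the first vertex at the end is optional.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can cross the ray.
        if (yi > lat) != (yj > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_ring(5, 5, square))   # True
print(point_in_ring(15, 5, square))  # False
```

Running a check like this on a sample of the neighboring-country points ES returns could help show whether the discrepancy comes from the polygon data or from the filter itself.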

Adam

geodawg commented 12 years ago

All,

I just noticed that some of my geometries are MULTIPOLYGON types, which may be the reason for the wonky results. How does ES handle these?
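A MULTIPOLYGON is a list of polygons, each with an outer ring and optional holes, and a point is inside it if it is inside any member polygon and not in one of that polygon's holes. The sketch below (which repeats a small even-odd ring test so it runs on its own) shows how querying with only one member polygon silently drops points that fall in the other parts. The function names and the nested-list layout are assumptions for illustration.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd (ray-casting) test against one ring of (lon, lat) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        (xi, yi), (xj, yj) = ring[i], ring[j]
        # The straddle check runs first, so the division never sees yi == yj.
        if (yi > lat) != (yj > lat) and lon < xi + (lat - yi) * (xj - xi) / (yj - yi):
            inside = not inside
        j = i
    return inside

def point_in_multipolygon(lon, lat, multipolygon):
    """multipolygon: list of polygons; each polygon is [outer_ring, *hole_rings]."""
    for outer, *holes in multipolygon:
        if point_in_ring(lon, lat, outer) and not any(
            point_in_ring(lon, lat, hole) for hole in holes
        ):
            return True
    return False

# Two disjoint parts, e.g. a mainland and an island.
mainland = [[(0, 0), (2, 0), (2, 2), (0, 2)]]
island = [[(5, 5), (7, 5), (7, 7), (5, 7)]]
print(point_in_multipolygon(6, 6, [mainland, island]))  # True
print(point_in_multipolygon(6, 6, [mainland]))          # False: the island part was dropped
```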

Adam

kimchy commented 12 years ago

There is no support for multipolygons, only polygons. Also, to improve performance, try wrapping the polygon filter with a bounding box filter. You could also build several polygon filters and combine them with and/or/not filters.
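A sketch of the approach described above: one geo_polygon filter per member polygon, each ANDed with a geo_bounding_box filter computed from that polygon, and the parts ORed together inside a filtered query. It builds the request body as a Python dict using the filter names from ES versions of that era (filtered, and, or, geo_bounding_box, geo_polygon; the first three were removed in later ES releases). It assumes a geo_point field named location (hypothetical) and ignores holes and antimeridian-crossing polygons, so verify the syntax against the docs for the version you run.

```python
import json

FIELD = "location"  # hypothetical geo_point field name

def bounding_box(ring):
    """Smallest lat/lon box around a ring of (lon, lat) vertices."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return {"top_left": {"lat": max(lats), "lon": min(lons)},
            "bottom_right": {"lat": min(lats), "lon": max(lons)}}

def polygon_filter(ring):
    """Cheap bounding-box check ANDed with the exact polygon check for one part."""
    return {"and": [
        {"geo_bounding_box": {FIELD: bounding_box(ring)}},
        {"geo_polygon": {FIELD: {"points": [{"lat": lat, "lon": lon} for lon, lat in ring]}}},
    ]}

def multipolygon_query(rings):
    """OR together one bbox+polygon filter per outer ring of the multipolygon."""
    return {"query": {"filtered": {
        "query": {"match_all": {}},
        "filter": {"or": [polygon_filter(ring) for ring in rings]},
    }}}

body = multipolygon_query([
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(5, 5), (7, 5), (7, 7), (5, 7)],
])
print(json.dumps(body, indent=2))
```

The intent is that the coordinate-only bounding-box test rejects most documents before the more expensive polygon test runs; whether the and filter evaluates its clauses in that order is worth confirming for your version.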

dsmiley commented 12 years ago

FYI, I am working on high-performance geospatial filtering using Lucene. The code is called "LSP", hosted at http://code.google.com/p/lucene-spatial-playground/, and I am aiming to pitch it as the replacement for Lucene's defunct spatial module. For filtering alone (no geo sort or relevancy), there is no RAM requirement. Polygon support is implemented via JTS, and a variable number of shapes of any kind is supported. Shapes with area are also supported for indexing. It'll probably be another month or so of adding tests and plugging a few holes before I pitch it as Lucene's spatial replacement; that may be a good time for ElasticSearch to consider it. It's using Lucene 4/trunk, not 3.x, though.

kimchy commented 12 years ago

@dsmiley looks interesting! Would love to integrate it with elasticsearch once it's done and Lucene 4.0 is released :)

apatrida commented 12 years ago

Is this still in the "waiting to use" stage?

nwohaibi commented 11 years ago

@geodawg, great testing; you saved me a lot of time trying to accomplish the same thing. Have you run the same tests since ES integrated Spatial4j? I would love to see the results.

clintongormley commented 11 years ago

Geo shapes are now supported in v0.90+.
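With geo shapes, a MULTIPOLYGON can in principle be sent as one shape instead of being split into separate polygon filters. A minimal sketch, assuming the points are indexed in a field named location (hypothetical) mapped as geo_shape rather than geo_point, and that your ES version accepts the multipolygon type. It builds a geo_shape filter body with GeoJSON-style coordinates, which are [lon, lat] with closed rings. Check the geo_shape documentation for your version before relying on this exact syntax.

```python
import json

def close_ring(ring):
    """Return the ring as [lon, lat] lists, repeating the first vertex at the end if needed."""
    closed = [[lon, lat] for lon, lat in ring]
    if closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return closed

def geo_shape_filter(field, polygons):
    """polygons: list of polygons, each a list of rings (outer ring first, then holes)."""
    coordinates = [[close_ring(ring) for ring in polygon] for polygon in polygons]
    return {"geo_shape": {field: {"shape": {"type": "multipolygon",
                                            "coordinates": coordinates}}}}

shape_filter = geo_shape_filter("location", [
    [[(0, 0), (2, 0), (2, 2), (0, 2)]],
    [[(5, 5), (7, 5), (7, 7), (5, 7)]],
])
print(json.dumps(shape_filter))
```

When the indexed documents are points, an "intersects" relation (which I understand to be the filter's default) amounts to "within": a point intersects a polygon only if it lies inside it or on its boundary.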