Closed: geodawg closed this issue 11 years ago.
All,
I just noticed that some of my geometries are MULTIPOLYGON types, which may be the reason for the wonky results. How does ES handle this?
Adam
There is no support for multi polygon, only polygon. Also, to improve performance, try wrapping the polygon filter with a bounding box filter. You could possibly build several polygon filters and use and/or/not filters to combine them.
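The workaround described above (a bounding box plus one polygon filter per part, combined with and/or) can be sketched as a query-body builder. This is a minimal sketch assuming the ES 0.x filter DSL (`filtered` query, `and`/`or` filters, `geo_bounding_box`, `geo_polygon`) and a hypothetical geo_point field named `location`; holes are dropped because the 0.x `geo_polygon` filter takes a single ring, and polygons crossing the antimeridian are not handled.

```python
def bounding_box(polygons):
    """Return (min_lon, min_lat, max_lon, max_lat) over all outer rings."""
    lons = [lon for ring in polygons for lon, _ in ring]
    lats = [lat for ring in polygons for _, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


def multipolygon_filter(field, polygons):
    """Build and(bbox, or(polygon, polygon, ...)) for one MULTIPOLYGON.

    `polygons` is a list of outer rings, each a list of (lon, lat) tuples.
    `field` is a hypothetical geo_point field name.
    """
    min_lon, min_lat, max_lon, max_lat = bounding_box(polygons)
    # The cheap bounding box check runs first and prunes most documents.
    bbox = {"geo_bounding_box": {field: {
        "top_left": {"lat": max_lat, "lon": min_lon},
        "bottom_right": {"lat": min_lat, "lon": max_lon},
    }}}
    # One geo_polygon filter per part; a point matches if it is in any part.
    parts = [
        {"geo_polygon": {field: {"points": [{"lat": lat, "lon": lon} for lon, lat in ring]}}}
        for ring in polygons
    ]
    return {"and": [bbox, {"or": parts}]}


def filtered_query(field, polygons):
    """Wrap the filter in a match_all filtered query (0.x-era syntax)."""
    return {"query": {"filtered": {
        "query": {"match_all": {}},
        "filter": multipolygon_filter(field, polygons),
    }}}
```

The resulting dict can be serialized with `json.dumps` and sent as the body of a `_search` request; whether `and` wants a bare array or a `{"filters": [...]}` object may depend on the ES version.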
FYI, I am working on high-performance geospatial filtering using Lucene. The code is called "LSP" and is hosted at http://code.google.com/p/lucene-spatial-playground/. I am aiming to pitch it as the replacement for Lucene's defunct spatial module. For filtering alone (no geo sorting or relevancy), there is no RAM requirement. Polygon support is implemented via JTS, and a variable number of shapes of any kind is supported. Shapes with area are also supported for indexing. It will probably take another month or so of adding tests and plugging a few holes here and there before I pitch it as Lucene's spatial replacement. That may be a good time for ElasticSearch to consider it. It uses Lucene 4/trunk, not 3.x, though.
@dsmiley Looks interesting! Would love to integrate it with elasticsearch once it's done and Lucene 4.0 is released :)
Is this still in the "waiting to use" stage?
@geodawg, great testing; you saved me a lot of time trying to accomplish the same thing. I'd like to know whether you ran the same tests after ES adopted Spatial4j. I would love to see the results.
Geo shapes are now supported in v0.90+.
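With geo shape support, a MULTIPOLYGON can be passed as a single GeoJSON-style shape instead of being split into several polygon filters. The sketch below converts rings of (lon, lat) pairs into that form and wraps it in a `geo_shape` filter. It assumes the 0.90 `geo_shape` filter accepts a `multipolygon` shape, that the target field (hypothetically named `location`) is mapped as `geo_shape` rather than `geo_point` (so the points themselves would need to be indexed as shapes), and that the default spatial relation is acceptable; check these against the 0.90 documentation.

```python
def close_ring(ring):
    """GeoJSON rings must end with the same position they start with."""
    ring = [list(p) for p in ring]  # GeoJSON positions are [lon, lat] arrays
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return ring


def to_geojson_multipolygon(polygons):
    """`polygons`: list of polygons; each polygon is a list of rings
    (outer ring first, then holes); each ring is a list of (lon, lat) pairs."""
    return {
        "type": "multipolygon",
        "coordinates": [[close_ring(ring) for ring in poly] for poly in polygons],
    }


def geo_shape_filter(field, polygons):
    """Hypothetical helper: a geo_shape filter on a geo_shape-mapped field."""
    return {"geo_shape": {field: {"shape": to_geojson_multipolygon(polygons)}}}
```

Unlike the 0.x `geo_polygon` workaround, holes survive this conversion, which matters for countries whose boundaries enclose other territories.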
I have been doing some rigorous testing to see if I can make ElasticSearch perform better than a relational database. In particular, I would like to be able to run a query that returns all the points within a complex polygon. To compare the two, I loaded the entire geonames allCountries.txt file into PostgreSQL 9.0 (with the latest PostGIS) on my MacBook Pro. I am also using the latest release version of ES on the same machine.
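For anyone replicating the setup, one way to get the geonames points into ES is to convert allCountries.txt into bulk API lines. This is not necessarily how the data was loaded here; it is a minimal sketch assuming the standard tab-separated geonames layout (geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, ...), a hypothetical index `geonames` and type `place`, and a geo_point field named `location`.

```python
import json


def geonames_to_bulk(lines, index="geonames", doc_type="place"):
    """Turn geonames rows into newline-delimited bulk API actions.

    `index` and `doc_type` are hypothetical names chosen for illustration.
    """
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed or truncated rows
        geonameid, name, country = cols[0], cols[1], cols[8]
        lat, lon = float(cols[4]), float(cols[5])
        # Action line, then the document source line.
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": geonameid}}))
        out.append(json.dumps({"name": name, "country_code": country,
                               "location": {"lat": lat, "lon": lon}}))
    # The bulk API expects every line, including the last, to end with "\n".
    return "".join(l + "\n" for l in out)
```

In practice the 7.8 million rows would be sent in batches of a few thousand rows per bulk request rather than in one body.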
There are 7,836,496 point geometries in the geonames data set. Here are my stats from 3 different queries. ES is slower, as expected, but the numbers of returned results differ considerably, which concerns me. The countries were chosen by size: small => medium => large.
| Country | Engine | Time per run (ms), 4 runs | Results returned (same in all 4 runs) |
|---|---|---|---|
| Bahrain | PG | 19, 12, 18, 16 | 210 |
| Bahrain | ES | 5923, 1302, 1186, 1138 | 219 |
| Egypt | PG | 5339, 2812, 3024, 3312 | 23,203 |
| Egypt | ES | 14754, 14353, 14380, 14399 | 10,759 |
| China | PG | 57617, 57520, 61826, 58494 | 252,723 |
| China | ES | 90999, 170864, 282611, 289245 | 150,230 |
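The result-count gap is easier to compare as differences and ratios. This small sketch only recomputes numbers from the figures posted above; it uses the median for timing to dampen outliers such as the first Bahrain ES run (treating that run as a cold start is an assumption), and it assumes the China PG times, posted without a unit, are in ms.

```python
from statistics import median

# (times in ms for the 4 runs, result count), copied from the figures above.
RESULTS = {
    "Bahrain": {"PG": ([19, 12, 18, 16], 210), "ES": ([5923, 1302, 1186, 1138], 219)},
    "Egypt": {"PG": ([5339, 2812, 3024, 3312], 23203), "ES": ([14754, 14353, 14380, 14399], 10759)},
    "China": {"PG": ([57617, 57520, 61826, 58494], 252723), "ES": ([90999, 170864, 282611, 289245], 150230)},
}


def summarize(results):
    summary = {}
    for country, engines in results.items():
        pg_times, pg_count = engines["PG"]
        es_times, es_count = engines["ES"]
        summary[country] = {
            "count_diff": es_count - pg_count,    # > 0 means ES returned extra points
            "count_ratio": es_count / pg_count,   # ES count as a share of PostGIS's
            "median_slowdown": median(es_times) / median(pg_times),
        }
    return summary


for country, row in summarize(RESULTS).items():
    print(f"{country}: ES - PG = {row['count_diff']:+d} results, "
          f"ES/PG = {row['count_ratio']:.3f}, median time ratio = {row['median_slowdown']:.1f}x")
```

ES returns slightly more than PostGIS for the small country and roughly half as many for the two larger ones, which fits both the neighboring-country matches and the truncated or multi-part polygon explanations discussed in this thread.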
The query I used in ES is here: https://gist.github.com/1385785. Because of gist size limits I had to truncate the polygon, so the full polygon data is here: https://ogr2elasticsearch.googlecode.com/files/polygons.zip
The PostgreSQL query is here, if someone else would like to replicate the test: https://gist.github.com/1385796. I also noticed that ES returns data from neighboring countries as well, which means it is not honoring the same "within" condition that PostGIS uses.
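To check which side is right for a given hit, it helps to pin down what "within" means for a point and a MULTIPOLYGON: the point must lie inside the outer ring of at least one part and inside none of that part's holes. The sketch below implements this with even-odd ray casting. It is an independent spot-check, not the algorithm ES or PostGIS uses; rings are assumed to be lists of (lon, lat) pairs in planar coordinates, and points exactly on a boundary may be classified differently from PostGIS's `ST_Within`, which excludes the boundary.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: count crossings of a ray cast toward +lon."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges straddling the horizontal line through the point can cross the ray.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_multipolygon(lon, lat, multipolygon):
    """`multipolygon`: list of polygons, each [outer_ring, hole_ring, ...]."""
    for outer, *holes in multipolygon:
        if point_in_ring(lon, lat, outer) and not any(point_in_ring(lon, lat, h) for h in holes):
            return True
    return False
```

Running a sample of ES hits through this against the full polygon from polygons.zip would show whether the extra Bahrain points really fall outside it, and whether the missing Egypt and China points come from parts of the MULTIPOLYGON that the ES query never received.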
Adam