elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ES|QL version of spatial intersects search slow on some benchmarks #108756

Open craigtaverner opened 5 months ago

craigtaverner commented 5 months ago

When comparing the three benchmarking tracks, geopoint (point data indexed as geo_point), geopointshape (point data indexed as geo_shape) and geoshape (complex geometries indexed as geo_shape), we see that for geopoint and geoshape ES|QL performs within about 2x of _search queries. For the geopointshape track, however, ES|QL performs about 100x worse. That is as bad as would be expected if the Lucene push-down were not enabled at all. Since the same queries and the same index configuration are used, this is surprising.

ES|QL benchmark results can be seen at https://elasticsearch-benchmarks.elastic.co/#tracks/esql/nightly/default/30d

A summary of queries and results can be seen below:

geoshape

For geoshape we see comparable results, with ES|QL only 44% slower than _search:

FROM osm*
| WHERE ST_Intersects(shape, TO_GEOSHAPE("POLYGON((-0.1 49.0, 5.0 48.0, 15.0 49.0, 14.0 60.0, -0.1 61.0, -0.1 49.0))"))
| LIMIT 10
[Screenshot: benchmark results, 2024-05-17 at 09:56:24]

geopointshape

For geopointshape things are much, much worse with ES|QL over 100x slower:

FROM osmgeoshapes
| WHERE ST_Intersects(location, TO_GEOSHAPE("POLYGON((-0.1 49.0, 5.0 48.0, 15.0 49.0, 14.0 60.0, -0.1 61.0, -0.1 49.0))"))
| LIMIT 10
[Screenshot: benchmark results, 2024-05-17 at 09:57:16]

geopoint

For geopoint things are reasonable again with ES|QL less than 2x slower:

FROM osmgeopoints
| WHERE ST_Intersects(location, TO_GEOSHAPE("POLYGON((-0.1 49.0, 5.0 48.0, 15.0 49.0, 14.0 60.0, -0.1 61.0, -0.1 49.0))"))
| LIMIT 10
[Screenshot: benchmark results, 2024-05-17 at 10:04:26]
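
For readers who want to reproduce the comparison by hand, here is a minimal Python sketch that builds the two request bodies being compared: an ES|QL body for `POST /_query` and a `geo_shape` query body for `POST /<index>/_search`. The exact `_search` request used by the benchmark tracks is not shown in this issue, so the `geo_shape` body is an assumption about what an equivalent query could look like. The helper names (`wkt_polygon_to_geojson`, `esql_body`, `search_body`) are hypothetical and exist only for this illustration.

```python
# WKT polygon copied from the ES|QL queries above (longitude latitude pairs).
WKT = "POLYGON((-0.1 49.0, 5.0 48.0, 15.0 49.0, 14.0 60.0, -0.1 61.0, -0.1 49.0))"


def wkt_polygon_to_geojson(wkt: str) -> dict:
    """Convert a single-ring WKT POLYGON into a GeoJSON Polygon.

    Only handles the simple 'POLYGON((x y, x y, ...))' form used in this issue.
    """
    text = wkt.strip()
    if not (text.startswith("POLYGON((") and text.endswith("))")):
        raise ValueError("expected a single-ring WKT POLYGON")
    inner = text[len("POLYGON(("):-len("))")]
    ring = [[float(v) for v in pair.split()] for pair in inner.split(",")]
    return {"type": "Polygon", "coordinates": [ring]}


def esql_body(index: str, field: str, wkt: str) -> dict:
    """Request body for POST /_query, mirroring the queries in this issue."""
    query = (
        f"FROM {index} "
        f'| WHERE ST_Intersects({field}, TO_GEOSHAPE("{wkt}")) '
        "| LIMIT 10"
    )
    return {"query": query}


def search_body(field: str, wkt: str) -> dict:
    """Assumed equivalent body for POST /<index>/_search (not taken from the tracks)."""
    return {
        "size": 10,
        "query": {
            "geo_shape": {
                field: {
                    "shape": wkt_polygon_to_geojson(wkt),
                    "relation": "intersects",
                }
            }
        },
    }
```

For the geopointshape case, `esql_body("osmgeoshapes", "location", WKT)` and `search_body("location", WKT)` would be the pair to time against each other.
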
elasticsearchmachine commented 5 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

craigtaverner commented 5 months ago

It looks like all ES|QL spatial search benchmarks for the geopointshape track show this issue, which means ST_INTERSECTS, ST_CONTAINS, ST_WITHIN and ST_DISJOINT all perform about 100x slower than _search, but only for geopointshape, not for geopoint or geoshape. The most likely reason would be a failed Lucene pushdown. Since the query and the field mapping are the same between tracks, that is very surprising.
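
One way to test the failed-pushdown hypothesis might be to run the query with the ES|QL `profile` option and inspect which operators were executed. The sketch below is only an illustration: the profile format is intended for human debugging and may change, so the code walks the response generically instead of relying on a fixed layout. Treating an operator whose description starts with `FilterOperator` as a sign that the predicate ran in the compute engine instead of in Lucene is an assumption, and the helper names are hypothetical.

```python
from typing import Any, Iterator


def profiled_esql_body(query: str) -> dict:
    """Request body for POST /_query that asks for profiling information."""
    return {"query": query, "profile": True}


def operator_names(node: Any) -> Iterator[str]:
    """Yield every string stored under an 'operator' key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "operator" and isinstance(value, str):
                yield value
            else:
                yield from operator_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from operator_names(item)


def filter_runs_outside_lucene(response: dict) -> bool:
    """Return True if the profile lists a filter operator (assumed operator name).

    A filter operator in the profile would suggest the spatial predicate was
    evaluated row by row after the Lucene source, i.e. not pushed down.
    """
    profile = response.get("profile", {})
    return any(name.startswith("FilterOperator") for name in operator_names(profile))
```

If `filter_runs_outside_lucene` returns True for the geopointshape query but False for the same query on the geopoint and geoshape tracks, that would be consistent with the pushdown failing only for that track.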