The work in https://github.com/elastic/elasticsearch/pull/112938 enables pushdown of ST_DISTANCE to Lucene for both filtering and sorting, including when the distance result is expressed in a separate EVAL command:
FROM index
| EVAL distance=ST_DISTANCE(location, TO_GEOPOINT("POINT(0 0)"))
| WHERE distance < 20000
| SORT distance ASC
| KEEP name
In this query, we drop the distance attribute. The same would happen if we did a STATS:
FROM index
| EVAL distance=ST_DISTANCE(location, TO_GEOPOINT("POINT(0 0)"))
| WHERE distance < 20000
| STATS count=COUNT(*) BY country
| SORT count DESC, country ASC
In both cases, the distance value does not need to be calculated, because the ST_DISTANCE function is pushed down to Lucene entirely. However, with the work done in https://github.com/elastic/elasticsearch/pull/112938, the distance column remains in the table of results all the way up to the KEEP or STATS command, and is only dropped there. As a consequence, these queries are still much slower than they need to be, because we perform an unnecessary FieldExtract(location) and an unnecessary ST_DISTANCE(location), only to drop those values. Early benchmarks show that when we push down without the EVAL command, we are at least 7x faster than this.
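For comparison, here is a sketch of the STATS query rewritten without the separate EVAL command, so that no distance column is ever created. It assumes the same index and the same location and country fields as the examples above; whether the benchmarked queries had exactly this shape is an assumption.

```esql
FROM index
| WHERE ST_DISTANCE(location, TO_GEOPOINT("POINT(0 0)")) < 20000
| STATS count=COUNT(*) BY country
| SORT count DESC, country ASC
```

The filter is the same as before, but because the distance is never bound to a named column, nothing has to be carried up to STATS and dropped there.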