elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ESQL: Optimize data streamed (but not consumed) with a count and generator function #98703

Open costin opened 1 year ago

costin commented 1 year ago

Description

A rare but possible type of query returns a constant for each row it matches:

FROM index
| WHERE x > 10
| EVAL a = "some string"
| KEEP a

The actual content of index is not used; only the number of matches is required. Instead of loading the data only to discard it, the query can be optimized as:

FROM index
| WHERE x > 10                              // apply the filter
| STATS c = COUNT()                         // but just count things
| EVAL number = CASE(c > 10000, 10000, c)   // cap at the maximum limit of returned items
| GENERATE number, null                     // generate said amount of items (happens on the coordinator)
| EVAL a = "some string"                    // for each perform the eval
| KEEP a                                    // return just the eval itself

This should be more efficient because no data needs to be loaded or sorted, only counted (and the count should be pushed down). Furthermore, the limit itself is taken into account to avoid creating too many constants. Lastly, as currently defined, the generator command creates a ConstantBlock, which is quite efficient.
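
To illustrate why the two plans should be interchangeable, here is a rough Python sketch that simulates both over in-memory rows. It is not ES|QL or Elasticsearch code: `naive_plan`, `optimized_plan`, and `MAX_ROWS` are hypothetical names, the 10000 cap is taken from the example above, and the sketch assumes the limit is applied to the matching rows in both plans.

```python
from itertools import islice, repeat

MAX_ROWS = 10000  # assumed cap on returned rows, taken from the example above


def naive_plan(rows, constant, limit=MAX_ROWS):
    """Visit every matching row (up to the limit), discard it, emit the constant."""
    matches = (row for row in rows if row["x"] > 10)
    return [constant for _ in islice(matches, limit)]


def optimized_plan(rows, constant, limit=MAX_ROWS):
    """Count the matches, cap the count at the limit, then generate constants."""
    count = sum(1 for row in rows if row["x"] > 10)  # stands in for STATS c = COUNT()
    number = min(count, limit)                       # never generate more than the limit
    return list(repeat(constant, number))            # stands in for the generator step


if __name__ == "__main__":
    rows = [{"x": i} for i in range(25)]
    assert naive_plan(rows, "some string") == optimized_plan(rows, "some string")
    print(len(optimized_plan(rows, "some string")))
```

In the sketch the optimized variant only needs the match count, never the row contents, which is the property the proposed rewrite relies on.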

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine commented 1 year ago

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine commented 6 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)