elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ESQL: Add pre and post filter for grouping operator #111439

Open costin opened 1 month ago

costin commented 1 month ago

Description

The grouping (STATS) command can be quite expensive, both for data coming in (creating groups) and for data going out (number of buckets). This can be alleviated by allowing pre- and post-filters on the grouping command, both for individual aggs and for grouping keys, so that data is dropped as soon as it is read or produced. Example of a pre-filter (see #110821):

FROM index | STATS a_avg = AVG(a) WHERE a > 10, avg = AVG(a) BY g

Example of post-filter:

FROM index | STATS c = count(*) by g | WHERE c > 10
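
To make the two examples concrete, here is a minimal Java sketch of the intended semantics over in-memory arrays. The class and method names are hypothetical and this is not how ESQL executes the queries. One simplification to note: a group with no rows passing the pre-filter is simply absent from the result here, and how the real command would report such a group is not covered in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Elasticsearch code.
final class StatsFilterSketch {
    // Pre-filter: per group, average only the values of a that satisfy a > min,
    // roughly "STATS a_avg = AVG(a) WHERE a > min BY g".
    static Map<String, Double> avgWhereGreater(String[] g, long[] a, long min) {
        Map<String, long[]> acc = new LinkedHashMap<>(); // group -> {sum, count}
        for (int i = 0; i < a.length; i++) {
            if (a[i] > min) { // rows failing the filter never reach the aggregate
                long[] s = acc.computeIfAbsent(g[i], k -> new long[2]);
                s[0] += a[i];
                s[1]++;
            }
        }
        Map<String, Double> out = new LinkedHashMap<>();
        acc.forEach((k, s) -> out.put(k, (double) s[0] / s[1]));
        return out;
    }

    // Post-filter: count rows per group, then keep only groups with count > min,
    // roughly "STATS c = count(*) BY g | WHERE c > min".
    static Map<String, Long> countHaving(String[] g, long min) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : g) {
            counts.merge(key, 1L, Long::sum);
        }
        counts.values().removeIf(c -> c <= min); // drop buckets after aggregation
        return counts;
    }
}
```

The pre-filter reduces the work done per row, while the post-filter reduces the number of buckets emitted.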

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

nik9000 commented 1 month ago

From a compute engine standpoint it feels like we could do:

-void addRawInput(Page page);
+void addRawInput(Page page, BooleanVector);

It should be easy to make the code generation stuff build that. It'd be super easy to specialize into constantAll, constantNone, and variable. We can make those BooleanVectors out of the expression trees easily enough.
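
A rough Java sketch of that three-way specialization, using simplified stand-ins: the `Mask` interface, the `ALL`/`NONE` constants, and `SumAggregator` are all hypothetical, and a plain `long[]` takes the place of a real `Page`. It only illustrates the constantAll / constantNone / variable split, not the actual compute engine or its generated code.

```java
// Hypothetical, simplified stand-ins; not the real compute-engine types.
final class MaskSketch {
    /** Stand-in for a boolean mask over the rows of a page. */
    interface Mask {
        boolean get(int row);
    }

    static final Mask ALL = row -> true;   // "constantAll": every row passes
    static final Mask NONE = row -> false; // "constantNone": no row passes

    static final class SumAggregator {
        private long sum;

        void addRawInput(long[] values, Mask mask) {
            if (mask == NONE) {
                return; // skip the whole page without touching the values
            }
            if (mask == ALL) {
                for (long v : values) {
                    sum += v; // tight loop with no per-row mask check
                }
                return;
            }
            for (int i = 0; i < values.length; i++) { // "variable": check per row
                if (mask.get(i)) {
                    sum += values[i];
                }
            }
        }

        long sum() {
            return sum;
        }
    }
}
```

The point of the split is that the two constant cases avoid any per-row branching, so an unfiltered aggregation pays nothing for the extra parameter.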

costin commented 1 month ago

Since the boolean is used only for filtering, there's no need for multivalue (MV) or null handling. How about using a simple bitset instead?
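
As a sketch of what a bitset-based mask could look like, the following uses the standard `java.util.BitSet`. The class and method names are hypothetical, and the filter `a > threshold` is hard-coded for illustration rather than built from an expression tree.

```java
import java.util.BitSet;

// Hypothetical illustration of a bitset used as a row filter.
final class BitSetMaskSketch {
    // Builds a mask with bit i set when values[i] > threshold.
    static BitSet greaterThan(long[] values, long threshold) {
        BitSet mask = new BitSet(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] > threshold) {
                mask.set(i);
            }
        }
        return mask;
    }

    // Sums only the rows whose bit is set, jumping directly between set bits.
    static long sumSelected(long[] values, BitSet mask) {
        long sum = 0;
        for (int i = mask.nextSetBit(0); i >= 0; i = mask.nextSetBit(i + 1)) {
            sum += values[i];
        }
        return sum;
    }
}
```

A bit is either set or clear, so there is no null or multivalue state to represent, which is the simplification being suggested.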

nik9000 commented 2 weeks ago

For pre-filtering: