elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.15k stars 24.84k forks source link

ESQL: aggs over constants inconsistent in zero row edge case #113875

Open alex-spies opened 1 month ago

alex-spies commented 1 month ago

When a filter removes all rows before a STATS, most of our aggregations return null:

row x = 0 | where x > 0 | eval y = 1 | stats avg(y)

    avg(y)     
---------------
null       

Aggregations over constants are inconsistent with this, they always evaluate to some value:

row x = 0 | where x > 0 | stats avg(1)

    avg(1)     
---------------
1.0   

While this is inconsistent, I'm not sure if the consistent behavior for aggregations over constants is actually desirable; this would require evaluating the entire first part of the query upstream from the STATS just to see if there's at least one row, which is wasteful.

In case of AVG, the correct surrogate expression for AVG([1,2,3]) would probably be CASE(COUNT(*) > 0, 1, null) * MV_AVG([1,2,3]) rather than just MV_AVG([1,2,3]).

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-analytical-engine (Team:Analytics)