duckdb / duckdb-web

DuckDB website and documentation
https://duckdb.org
MIT License
155 stars 293 forks source link

count_if: sometimes behaves like an aggregate function, but sometimes not - extra documentation welcome. #3294

Open rpbouman opened 1 month ago

rpbouman commented 1 month ago

count_if() is documented here:

https://duckdb.org/docs/sql/functions/utility#count_ifx

when we try it, a basic example suggests it's actually an aggergate function:

select count_if(x) 
from (values (1, 'a'), (2, 'b'), (null, 'c') ) as t( x, y );

result:

┌─────────────┐
│ count_if(x) │
│   int128    │
├─────────────┤
│           2 │
└─────────────┘

When we add y as non-aggregate, non-group by column, we also get a message we would expect if count_if were an aggregate function:

select count_if(x), y 
from (values (1, 'a'), (2, 'b'), (null, 'c') ) as t( x, y );

result:

Binder Error: column "y" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(y)" if the exact value of "y" is not important.

Going by these examples we might think it works as any count_if(x) could be rewritten as something like: count(case x::BOOLEAN when true then true else null end).

Despite the similarity, count_if() is not allowed as window function, and the error message in that case suggests count_if() is not an aggregate function:

select count_if(x) over () 
from (values (1, 'a'), (2, 'b'), (null, 'c') ) as t( x, y );

result:

Catalog Error: count_if is not an aggregate function

I think this behavior is sufficiently confusing to allow some extra documentation.

Alternatively, perhaps the implementation of count_if could be modified to allow it to be used just like count, i.e. the current behavior plus the ability to use it as window function. In that case it could also be moved to the aggregate functions page and window functions page.

rpbouman commented 1 month ago

I just noticed count_if()is actually a MACRO defined as sum(CASE WHEN (l) THEN (1) ELSE 0 END) In that respect the error message "is not an aggregate function" is weird; sum is clearly a normal aggregate function. but apparently this information is lost after it got wrapped in a macro.