apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions #11732

Open Dandandan opened 1 month ago

Dandandan commented 1 month ago

Is your feature request related to a problem or challenge?

After https://github.com/apache/datafusion/pull/11721/files, APPROX_PERCENTILE_CONT handles nulls by removing them before computing the percentile. This is the default behaviour of PostgreSQL, Spark, etc.

However, the syntax of the current aggregate functions is confusing for APPROX_PERCENTILE_CONT, as it supports IGNORE NULLS | RESPECT NULLS.
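To make the null handling described above concrete, here is a minimal Python sketch of a continuous percentile that drops nulls before interpolating. It is an illustration only: the function name `percentile_cont` and its signature are hypothetical, it computes the exact value by sorting rather than the approximation DataFusion uses, and it assumes the PostgreSQL-style convention that an all-null input yields null.

```python
import math
from typing import Optional, Sequence


def percentile_cont(
    values: Sequence[Optional[float]], fraction: float
) -> Optional[float]:
    """Exact continuous percentile that skips nulls (None).

    Hypothetical helper for illustration; not DataFusion's implementation.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Remove nulls first, then sort what remains.
    non_null = sorted(v for v in values if v is not None)
    if not non_null:
        # Assumption: empty or all-null input yields null.
        return None

    # Fractional position in the sorted values, then linear
    # interpolation between the two neighbouring values.
    position = fraction * (len(non_null) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return non_null[lower] + (non_null[upper] - non_null[lower]) * weight


# The nulls do not influence the result: only 1.0, 2.0 and 3.0 are considered.
median = percentile_cont([1.0, None, 3.0, None, 2.0], 0.5)
```

With nulls always removed like this, a RESPECT NULLS option has no obvious meaning for a percentile, which is one way to read the confusion about the current syntax.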

Describe the solution you'd like

Come up with a plan to bring the syntax / semantics of APPROX_PERCENTILE_CONT and PERCENTILE_CONT and similar aggregations closer to PostgreSQL and others by (one or more).

Describe alternatives you've considered

No response

Additional context

No response

samuelcolvin commented 2 days ago

See https://github.com/pydantic/logfire/issues/433. Support for WITHIN GROUP would be great.