apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support `max_by` in Aggregation function #12252

Open Weijun-H opened 2 weeks ago

Weijun-H commented 2 weeks ago

Is your feature request related to a problem or challenge?

max_by(arg, val)

Description: Finds the row with the maximum val and evaluates the arg expression at that row. This function is affected by ordering.
Example: max_by(A, B)
Alias(es): argMax(arg, val), arg_max(arg, val)

Describe the solution you'd like

D SELECT max_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
┌──────────────┐
│ max_by(x, y) │
│   varchar    │
├──────────────┤
│ b            │
└──────────────┘
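To make the intended semantics concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's API, and the `max_by` helper below is hypothetical. It assumes that ties go to the first row seen and that there are no NULLs; neither the issue nor this sketch settles those points.

```rust
/// Hypothetical helper (not a DataFusion API): returns the `arg` that is
/// paired with the largest `val`, or `None` when there are no rows.
fn max_by<A: Clone, V: PartialOrd>(rows: &[(A, V)]) -> Option<A> {
    let mut best: Option<&(A, V)> = None;
    for row in rows {
        // Assumption: replace only on a strictly larger value,
        // so the first row wins when values tie.
        let is_better = match best {
            None => true,
            Some(current) => row.1 > current.1,
        };
        if is_better {
            best = Some(row);
        }
    }
    best.map(|row| row.0.clone())
}

fn main() {
    // Same rows as the DuckDB example: tab(x, y).
    let tab = [("a", 10), ("b", 50), ("c", 20)];
    println!("{:?}", max_by(&tab)); // prints Some("b")
}
```

Because the result depends on which row is seen first when values tie, the input order matters, which matches the "affected by ordering" note in the description.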

Describe alternatives you've considered

No response

Additional context

https://duckdb.org/docs/sql/functions/aggregates#max_byarg-val
https://docs.databricks.com/en/sql/language-manual/functions/max_by.html

Lordworms commented 2 weeks ago

take

korowa commented 2 weeks ago

These could probably be aliases (with some additional transformations) for the first/last aggregation functions. Otherwise we'll end up with two implementations of basically the same function (not a bad thing, but still redundant).

Relevant discussion -- https://github.com/apache/datafusion/issues/12075
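As an illustration of the equivalence suggested here (not code from the thread), the sketch below compares a direct "arg at max val" scan with a "sort by val, take the last arg" approach, which is roughly what `last_value(x ORDER BY y)` would compute. Both helper names are hypothetical, and NULL handling is ignored; that is presumably part of the "additional transformations" mentioned above, though this is an assumption.

```rust
/// Hypothetical direct implementation: scan for the row with the largest `val`.
/// `Iterator::max_by` returns the last maximal element when values tie.
fn max_by_direct<A: Clone, V: Ord>(rows: &[(A, V)]) -> Option<A> {
    rows.iter()
        .max_by(|l, r| l.1.cmp(&r.1))
        .map(|row| row.0.clone())
}

/// Hypothetical stand-in for `last_value(arg ORDER BY val)`:
/// stable-sort the rows by `val` and take the `arg` of the last one.
fn last_value_ordered_by<A: Clone, V: Ord>(rows: &[(A, V)]) -> Option<A> {
    let mut sorted: Vec<&(A, V)> = rows.iter().collect();
    sorted.sort_by(|l, r| l.1.cmp(&r.1));
    sorted.last().map(|row| row.0.clone())
}

fn main() {
    let tab = [("a", 10), ("b", 50), ("c", 20)];
    // Both approaches agree on the example from the issue.
    assert_eq!(max_by_direct(&tab), last_value_ordered_by(&tab));
    println!("{:?}", max_by_direct(&tab)); // prints Some("b")
}
```

The sort-based version does more work than a single scan, so an alias built this way would only be attractive if the engine can avoid the full sort.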

alamb commented 1 week ago

Given https://github.com/apache/datafusion/issues/12357, we may want to move this function to another repo rather than keeping it in the core.

alamb commented 2 days ago

Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046