apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support `min_by` in Aggregation function #12253

Closed: Weijun-H closed this 1 month ago

Weijun-H commented 2 months ago

Is your feature request related to a problem or challenge?

Description: Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
Example: `min_by(A, B)`
Alias(es): `argMin(arg, val)`, `arg_min(arg, val)`
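Below is a minimal Rust sketch of these semantics. DataFusion itself is written in Rust. The sketch makes two assumptions:

- All inputs are non-null.
- "Affected by ordering" means that when several rows share the minimum `val`, the row seen first wins.

The `min_by` function here is hypothetical. It is not DataFusion's or DuckDB's implementation.

```rust
/// Hypothetical sketch of `min_by(arg, val)`: return the `arg` from the row
/// whose `val` is smallest. Ties keep the earliest row seen (an assumption),
/// so the result can depend on input order.
fn min_by<A: Clone, V: PartialOrd>(rows: &[(A, V)]) -> Option<A> {
    let mut best: Option<&(A, V)> = None;
    for row in rows {
        // Strict `<` means a later row with an equal `val` does not replace the current best.
        let replace = match best {
            None => true,
            Some(b) => row.1 < b.1,
        };
        if replace {
            best = Some(row);
        }
    }
    best.map(|(arg, _)| arg.clone())
}

fn main() {
    // Same data as the DuckDB example in this issue.
    let rows = [("a", 10), ("b", 50), ("c", 20)];
    println!("{:?}", min_by(&rows));
}
```

Run on the data from the DuckDB example in this issue, it prints `Some("a")`.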

Describe the solution you'd like

D SELECT min_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
┌──────────────┐
│ min_by(x, y) │
│   varchar    │
├──────────────┤
│ a            │
└──────────────┘

Describe alternatives you've considered

No response

Additional context

https://duckdb.org/docs/sql/functions/aggregates#min_byarg-val
https://docs.databricks.com/en/sql/language-manual/functions/min_by.html

dharanad commented 2 months ago

take

korowa commented 2 months ago

These could probably be aliases (with some additional transformations) for the first/last aggregate functions. Otherwise we'll end up with two implementations of essentially the same function (not a bad thing, but still redundant).

Relevant discussion: https://github.com/apache/datafusion/issues/12075
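The equivalence behind this suggestion can be checked with a small Rust sketch, assuming non-null inputs. After a stable ascending sort on `val`:

- The first `arg` matches a direct `min_by` scan in which the earliest row wins ties.
- The last `arg` matches a direct `max_by` scan in which the latest row wins ties.

All four functions are hypothetical stand-ins. They are not DataFusion's `first_value`/`last_value` code.

```rust
/// Hypothetical "first_value(arg ORDER BY val)": stable-sort rows by `val`
/// ascending, then take the first `arg`.
fn first_value_ordered<A: Clone, V: Ord + Clone>(rows: &[(A, V)]) -> Option<A> {
    let mut sorted = rows.to_vec();
    sorted.sort_by(|l, r| l.1.cmp(&r.1)); // `sort_by` is stable: equal keys keep input order
    sorted.first().map(|(a, _)| a.clone())
}

/// Hypothetical "last_value(arg ORDER BY val)": same stable sort, last `arg`.
fn last_value_ordered<A: Clone, V: Ord + Clone>(rows: &[(A, V)]) -> Option<A> {
    let mut sorted = rows.to_vec();
    sorted.sort_by(|l, r| l.1.cmp(&r.1));
    sorted.last().map(|(a, _)| a.clone())
}

/// Direct single-pass min_by: strict `<` lets the earliest row win ties.
fn min_by<A: Clone, V: Ord>(rows: &[(A, V)]) -> Option<A> {
    let mut best: Option<&(A, V)> = None;
    for row in rows {
        if best.map_or(true, |b| row.1 < b.1) {
            best = Some(row);
        }
    }
    best.map(|(a, _)| a.clone())
}

/// Direct single-pass max_by: `>=` lets the latest row win ties, matching
/// `last_value_ordered` over a stable ascending sort.
fn max_by<A: Clone, V: Ord>(rows: &[(A, V)]) -> Option<A> {
    let mut best: Option<&(A, V)> = None;
    for row in rows {
        if best.map_or(true, |b| row.1 >= b.1) {
            best = Some(row);
        }
    }
    best.map(|(a, _)| a.clone())
}

fn main() {
    // "a" and "d" tie on the minimum value 10.
    let rows = [("a", 10), ("b", 50), ("c", 20), ("d", 10)];
    assert_eq!(min_by(&rows), first_value_ordered(&rows));
    assert_eq!(max_by(&rows), last_value_ordered(&rows));
    println!("{:?} {:?}", min_by(&rows), max_by(&rows));
}
```

This prints `Some("a") Some("b")`. Tie-breaking is the subtle part. The aliases agree with the direct scans only when both sides settle ties the same way.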

Lordworms commented 2 months ago

Also implemented in https://github.com/apache/datafusion/pull/12284 using a function-rewrite approach.
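A rewrite of this kind can be sketched on a toy expression type in Rust. The sketch assumes the rewrite targets `first_value` with an `ORDER BY`, as the earlier comment about aliasing first/last suggests. The `AggCall` enum, `rewrite`, and `to_sql` are hypothetical illustrations. They are not DataFusion's logical-plan types, and the code in PR 12284 may differ.

```rust
/// Hypothetical, simplified aggregate-call AST for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum AggCall {
    MinBy { arg: String, val: String },
    MaxBy { arg: String, val: String },
    FirstValue { arg: String, order_by: String, descending: bool },
}

/// Rewrite `min_by(arg, val)` to `first_value(arg ORDER BY val ASC)` and
/// `max_by(arg, val)` to `first_value(arg ORDER BY val DESC)`.
/// Any other call passes through unchanged.
fn rewrite(call: AggCall) -> AggCall {
    match call {
        AggCall::MinBy { arg, val } => AggCall::FirstValue { arg, order_by: val, descending: false },
        AggCall::MaxBy { arg, val } => AggCall::FirstValue { arg, order_by: val, descending: true },
        other => other,
    }
}

/// Render a call as SQL-like text so the effect of the rewrite is visible.
fn to_sql(call: &AggCall) -> String {
    match call {
        AggCall::MinBy { arg, val } => format!("min_by({arg}, {val})"),
        AggCall::MaxBy { arg, val } => format!("max_by({arg}, {val})"),
        AggCall::FirstValue { arg, order_by, descending } => {
            let dir = if *descending { "DESC" } else { "ASC" };
            format!("first_value({arg} ORDER BY {order_by} {dir})")
        }
    }
}

fn main() {
    let call = AggCall::MinBy { arg: "x".into(), val: "y".into() };
    println!("{}", to_sql(&rewrite(call)));
}
```

This prints `first_value(x ORDER BY y ASC)`. Mapping `max_by` to `first_value ... DESC` rather than to `last_value ... ASC` changes which row wins ties under a stable sort. That is one of the details such a rewrite has to settle.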

alamb commented 2 months ago

Given https://github.com/apache/datafusion/issues/12357, we may want to move this function to a separate repository rather than keeping it in the core.

alamb commented 1 month ago

Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

alamb commented 1 month ago

We are going to implement these in a different repository, so I'm closing this ticket to avoid confusion: https://github.com/apache/datafusion/issues/12254#issuecomment-2374446425