PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

Syntax for aggregation with multiple arguments #35

Closed RCHowell closed 2 years ago

RCHowell commented 2 years ago

What is the syntax for aggregations which take multiple arguments, such as percentile, corr, max_by, etc.? All examples only contain a single argument. I was thinking of delimiting the arguments with spaces:

fn(a, b, c) -> fn a b c
aggregate by:[title, country] [
    min salary,
    pct salary 0.25,
    pct salary 0.50,
    pct salary 0.75,
    max salary
    ...
]

max-sixty commented 2 years ago

Yes, I think that's ideal, thanks @RCHowell

I would tend to put the settings of the function first and the column arg last: pipes put their input last, so the most flexible argument should go last, e.g. salary | pct 0.5. And if we allowed partial functions, this would be compatible (i.e. func median = pct 0.5).
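As a rough illustration of the settings-first, data-last ordering, here is a minimal sketch in Python rather than PRQL. The names pct, pipe, median and the sample salary list are hypothetical, and the nearest-rank percentile is just one simple definition chosen for the example.

```python
from functools import partial


def pct(q, values):
    """Nearest-rank style percentile: setting (q) first, data (values) last."""
    ordered = sorted(values)
    # Position of the q-th quantile, clamped to the last valid index.
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def pipe(value, *funcs):
    """Feed value through funcs left to right; each func receives the piped value."""
    for f in funcs:
        value = f(value)
    return value


# Because the data argument comes last, binding the setting gives a new function.
median = partial(pct, 0.5)

salary = [30, 10, 50, 20, 40]
result = pipe(salary, median)  # roughly: salary | pct 0.5
```

With the data argument last, median needs no wrapper function: it is pct with its first argument already bound.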

If you want to have a go at a few example functions, they could go into examples!

RCHowell commented 2 years ago

What about a placeholder, because some aggregations can take multiple identifier expressions such as corr col_1, col_2?

Also, the "settings" of a function typically go after the "core" arguments in the case of function overloading. A simple example would be foo.indexOf(bar) and foo.indexOf(bar, offset), i.e. default parameters via overloading.

max-sixty commented 2 years ago

Ah, so in R, the piped argument goes first:

Are there other languages which work that way? Here's an example of the OCaml approach, where they go last: https://github.com/max-sixty/prql/issues/11#issuecomment-1021911352

For optional args, the order doesn't matter, so that would be bar | indexOf offset:1 with either of the options...
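A small Python sketch of why named optional args sidestep the ordering question. The function index_of and its keyword-only offset parameter are hypothetical, loosely modelled on the indexOf example above; this is not PRQL syntax.

```python
from functools import partial


def index_of(needle, haystack, *, offset=0):
    """Return the first index of needle in haystack at or after offset, or -1."""
    try:
        return haystack.index(needle, offset)
    except ValueError:
        return -1


letters = ["a", "b", "c", "b"]

# The named setting can be bound before or after the positional argument;
# the piped value (letters) still arrives last in both cases.
find_b_late = partial(index_of, "b", offset=2)
find_b_early = partial(partial(index_of, offset=2), "b")

late = find_b_late(letters)
early = find_b_early(letters)
```

Both partials behave the same, since a named argument is matched by name rather than by position.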

RCHowell commented 2 years ago

IIRC the piped arg goes first in Elixir

https://elixirschool.com/en/lessons/basics/pipe_operator
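To make the two conventions concrete, here is a hedged Python sketch of both. The helpers pipe_first and pipe_last are hypothetical names for illustration and are not how Elixir, R, or OCaml implement their pipe operators.

```python
def pipe_first(value, *steps):
    """Each step is (func, *extra_args); the piped value becomes the FIRST argument."""
    for func, *extra in steps:
        value = func(value, *extra)
    return value


def pipe_last(value, *steps):
    """Each step is (func, *extra_args); the piped value becomes the LAST argument."""
    for func, *extra in steps:
        value = func(*extra, value)
    return value


# Data-first: str.split takes the string first, then the separator.
count = pipe_first("a,b,c", (str.split, ","), (len,))

# Data-last: map takes the function first and the data last.
strings = pipe_last([3, 1, 2], (sorted,), (map, str), (list,))
```

Which helper reads more naturally depends on whether the functions being piped into take their data argument first or last.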

max-sixty commented 2 years ago

Nice find @RCHowell .

So we have pipes putting the arg

Julia has implementations that do each (e.g. Chain.jl vs DataPipes.jl)!

An advantage of putting it last is that partials work (func median = pct 0.5), and it makes more sense to me that the settings of a function bind more tightly than what the function runs on.

But I don't have a super-confident view, so whatever the consensus is I'm happy with.

max-sixty commented 2 years ago

Here's a discussion of Clojure's approach, which has a pipe symbol for each type (!): https://clojure.org/guides/threading_macros

max-sixty commented 2 years ago

Closing: we went ahead with the OCaml & F# approach. It's not completely immutable; we can see how we go.