PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

Operators for equality comparison, assigns and type annotations #437

Closed: aljazerzen closed this issue 2 years ago

aljazerzen commented 2 years ago

Current syntax:

func add x to:1 = x + to

from employees
filter country = "USA"
derive [
  gross_salary: add salary payroll_tax,
  gross_cost: add gross_salary benefits_cost
]
join blah side:5 [blah_id]
group [title, country] (
    aggregate [
        sum_gross_cost: sum gross_cost,
        count: count,
    ]
)

On discord, we are having the following discussion:

aljazerzen:

I've been working on window functions, and the first step now works! I've been thinking about how to move forward with flagging functions as scalar, aggregation or window functions, but I noticed that this could be solved with type annotations on the parameters:

  • func transform filter conditions frame = ...
  • func filter conditions frame: Frame = ...

Here I had transform as a keyword (similar to public / static / final in Java) to annotate that this is a transform of frames (tables). Instead, we can just use type annotations on parameters. As @snth pointed out, : is often used for type annotations, so it was my natural choice. Even if this will only be used in the stdlib, I think it would be consistent with other modern languages. For that to work, I tried replacing the current usages of : and = with = and ==, respectively. Since = is also used for function definitions, I replaced that one with ~. This is the result:

func add x to=1 ~ x + to

from employees
filter country == "USA"                           # Each line transforms the previous result.
derive [                                         # This adds columns / variables.
  gross_salary = add salary payroll_tax,
  gross_cost = add gross_salary benefits_cost    # Variables can use other variables.
]
join blah side=5 [blah_id]
group [title, country] (
    aggregate [                                  # `by` are the columns to group by.
        sum_gross_cost = sum gross_cost,
        count = count,
    ]
)

I know we have been over this multiple times, so I'll just throw it out there one last time:

                    assign   comparison   function definition
Current approach:   :        =            func sum = ...
Equals approach:    =        ==           func sum ~ ...

snth:

I'm definitely a fan of = for assignments in the derive clauses. The ~ for function definitions looks unfamiliar to me. How about using the ~ for the default values instead so you'd get something like:

func add x to~1 = x + to

I was curious whether you really need the ==, since in the filter clause you could probably get away without it. But I guess if you want to do boolean tests within assignments, then you'll need it:

derive [ is_zero = (divisor == 0) ]

max-sixty:

I am open to changes!

Personally, I don't find the ~ intuitive in func add x to=1 ~ x + to. It makes it hard to see at a glance where the params end and the body begins. (This is also why I think Malloy's use of is isn't great: that split should be a specific piece of punctuation.)

If we really have to, we could give up = as a comparison. Its advantage is that it's intuitive, and feels less programmy than ==. But it's not a religious issue.

If we want to add annotations, could we use something other than :? I don't think users will be using annotations much, so it's fine to make them ugly. We could use something like func<frame>, though that is similar syntax with a different meaning from Rust's use of <foo>.

To throw something else into the mix: the current syntax of using = in table foo = and func foo x = isn't completely coherent with the rest of the language; it would be more coherent to have table foo: and func foo x: — but that would conflict with named args for func.

So we really have 4 sorts of punctuation here:

  • : for named args; interp lower:0 1600 sat_score
  • : for assigns; derive temp_c: (temp_f | celsius_of_fahrenheit)
  • = for "fundamental assigns"; func interp lower:0 higher x = (x - lower) / (higher - lower)
  • annotations

aljazerzen:

I agree that func add x to=1 ~ x + to does not provide a nice visual split between the params and the body.

For type annotations there are (as far as I know) only two standards: int a and a: int. So I would prefer to use one of those, even if it's mainly going to be used in our stdlib. When people look up the definition of, say, the join transform, we can also use these annotations to convey the types.

I agree that table foo = and func foo x = are conceptually assigns, so to be coherent they should have same punctuation as assigns do. But I think that should be = instead of :.

It's just that function definitions then become ambiguous to parse.

What about func (add x to=1) = x + to ? This is similar to how you (often must) call a function.

Or func add x (to=1) = x + to?

I also agree that function definitions have to be easy to parse visually, which means that ~ and is are not good separators.
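The parsing point can be made concrete with a small sketch. Here is a minimal Python example of how the parenthesized form func (add x to=1) = x + to lets a parser separate the header from the body. Everything up to the matching closing parenthesis is the signature, and the body starts after the next =. Inside the parentheses, = can only mean a default value. In the unparenthesized form func add x to=1 = x + to, the two = signs are the same token, so a parser would have to rely on spacing to tell them apart. The function name split_definition and its return shape are hypothetical. This is not PRQL's actual parser.

```python
def split_definition(src: str):
    """Split a hypothetical `func (name params...) = body` definition.

    Assumes the parenthesized syntax proposed above; an illustration only,
    not PRQL's real grammar.
    """
    prefix = "func ("
    if not src.startswith(prefix):
        raise ValueError("expected 'func ('")
    # Find the parenthesis that closes the signature.
    depth = 0
    for i, ch in enumerate(src):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                close = i
                break
    else:
        raise ValueError("unbalanced parentheses")
    signature = src[len(prefix):close].split()
    rest = src[close + 1:].lstrip()
    # The first '=' after the signature is unambiguously the body separator.
    if not rest.startswith("="):
        raise ValueError("expected '=' after signature")
    body = rest[1:].strip()
    name, *raw_params = signature
    params = {}
    for p in raw_params:
        # Inside the parentheses, '=' can only introduce a default value.
        pname, _, default = p.partition("=")
        params[pname] = default or None
    return name, params, body
```

For example, split_definition("func (add x to=1) = x + to") returns ("add", {"x": None, "to": "1"}, "x + to"). A body that contains == is also unaffected, because only the first = after the signature is treated as the separator.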

aljazerzen commented 2 years ago

Related issues:

snth commented 2 years ago

Thanks. Those issue numbers you linked to at the bottom were particularly helpful. I had long wondered where derive came from and whether let had been considered. I see there was discussion on that topic in #6 and #24.

max-sixty commented 2 years ago

Thanks for adding all the discussion @aljazerzen !

For type annotations there are (as far as I know) only two standards: int a and a: int.

At least Haskell uses :: and @. Java uses @, though maybe not the muse we're looking for.

Edit: Snowflake also uses ::

I'm not sure whether using :: would confuse the parser, given how heavily we use :. My guess is that it would be fine, but an interactive tool might be confused when someone types foo: before completing it to foo::bar_type.

WDYT about that?
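On the parser question, a lexer that uses longest-match ("maximal munch") tokenization can tell : and :: apart without help from the grammar, because it always prefers :: when two colons are adjacent. The Python sketch below is hypothetical. The token names and the tiny token set are assumptions, not PRQL's lexer. It also shows the interactive-tool concern: the prefix foo: lexes as a single colon until the second colon is typed.

```python
import re

# Hypothetical token set. Python's regex alternation takes the first
# alternative that matches, so '::' is listed before ':' (and '==' before
# '=') to make the longest match win (maximal munch).
TOKEN_SPEC = [
    ("DOUBLE_COLON", r"::"),
    ("COLON", r":"),
    ("EQ_EQ", r"=="),
    ("EQ", r"="),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src: str):
    """Return (kind, text) pairs, raising on characters outside the token set."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this ordering, lower:0 lexes as IDENT, COLON, NUMBER. The parameter to::number:1 lexes as IDENT, DOUBLE_COLON, IDENT, COLON, NUMBER, and foo: yields a trailing COLON. Maximal munch settles the lexing, but it says nothing about whether the result is easy to read.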

What about func (add x to=1) = x + to ? This is similar to how you (often must) call a function.

That could work; we could also have func add (x to=1) to separate the name from the params.

There was an elegance to not having punctuation, but it may appeal more to people who find OCaml elegant than feel intuitive to the average user. :) I'm also much less concerned about keeping the syntax clean for things like func definitions than for core pipeline code.

aljazerzen commented 2 years ago

Oh, by type annotations I meant specifying the types of variables/functions, not actual annotations (@NotNull Integer myInt;). Java uses @ for annotations, but has no punctuation for types (newer versions only add var for local type inference, as in var a = 1;).

:: may work, but it may fail the visual check in function definitions:

func add x::number to::number:1 ::number = ...

or 

func (add x::number to::number=1)::number = ...

max-sixty commented 2 years ago

How about putting the return type at the start of the function, like in C, rather than at the end like in Rust:

-func add x::number to::number:1 ::number = ...
+func::number add x::number to::number:1  = ...

(or the same operation for the other option)

aljazerzen commented 2 years ago

Maybe after the name of the function:

func add::number x::number to::number:1  = ...

To be honest that is the thing I dislike the most about the C-style syntax. Having just int foo(bool bar) is much less readable than having an additional func at the front.

But that's just my personal preference speaking.

Of all the options, I like this one the most:

func (equals x:number to:number=1):bool = ...

with the parentheses required, but all type annotations optional. That's because a call will look like (equals y to=4) and will have the type bool.
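As a rough illustration of why this form is easy to parse, here is a hypothetical Python sketch that reads the signature func (equals x:number to:number=1):bool. Within each parameter, = introduces a default and : introduces a type, and both are optional. An optional :type after the closing parenthesis gives the return type. The names parse_signature and Param, and the assumption that no parentheses appear inside the signature itself, exist only for this illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Param(NamedTuple):
    name: str
    type_name: Optional[str]  # text after ':' if present
    default: Optional[str]    # text after '=' if present


def parse_signature(definition: str) -> Tuple[str, List[Param], Optional[str]]:
    """Parse a hypothetical `func (name p:type=default ...):ret = body` line.

    Assumes no parentheses inside the signature itself, so the first ')'
    closes it.
    """
    prefix = "func ("
    if not definition.startswith(prefix):
        raise ValueError("expected 'func ('")
    close = definition.index(")")
    inner = definition[len(prefix):close]
    # Whatever sits between ')' and the body's '=' is an optional ':ret'.
    after = definition[close + 1:].split("=", 1)[0].strip()
    ret = after[1:] if after.startswith(":") else None
    name, *raw_params = inner.split()
    params = []
    for raw in raw_params:
        head, eq, default = raw.partition("=")      # '=' introduces a default
        pname, colon, ptype = head.partition(":")   # ':' introduces a type
        params.append(Param(pname, ptype if colon else None, default if eq else None))
    return name, params, ret
```

Parsing "func (equals x:number to:number=1):bool = ..." yields the name equals, the parameters Param("x", "number", None) and Param("to", "number", "1"), and the return type bool. Dropping every annotation, as in "func (equals x to=1) = ...", still parses, with the type fields and the return type left as None.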

max-sixty commented 2 years ago

Maybe after the name of the function:

func add::number x::number to::number:1  = ...

Yes, this would work too. One reason for the previous one is that it more clearly discriminates add as a name rather than a parameter. But no strong view; your option sounds good.

func (equals x:number to:number=1):bool = ...

I agree the parentheses have a nice equivalence with the function call. I'm ambivalent between this and putting the parentheses just around the args, to more clearly discriminate equals:

-func (equals x:number to:number=1):bool = ...
+func equals (x:number to:number=1):bool = ...

And to confirm, you're not a fan of <>, like:

func<bool> (equals x<number> to<number>=1) = ...

(or with the parentheses / return type moved around)

aljazerzen commented 2 years ago

No, <bool> looks too much like a generic parameter.

It's true that the function name should stand out more, but syntax highlighting could make up for that. In any case, I'm fine with both of these options:

func (equals x:number to:number=1):bool = ...
func equals (x:number to:number=1):bool = ...

max-sixty commented 2 years ago

Great, me too — the implementor decides :)

Yeah, agree re it looking like generics, that is a downside

aljazerzen commented 2 years ago

For future readers:

In issues #444 and #447 we decided on the following: