Question: parsing operators and newlines

nikomatsakis commented 2 years ago

How should we think about binary operators and newlines? Rust had to wrestle with the same issue, and I suspect we want the same general answer. I'm referring to things like this:

fn foo() -> {
    if true { 1 } else { 2 }
    -5 # probably wants to return `-5`, not `-4`
}

fn foo() -> {
    a = if true { 1 } else { 2 }
    -5 # probably wants to return `-5` and set `a` to 1
}

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

fn foo() -> {
    a = if false { 1 } else { 2
    -5} # probably wants to set `a` to `-3`, but I'm not entirely sure, especially since the next example...
}

async fn foo() -> {
    a = if false { 1 } else { print(2).await
    -5} # ...probably wants to print the number `2` and then set `a` to `-5`, and not try to subtract `5` from `()`
}

The rule I propose:

Binary operators cannot be preceded by a newline

So you would have to write b - \n 5 and not b \n - 5 (where \n marks a line break). That would be a very simple rule.
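To make the proposed rule concrete, here is a minimal Python sketch over a hypothetical toy language (integers, identifiers, and + - * /; not Dada's actual parser or grammar). The lexer records whether a line break preceded each token, and the expression parser stops at any binary operator that starts a new line, so a leading - on the next line is reparsed as unary minus on a new expression.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    newline_before: bool  # a line break separates this token from the previous one


TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/])")
BINARY_PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(src):
    tokens, pos = [], 0
    while (m := TOKEN_RE.match(src, pos)):
        gap = src[pos:m.start(1)]  # whitespace skipped before this token
        tokens.append(Token(m.group(1), "\n" in gap))
        pos = m.end()
    if src[pos:].strip():
        raise SyntaxError(f"unexpected input at offset {pos}")
    return tokens


class Parser:
    """Precedence-climbing parser for a block of newline-separated expressions."""

    def __init__(self, src):
        self.toks = tokenize(src)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def primary(self):
        tok = self.advance()
        if tok.text == "-":  # unary minus
            return ("neg", self.primary())
        if tok.text in BINARY_PREC:
            raise SyntaxError(f"unexpected operator {tok.text!r}")
        return int(tok.text) if tok.text.isdigit() else tok.text

    def expr(self, min_prec=1):
        lhs = self.primary()
        while True:
            tok = self.peek()
            # The proposed rule: a binary operator cannot be preceded by a
            # newline, so a line break before the operator ends the expression.
            if tok is None or tok.text not in BINARY_PREC or tok.newline_before:
                return lhs
            prec = BINARY_PREC[tok.text]
            if prec < min_prec:
                return lhs
            self.advance()
            lhs = (tok.text, lhs, self.expr(prec + 1))  # left-associative

    def block(self):
        exprs = [self.expr()]
        while self.peek() is not None:
            if not self.peek().newline_before:
                raise SyntaxError("expressions in a block must be on separate lines")
            exprs.append(self.expr())
        return exprs
```

With this sketch, Parser("b -\n    5").block() yields a single subtraction, while Parser("b\n    - 5").block() yields two expressions, b and -5.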

Other rules I can imagine:

Statement-like expressions (e.g., if), when followed by a newline, do not accept binary operators.

But I'd rather not have to reason like that; it makes the grammar much more complex.

_Originally posted by @nikomatsakis in https://github.com/dada-lang/dada/pull/129#discussion_r805140859_

nikomatsakis commented 2 years ago

cc @XFFXFF

nikomatsakis commented 2 years ago

Note that the rule I proposed would make this code:

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

set a to -5 and discard the result of the if.

Ah, I just remembered that I think I generally permit newlines in place of commas inside vectors and similar constructs (I should write some tests for that...), so this rule would fit with that. For example, this is legal Dada right now (playground):

fn subtract(a, b) {
    a - b
}

fn main() {
    print(subtract(
        5
        3
    )).await #! OUTPUT 2
}

and hence:

fn subtract(a, b) {
    a - b
}

fn main() {
    print(subtract(
        5
        - 3
    )).await #! OUTPUT 8
}
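A rough Python sketch of how the newline rule would interact with newline-separated arguments. It assumes a hypothetical toy syntax (integer arguments, + and -, and a stand-in subtract function, not real Dada), and a closing ) is assumed to be present; under it, the two calls above evaluate to 2 and 8, matching the OUTPUT comments.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+,()])")
FUNCTIONS = {"subtract": lambda a, b: a - b}  # stand-in for the Dada fn above


def tokenize(src):
    """Return (text, newline_before) pairs."""
    tokens, pos = [], 0
    while (m := TOKEN_RE.match(src, pos)):
        tokens.append((m.group(1), "\n" in src[pos:m.start(1)]))
        pos = m.end()
    if src[pos:].strip():
        raise SyntaxError(f"unexpected input at offset {pos}")
    return tokens


def call(src):
    """Evaluate 'name(args)' where ',' or a newline separates arguments."""
    toks = tokenize(src)
    if len(toks) < 3 or toks[1][0] != "(":
        raise SyntaxError("expected name(...)")
    i = 2  # skip the function name and '('

    def atom():
        nonlocal i
        text = toks[i][0]
        i += 1
        if text == "-":
            return -atom()  # unary minus
        return int(text)

    def expr():
        nonlocal i
        value = atom()
        # Proposed rule: a newline before a binary operator ends the argument.
        while toks[i][0] in ("+", "-") and not toks[i][1]:
            op = toks[i][0]
            i += 1
            rhs = atom()
            value = value + rhs if op == "+" else value - rhs
        return value

    args = []
    while toks[i][0] != ")":
        args.append(expr())
        if toks[i][0] == ",":
            i += 1  # explicit comma separator
        elif toks[i][0] != ")" and not toks[i][1]:
            raise SyntaxError("arguments need a ',' or a newline between them")
    return FUNCTIONS[toks[0][0]](*args)
```

In the second call, the - 3 line starts with an operator, so it cannot continue the 5 argument and instead becomes a separate argument, -3.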

nikomatsakis commented 2 years ago

My thinking was that we could just avoid the whole "trailing ," question altogether and use newlines. Not sure if that was a good idea. =)

brson commented 2 years ago

Given

Binary operators cannot be preceded by a newline

then

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

doesn't seem like it would parse unless (Expr Expr) parses. Is it going to? That would make blocks and parens, (Expr Expr) and {Expr Expr}, the same?

Having the grammar be newline-sensitive doesn't appeal to me much; I didn't realize Rust did this. (edit: but now that I think about it, this is probably the special rule about parsing control structures that I always knew Rust had but couldn't remember the details of.)

This problem seems similar to the disambiguation of tuples and function calls in https://github.com/dada-lang/dada/issues/117, and could be solved the same way: the opening paren of a function call can't be split onto a new line.

brson commented 2 years ago

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

Cases like this sure do look confusing.

The rules could be different inside { } than inside ( ), or the compiler could lint against this inside ( ) in a way that would persuade people never to write such code.

brson commented 2 years ago

Another possible solution, for the binop case in particular, would be to require binary operators to always be space-delimited and unary operators not to be: 1 - 2 vs. -2.
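A minimal sketch of the space-delimited rule for -, assuming a hypothetical classify_minus helper that only looks at the characters on either side of the operator (it is not part of any existing Dada tooling, and it assumes an operand follows a unary -):

```python
def classify_minus(src, idx):
    """Classify the '-' at src[idx] under a hypothetical whitespace rule:
    a binary '-' has whitespace on both sides; a unary '-' touches its operand."""
    if src[idx] != "-":
        raise ValueError("not a '-'")
    space_before = idx > 0 and src[idx - 1].isspace()
    space_after = idx + 1 < len(src) and src[idx + 1].isspace()
    at_operand_start = idx == 0 or src[idx - 1] in "([{,"
    if space_before and space_after:
        return "binary"  # 1 - 2
    if not space_after and (space_before or at_operand_start):
        return "unary"  # -2, (-2), x = -2
    raise SyntaxError(f"'-' at offset {idx} is neither clearly binary nor unary")


def classify_all(src):
    """Classify every '-' in src, left to right."""
    return [classify_minus(src, i) for i, ch in enumerate(src) if ch == "-"]
```

In this sketch the -5 lines from the opening examples come out unary whether or not a newline precedes them, while 1-2 is rejected outright. Note that b followed by a line starting with - 5 (spaces on both sides) would still be binary here, so whitespace alone does not settle the leading-operator formatting question.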

brson commented 2 years ago

Another newline-sensitive solution, which might handle several cases in this issue, is to have both newlines and commas act as separators in every sequence of expressions, with a separator taking precedence over continuing to parse the current expression.

nikomatsakis commented 2 years ago

@brson

Rust doesn't make the grammar newline-sensitive, but it distinguishes uses of things like if ... { } else { } in "statement position" from uses elsewhere. It further requires that a "statement-like" if (etc.) have type (). That's why this program doesn't type check.
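A simplified Python model of the statement-position distinction, assuming a toy grammar with no semicolons, single-token if conditions, and single-expression branches. This is a sketch, not rustc's parser, and it skips the () type check; whitespace is ignored entirely, so only the position of the if matters.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+{}()=])")


def tokenize(src):
    tokens, pos = [], 0
    while (m := TOKEN_RE.match(src, pos)):
        tokens.append(m.group(1))  # whitespace, including newlines, is ignored
        pos = m.end()
    if src[pos:].strip():
        raise SyntaxError(f"unexpected input at offset {pos}")
    return tokens


class Parser:
    def __init__(self, src):
        self.toks, self.i = tokenize(src), 0

    def peek(self, ahead=0):
        j = self.i + ahead
        return self.toks[j] if j < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def expect(self, text):
        if self.advance() != text:
            raise SyntaxError(f"expected {text!r}")

    def if_expr(self):
        self.expect("if")
        cond = self.advance()  # single-token condition, for brevity
        self.expect("{")
        then = self.expr()
        self.expect("}")
        self.expect("else")
        self.expect("{")
        other = self.expr()
        self.expect("}")
        return ("if", cond, then, other)

    def primary(self):
        tok = self.peek()
        if tok == "if":
            return self.if_expr()  # expression position: operators may follow
        self.advance()
        if tok == "(":
            inner = self.expr()
            self.expect(")")
            return inner
        if tok == "-":
            return ("neg", self.primary())
        if tok.isdigit():
            return int(tok)
        if tok.isidentifier():
            return tok
        raise SyntaxError(f"unexpected {tok!r}")

    def expr(self):
        lhs = self.primary()
        while self.peek() in ("+", "-"):
            op = self.advance()
            lhs = (op, lhs, self.primary())
        return lhs

    def statement(self):
        if self.peek() == "if":
            # Statement position: the if/else is a complete statement on its
            # own, so a following '-5' starts a new statement.
            return self.if_expr()
        if self.peek(1) == "=":
            name = self.advance()
            self.advance()  # '='
            return ("assign", name, self.expr())
        return self.expr()

    def block(self):
        stmts = []
        while self.peek() is not None:
            stmts.append(self.statement())
        return stmts
```

In this model, a = if true { 1 } else { 2 } followed by -5 on the next line still subtracts, because the if is in expression position; Rust relies on ; rather than newlines to separate such statements.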

doesn't seem like it would parse, unless (Expr Expr) parses - is it going to? That would make blocks and parens, (Expr Expr) and {Expr Expr} .... the same?

Good point. I think I meant to have () behave differently with respect to newlines than other delimiters do.

nikomatsakis commented 2 years ago

I'll have to ponder the other suggestions. I also don't love whitespace- or newline-sensitive grammars, but I think it's worth trying to do without ;. It leads to some interesting places. I would like the grammar to be 'minimally' whitespace-sensitive: I think rules like "cannot be separated by whitespace" (e.g., - 5 and -5 are not the same) or "cannot have a newline" are OK. I would not want more than that, because I love being able to have "autoformat on save" clean up a bunch of gunk I just wrote and line things up correctly. When using Python a lot, I also found that it was easy to lose indentation when copy-pasting (or at other times), and that could be quite confusing to debug.

brson commented 2 years ago

I think we might as well implement the rule you suggest, at least for now. I'd love to get the reference grammar and production grammar in agreement so they can be kept in sync forever. Parol has some ability to turn newline sensitivity on and off based on context, so I think it should be able to handle the rule.

Just one more thing to point out: it's been a long time since I read Code Complete, but one bit that has stuck with me is the suggestion that breaking a line before a binary operator reads better than breaking after it. That is, this:

let x = foo
    + bar
    - baz
    / qux

is easier to scan than

let x = foo +
    bar -
    baz /
    qux

and the proposed rule makes that formatting impossible.

nikomatsakis commented 2 years ago

Big +1 to getting ref / actual grammar in sync.

I did consider that the rule would mean you can't put operators at the start of a line. For some reason I thought that style wasn't as popular, but checking rustfmt, it does at least move operators to the beginning of the line (example).

One other consideration: requiring that binary operators be separated by whitespace would resolve the foo<T> vs foo < T ambiguity as well, right?
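A sketch of how a lexer could use that whitespace requirement to tell the two apart, assuming a hypothetical classify_angle helper: a < with whitespace on both sides is less-than, and a < attached to the preceding identifier opens a generic argument list.

```python
def classify_angle(src, idx):
    """Classify the '<' at src[idx] under a hypothetical rule that binary
    operators must be surrounded by whitespace."""
    if src[idx] != "<":
        raise ValueError("not a '<'")
    space_before = idx > 0 and src[idx - 1].isspace()
    space_after = idx + 1 < len(src) and src[idx + 1].isspace()
    if space_before and space_after:
        return "less-than"  # foo < T
    prev = src[idx - 1] if idx > 0 else ""
    if prev.isalnum() or prev == "_":
        return "generic-open"  # foo<T>
    raise SyntaxError(f"ambiguous '<' at offset {idx}")


def classify_all(src):
    """Classify every '<' in src, left to right."""
    return [classify_angle(src, i) for i, ch in enumerate(src) if ch == "<"]
```

One consequence of this sketch is that a<b would be read as the start of generic arguments, so a comparison would have to be written a < b.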

nikomatsakis commented 2 years ago

Another thought that I had:

Maybe if true { ... } else { ... } and friends should just always require parentheses if you plan to apply an operator to them? I feel like it's kind of hard to read anyway. Some examples:

fn foo() -> {
    if true { 1 } else { 2 } - 5
    (if true { 1 } else { 2 }) - 5
}

fn foo() -> {
    if true { 1 } else { 2 }.share
    (if true { 1 } else { 2 }).share
}

Not sure, the parens don't look great. Going to leave this comment for posterity's sake at least though. :)
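A rough Python sketch of this alternative, assuming a toy grammar (single-token conditions, single-expression branches, only + - and . suffixes) and ignoring newlines entirely: a bare if/else followed by an operator is rejected, while the parenthesized forms parse. Names like parse and FOLLOWERS are hypothetical, and only the left-operand case from the examples is checked.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+{}().])")
FOLLOWERS = ("+", "-", ".")  # tokens that would apply an operator to the preceding value


def tokenize(src):
    tokens, pos = [], 0
    while (m := TOKEN_RE.match(src, pos)):
        tokens.append(m.group(1))
        pos = m.end()
    if src[pos:].strip():
        raise SyntaxError(f"unexpected input at offset {pos}")
    return tokens


class Parser:
    def __init__(self, src):
        self.toks, self.i = tokenize(src), 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def expect(self, text):
        if self.advance() != text:
            raise SyntaxError(f"expected {text!r}")

    def if_expr(self):
        self.expect("if")
        cond = self.advance()
        self.expect("{")
        then = self.expr()
        self.expect("}")
        self.expect("else")
        self.expect("{")
        other = self.expr()
        self.expect("}")
        return ("if", cond, then, other)

    def postfix(self, value):
        while self.peek() == ".":  # e.g. (...).share
            self.advance()
            value = ("field", value, self.advance())
        return value

    def primary(self):
        tok = self.peek()
        if tok == "if":
            node = self.if_expr()
            # Hypothetical rule: a bare if/else cannot be the left operand of
            # a binary operator or a '.' suffix; it must be parenthesized.
            if self.peek() in FOLLOWERS:
                raise SyntaxError("parenthesize the if/else before applying an operator")
            return node
        self.advance()
        if tok == "(":
            inner = self.expr()
            self.expect(")")
            return self.postfix(inner)
        if tok == "-":
            return ("neg", self.primary())
        if tok.isdigit():
            return self.postfix(int(tok))
        if tok.isidentifier():
            return self.postfix(tok)
        raise SyntaxError(f"unexpected {tok!r}")

    def expr(self):
        lhs = self.primary()
        while self.peek() in ("+", "-"):
            op = self.advance()
            lhs = (op, lhs, self.primary())
        return lhs


def parse(src):
    p = Parser(src)
    node = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return node
```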

dada-lang/dada, issue #134