erlang / eep

Erlang Enhancement Proposals
http://www.erlang.org/erlang-enhancement-proposals/
264 stars 67 forks source link

Add proposal for non-skipping generators #62

Closed dszoboszlay closed 2 weeks ago

dszoboszlay commented 4 months ago

This is an EEP accompanying erlang/otp#8625 that proposes the addition of a new, non-skipping variant of all existing generators (list, bit string and map). Currently existing generators are skipping: they ignore terms in the right-hand side expression that do not match the left-hand side pattern. Non-skipping generators on the other hand shall fail with exception badmatch.

josevalim commented 4 months ago

Hi @dszoboszlay! I understand the need for this proposal but unfortunately I find the operators quite confusing. Reviewing code that uses this:

[Z + Y || {X, Y} <-:- List, <<Z:16>> <=:= X]

would be hard to decompress what exactly it means. Especially if I use <- and <= in other places in the same module. Erlang already has operators for matching (and even "soft matching" with ?=) and introducing more operators to Erlang (which already has many) will make the learning curve harder.

Counter-proposal

I'd personally go with the "binders" version with a couple tweaks.

  1. Introduce a new syntax for matching binaries. Instead of:

    [X || <<X:8>> <= Binary]

    I propose:

    [X || <<X:8 <- Binary>>]

    And make it so it leaves no leftovers by default, which is arguably a better default. None of the other generators leave trailing elements, and it makes sense for the binary syntax to not do so either. If desired, an explicit notation for leftovers could be borrowed from typespecs, such as one of:

    [X || <<X:8, ... <- Binary>>]
    [X || <<X:8, _:_ <- Binary>>]

    I don't think they would be used frequently, so any of them is fine.

  2. Introduce binders for comprehensions, in the form of Pattern = Expr and Pattern ?= Expr. The first must match or it raises, the second skips if it doesn't match. None of them need to evaluate to a boolean. A downside is that Var = Expr is valid today but it is probably straight-forward to deprecate.

The reasons to consider this counter-proposal are:

  1. We unify all generators syntax, effectively reducing the number of operators in use in the language, and it fixes a gotcha in comprehensions related to binary generators.

  2. We make comprehensions and maybe expressions closer to each other. They are both forms of monads, being able to use Pattern = Expr and Pattern ?= Expr in both make Erlang more uniform.

Downsides:

  1. In order to do a non-skipping generator, you need to do [Z + Y || X <- List, {Y, Z} = X]. It is more verbose but the intent is at least clear to me, because the semantics of = are consistent everywhere in the language.

  2. We need to deprecate Var = Expr in the short term (to prepare for this feature) and deprecate <= in the long term. I think they are both good changes for the future though.

Edit: fixed a snippet based on @essen's feedback.

essen commented 4 months ago

Thanks for looking into this. Everyone had bugs caused by skipping at some point in their Erlang developer life.

Binders seem less of a mouthful than the other solutions.

[Y + X || X <- List, {Y, Z} = X]

Not sure what that means, you probably meant [Y + Z ...?

We need to deprecate Var = Expr in the short term (to prepare for this feature)

This should be deprecated regardless of this feature being implemented.

dszoboszlay commented 4 months ago

Hi and thanks for the feedback!

I find the operators quite confusing. Reviewing code that uses this:

[Z + Y || {X, Y} <-:- List, <<Z:16>> <=:= X]

would be hard to decompress what exactly it means.

With all due respect, I disagree with you on this one. I believe the operators may be confusing now because they're new, but they wouldn't be confusing once you've got used to them. Just like how #{X := Y} and #{X => Y} are crystal clear to all Erlang developers (I hope!), even though I found them rather confusing when I first started writing OTP 17 code.

[X || <<X:8, ... <- Binary>>] [...] We unify all generators syntax, effectively reducing the number of operators in use in the language, and it fixes a gotcha in comprehensions related to binary generators.

I don't see how this proposed syntax would unify all generator syntax? You'd still have [X || X <- List], [X || _ := X <- Map] and then [X || <<X:8 <- Binary>>] which is completely different. You'd eliminate the <= operator for sure, but at the cost of inventing more special syntax for bit string generators.

If you want to sanitize and unify generator syntax, I believe EEP 12 has a much better proposal with [X || X [<-] List], [X || <<X:8>> << <- >> Binary] and (I suppose) [X || _ := X #{<-} Map]. But that would be a completely different EEP than mine, which didn't get any traction for many-many years. Which is maybe a sign that the community wouldn't really like to develop comprehensions to this direction?

Introduce binders for comprehensions, in the form of Pattern = Expr and Pattern ?= Expr. The first must match or it raises, the second skips if it doesn't match. [...] We make comprehensions and maybe expressions closer to each other. They are both forms of monads, being able to use Pattern = Expr and Pattern ?= Expr in both make Erlang more uniform.

I disagree on this one. It's true that reusing the conditional match operator in comprehensions would avoid introducing a new operator, however it would give a very different semantic to the same operator in the two contexts. In a maybe expression:

If the matching succeeds, any unbound variable in the pattern becomes bound. If the expression is the last expression in the maybe block, it also returns the value of Expr2. If the matching is unsuccessful, the rest of the expressions in the maybe block are skipped and the return value of the maybe block is Expr2. [There are more rules about failed matching when the maybe expression have else clauses.]

While in case of comprehensions the documentation should probably read something like this:

A list comprehension returns a list, where the list elements are the result of evaluating Expr for each combination of generator elements for which all filters are true and all conditional matches succeed.

I see some analogy here, but it's far from making the language more uniform.

On the other hand I don't think introducing more operators for non-skipping generators would make Erlang significantly more complex. There are already many language elements exclusively used only in a comprehension context: ||, <- and <=. Adding <-:- and <=:= (or any other variation of the existing arrow operators) wouldn't be a big deal. If the new arrows look similar to the old arrows this could be quite intuitive too.

Furthermore, here's this:

Everyone had bugs caused by skipping at some point in their Erlang developer life.

This is very true, and I believe it's mostly because skipping isn't properly documented. If you read through the docs on comprehensions it doesn't even mention the word "skip" (I only found out this is the official term by reading the compiler source code, I intuitively first called the current generator behaviour filtering), or specify what happens when the generator's pattern doesn't match.

Now you may say that just documenting skipping would solve this problem. And it would indeed be a step in the right direction. But I'm afraid it won't be a solution. Because when there's only one way of doing something you do it that way, regardless of whether you understand all the edge cases or not. I think giving two different kind of generators to programmers would help them much more to keep in mind what's the difference between them. Because once you have to make a choice, you have to know what are you choosing between? It's like with == and =:=: because there are two operators, you know that you have to choose one and you will know that the difference is how they treat ints and floats.

I maybe wrong, but I strongly believe that for example in case of Javascipt people are aware of its horrible equality rules for two reasons:

  1. because they are comically bad
  2. but also because there's a == and a === operator, so you have to actively make a choice.

So I think having two flavours of generators is the best way to eliminate skipped element bugs. Better than educating everyone to always use match-all patterns in generators and bind in a follow up step.

josevalim commented 4 months ago

Just like how #{X := Y} and #{X => Y} are crystal clear to all Erlang developers (I hope!), even though I found them rather confusing when I first started writing OTP 17 code.

I guess your mileage may vary, because I still get them confused from time to time and I have the Erlang compiler tell me what to do (especially in patterns). :) The issue is that the operators do not tell me what they do, so I have to rely on my memory all the time.

If you want to sanitize and unify generator syntax, I believe EEP 12 has a much better proposal with [X || X [<-] List], [X || <> << <- >> Binary] and (I suppose) [X || _ := X #{<-} Map]

I agree this is better, in the sense that it is even clearer on what which operator does, I don't have to guess! However, it is quite verbose. The issue with today's comprehensions is precisely that I am meant to look at both [Y || <<Y>> <- X] and [Y || <<Y>> <= X] and understand they are different things, with no visual clue on what they actually mean.

By adding <-:- and <=:=, we are asking developers to know the difference between:

Those would all be valid and have well-defined runtime behaviour. This is, IMO, too much syntax variation without context.

TL;DR: this proposal uses one syntactical construct (operators) to control two things: the data type of the generator and if it skips or not, leading to too many variations. If bitstring generators could be unified to use <- instead (like maps do), we separate these concerns (regardless if you pick my proposal or EEP 12 or something else), and we would only ask developers to know the difference between 2 operators (<- and <-:-). But I'd still vote for not adding new operators if possible.

dszoboszlay commented 2 months ago

Updated this PR and the implementation with the proposed new operator syntax.

I've also posted the EEP on Erlang Forums, but my post is pending approval at the moment.

michalmuskala commented 2 months ago

I generally agree with what José said about this proposal.

I do not think the current "skipping" behaviour is surprising. For the languages that I checked that have both comprehensions and pattern matching (scala, haskell, elixir) the behaviour is the same as in Erlang. F# that I checked requires patterns to be complete and fails at compile-time with a "filtering" pattern.

In my opinion, what is surprising in the Erlang comprehension implementation (and different from others) is that a binding is treated implicitly as a filter expected to return a boolean value. Indeed, if we look at how scala or haskell expect the fallible comprehension to be handled it's to have a plain variable binding followed by a fallible match. Making bindings work normally in comprehensions would be my preferred way to solve this problem, as it not only solves this issue, but also just makes comprehensions more ergonomic in general - there's way more cases where binding new variables in a comprehension would be useful and would allow avoiding the "begin" nesting which is ugly and unergonomic.

The proposal does correctly point out that fixing the matching behaviour does not address the problem of binary comprehensions, and it indeed is something to consider, but I'm not convinced that it warrants making other types of comprehensions more complex and introducing additional, non-standard operators.

zuiderkwast commented 2 months ago

@michalmuskala Changing how variable binding works would be a breaking change though. Or are we still considering the pinning operator?

I'd prefer that we don't add these operators. They make the language more complex. The convenience of this feature is not worth the added complexity.

josevalim commented 2 months ago

@zuiderkwast IMO we would find a syntax that uses the same operator <- for traversing lists, maps, and binaries. It already does so for lists and maps, just binaries missing. Then we could use the opportunity to fix the trailing binaries issue?

michalmuskala commented 2 months ago

The pinning operator is actually a good context to discuss here as well, though I wasn't actually thinking about it. With this proposal what would be the semantics of:

X = 1
[X || X <:- List]

Would it be matching (like patterns generally do) or shadowing (like generators in comprehensions do). This is probably an important semantic question to clarify before this proposal is finalised.

With regular matching allowed, this question goes away since semantics of = should be the same everywhere - inside and outside comprehensions (meaning matching, not shadowing).

And yes, changing the meaning of = would be a breaking change - though perhaps we could think through minimising the impact of that. Fortunately, this doesn't seem like a particularly useful pattern today, so I'd expect the usage to be fairly low - and this is something that we should verify/confirm with some survey of available open-source code before this proposal could be properly put forward.

dszoboszlay commented 2 months ago

The pinning operator is actually a good context to discuss here as well, though I wasn't actually thinking about it. With this proposal what would be the semantics of:

X = 1
[X || X <:- List]

Would it be matching (like patterns generally do) or shadowing (like generators in comprehensions do). This is probably an important semantic question to clarify before this proposal is finalised.

Shadowing, like the <- generator does. I think it would be very-very confusing if one generator would shadow but the other not.