elm / compiler

Compiler for Elm, a functional language for reliable webapps.
https://elm-lang.org/
BSD 3-Clause "New" or "Revised" License

Proposal: lambda-bind with = not -> #981

Closed: mgold closed this issue 9 years ago

mgold commented 9 years ago

Since we seem to be making syntax proposals (#978, #979) I thought I'd throw this into the ring. Although I think it is a mostly orthogonal change, it would certainly be best to group all breaking syntax changes into a single release.

My primary motivation is consistency, which has a practical benefit. A few months ago, my employer unexpectedly introduced Clojure, and many of my coworkers (Rails devs) were struggling. One piece of advice I found very helpful was to say, "open paren means function application". (Let's not get pedantic about the threading macro.) Empirically, though hardly scientifically, having a consistent syntactic construct seems to help people work their way into a functional language. It's important that the construct be a bijection: function application occurs if and only if you see an open paren.

With that in mind, I present the five ways of binding a value to a name in Elm 0.15:

-- top level definitions
double x = 2*x

-- records
aRecord = {x = 4}
anotherRecord = {aRecord | x2 = aRecord.x * 2}

-- let
double' x =
    let doubled = x*2
    in doubled

-- union types and type aliases
-- we're binding a type to a capitalized name of a type, so this is still consistent
type Foo = Bar | Qux
type alias Point = { x : Float, y : Float }

-- lambda
double'' = \x -> 2*x

Do you see the problem? Lambda bindings are unique in that they use -> instead of =. This means I can't truthfully say, "an equals sign means the name on the left is being bound to the value on the right". That statement could be true for lambdas too, provided we change the syntax: the function that takes x, written \x, is bound to 2*x, and x itself is bound to whatever argument you pass the function.

Furthermore, I also can't say "right arrow means case analysis". Both case and multi-way if use the -> arrow, and that's good. (I support removing the pipes from the multi-way if to match case, but that is covered in other proposals.) Because of lambda bindings, the -> symbol is used in a context completely devoid of case analysis. (It's good that there's no -> in the single-way if; case analysis implies multiple cases! With lambda, it's possible to do unsafe case analysis like \(Just x) -> x but we shouldn't encourage this. The only legitimate use for unwrapping a union type in an argument list is when there is only one tag.)

Incidentally, Elm is really good about using : to mean only "has type", in annotations (top-level and let-bound) and in record type aliases. I think many JS programmers will gravitate towards writing record terms with colons, JSON style. One way to head that off is to use colon to mean has-type (done), and to have a clear symbol that means binding instead (this proposal).

To be explicit, the proposed change is to write \x y z = code rather than \x y z -> code.

I think implementing this will be a simple change to the parser, perhaps here. The larger task will be fixing all the code that breaks in the examples, docs, and third-party libs. One possibility is to accept both \x -> 2*x and \x = 2*x in a 0.15.x release, and emit a deprecation warning for the old form.

Aside from the difficulty of switching, there doesn't seem to be a reason to keep the -> syntax other than as a holdover from Haskell (and maybe ML; I'd have to check). I'm interested in what you think about this proposal.

maxsnew commented 9 years ago

I really don't think it should be an =. In all other uses of = we have

syntax ... NAME args ... = meaning ...

where we are defining what NAME means. If we try to fit your lambda into that pattern, then in

\ x y z = code

\ is the syntax, x is the name and y and z are arguments to x.

If anything, the multi-way if is the odd one out here, since it is the only use of -> that doesn't bind variables on its left.

mgold commented 9 years ago

That's an interesting perspective, @maxsnew. By "syntax", I assume you mean a record's { or type alias or similar. There's no syntax for top-level definitions, though. So if we treat syntax as optional, you could consider \ to be the name. Or, \ is the syntax, there is no name (because it's anonymous), and it goes straight to the argument list. If syntax can be optional, so can a name. If you read \ x y z = code as "the function that takes x, y, and z is defined to be code", it makes sense.

As to your multi-way if point, you've hit on two subtly different meanings of "binding":

There's considerable overlap between those lists, which suggests that distinguishing between them isn't helpful. Lambdas are not the only construct that appears in just one list. Moreover, it seems weird to use -> for the second kind of binding and = for the first:

k = 4
double x -> 2*x

type Model = Model Int String
type Tree a -> Leaf a | Node (Tree a) (Tree a)

The second form of binding separates constants from functions, but in Elm (and Haskell and ML) we think of constants as fully-applied functions, perhaps degenerate in some way, but still the same sort of thing.

Instead, the difference I'd like to highlight is whether a language construct does case analysis. None of the constructs I listed above are capable of case analysis, and neither is the single-way if. (Again, matching on a union type tag in an argument list is not considered case analysis, since it cannot be done safely if there is more than one tag.)

The only constructs in Elm capable of case analysis are case and the multi-way if (which I'll call MWI). Neither of these defines a name visible from the outside (the first type of binding). The MWI clearly does not bind arguments (the second type). I argue that a case branch does not bind arguments either; it performs pattern matching on a value of a known type rather than accepting arguments as variables. It's possible to pattern match without binding anything, for example against a number literal or a union tag with no parameters. Furthermore, calling a function always performs an argument binding, but evaluating a case expression does not guarantee that any particular pattern will be used. So I don't consider case or the MWI to be binding constructs in either sense outlined above.

In conclusion, lambda is a binding construct. Claiming it is not leads to the = and -> confusion I demonstrated above. The MWI and case are not binding constructs, but rather are case analysis constructs. It is therefore fitting to write lambdas with = so that all (and only) binding constructs use = and all (and only) case analysis constructs use ->.

JoeyEremondi commented 9 years ago

Yeah, I'm with @maxsnew on this one. To me, the essence of a lambda expression is that it isn't bound to anything; it's its own object, which can be passed around and manipulated. Recursion happens at the let level, not at the lambda level.

For functions, the actual binding doesn't happen at the Lambda construction site, but at the call site. That's when the environment changes. So to me, having = would confuse newcomers, making them think that something was being assigned or bound when a Lambda is created, which it isn't.

On top of that, I like the intuition -> gives of mapping a value to another value.

mgold commented 9 years ago

"For functions, the actual binding doesn't happen at the Lambda construction site, but at the call site."

This is true of top-level and let-bound functions as well.

"On top of that, I like the intuition -> gives of mapping a value to another value."

Again, named functions do this as well.

For all the talk of binding and environments, I don't think that's how newcomers will think. "Here's a definition" or "here's where a function is being created" is different from "here's a case analysis expression".

dobesv commented 9 years ago

The -> isn't just a holdover from Haskell/ML; it's actually an ASCII approximation of the maps-to arrow: x ↦ x*2 is mathematical notation for a lambda.

The leading backslash probably comes from the lambda calculus, where lambda expressions are written like λx.xx; \ stands in because ASCII has no lambda letter.

Somehow the two have been merged here, probably because they wanted to use . for function composition or something.

The same ASCII arrow is used to represent the function type arrow (→), which is a slightly different symbol, but Haskell, Elm, and ML distinguish the two by context (types and values have distinct namespaces and operators).

So in a way the use of the arrow here is following mathematical notation.

In mathematical notation, it is common to write "named" functions in the form f(x) = x*x, which is where the equals sign comes from.

So, I think the current syntax originates in mathematical notation and it is, in that sense, consistent.

I'm not sure about the multi-way if and case uses of the arrow, however.

dobesv commented 9 years ago

That said I see your point about = being possibly more consistent with other binding constructs. However, I'm not sure it's a significant improvement over the consistency with mathematics, and consistency with other languages is worth something, too. I'd probably vote in favor of the status quo.

mgold commented 9 years ago

Closing due to opposition (which is not entirely unpersuasive).