eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
399 stars 62 forks source link

Invocation syntax for functions accepting single argument lambdas #7190

Open lucono opened 7 years ago

lucono commented 7 years ago

A very common scenario in many projects and libraries is one involving functions which accept a single argument, where the single argument is a zero- or single-argument function or lambda. In these cases, it would be useful to have a syntax which allows the body of the lambda directly in the function call's braces without needing to wrap the lambda with additional curly braces, improving readability quite a bit.

The syntax would need to be able to disambiguate this usage from other possible usages of the named argument function invocation syntax.

To invent a fictional implementation as an example, one might be similar to the invoke operator but where the opening curly brace of the named argument invocation syntax is immediately followed by a colon, resulting in a new function invocation operator which opens with {: as an indicator that the invocation braces directly contain the body (i.e. not wrapped in additional curly braces) of the single lambda argument being supplied.

{Integer*} ints = { 1, 2, 3 };

// 1.

ints.each((it) => print(it));      // today
ints.each {: print(it) };          // after - lambda statements directly in {: } invoke operator

// 2.

ints.map((it) {                    // today
    print(it);
    return it * 2;
});

ints.map {:                        // after - single param list ommitted
    print(it);
    return it * 2;
};

// 3.

ints.map((it) => it * 2);          // today
ints.map {: => it * 2 };           // after - with fat arrow for explicit return

Basically:


ints.sort((a, b) {          // today
    print("``a`` +");
    print(b);
    return a <=> b;
});

ints.sort {(a, b):          // after
    print(a);
    print(b);
    return a <=> b;
};
ghost commented 7 years ago

I really don't like this, it looks nice, but isn't at all regular. I especially dislike the "magical identifier" "it".

Additionally, what would the following print?

void foo(Boolean(Boolean)|Boolean() predicate)
{
    if(is Boolean(Boolean) predicate)
    {
        if(predicate(true))
        {
            print("1");
        }
        else
        {
            print("2");
        }
    }
    else
    {
        if(predicate())
        {
            print("3");
        }
        else
        {
            print("4");
        }
    }
}

shared void run()
{
    Boolean it = false;

    foo{: => it};
}
gavinking commented 7 years ago

I'm now in favor of adding a very limited form of this feature to 1.4, but by making it a keyword. Thus, the syntax would be:

ints.each(print(it));

The difference between my proposal, and what is proposed above, is this would be only for single-expression anonymous functions, and not for blocks.

gavinking commented 7 years ago

Oh, I see, there's an ambiguity here with respect to expression like foo(bar(baz(it)). Which of foo(), bar(), and baz() does it belong to? Still it should be possible to resolve that ambiguity during typechecking.

And I think requiring:

ints.each(=>print(it));

Would undermine most of what's nice about the feature.

ghost commented 7 years ago

I dislike this feature as you proposed, @gavinking, but there is a more disciplined version that I feel fits in more in the language.

Considering #5989, I feel like it could be the keyword for inferring the type of static references. This way, one would write what you wrote in #7215 as the following:

persons
    .filter((person) => 18 <= person.age)
    .sortedBy(it.name)
    .map(it.email)
    .forEach(print);

In addition, if there were an operator for function composition (I'll use "/compose/", in the lack of a good symbol), one could write the .filter call as .filter(18.notLongerThan /compose/ it.age). It's interesting to note that such operator would be useful in its own right, not only when used together with this feature.

The problem would be when calling functions, as in .forEach(it.setAge(20)), which would need to be written as .forEach(shuffle(it.setAge)(20)).

gavinking commented 7 years ago

@Zambonifofex I think what you're proposing is almost completely useless, since it doesn't reduce the token count from what I can already write today in Ceylon. And I don't see how, if it is so objectionable as a way to infer a parameter, it could possibly be unobjectionable as a way to refer to the type of that parameter.

Nor do I see how it's more "disciplined". It's merely less useful.

gavinking commented 7 years ago

I have pushed a prototype implementation to the 7190 branch, so folks who are interested can play with it. (There are surely bugs; I've spent about one hour working on this.)

ghost commented 7 years ago

@gavinking

it doesn't reduce the token count from what I can already write today in Ceylon

Types can be longer than a single token.

And if you don't like "it" as the name, you could use "type", for example. Either way, I felt the need for a feature like this several times in the past; I just feel like it would be more generally useful than what you're proposing.

But if you really don't like my suggestion, I could get used to your idea if you introduced some indication of which function the "it" actually belongs to. Since you don't like "=>", what about ":"?

persons
    .filter(:it.age >= 18)
    .sortedBy(:it.name)
    .map(:it.email)
    .forEach(:print(it));

And I also don't like the fact "it" isn't a keyword in your implementation. As I've said, I dislike "magical identifiers" like these; there is no indication that they are declared, which might hurt readability.

lucaswerkmeister commented 7 years ago

Replying to @gavinking’s comment on another issue:

persons
    .filter(it.age >= 18)
    .sortedBy(it.name)
    .map(it.email)
    .forEach(print(it))

Is far more readable than:

persons
    .filter((p) => p.age >= 18)
    .sortedBy((p) => p.name)
    .map((p) => p.email)
    .forEach(print)

Perhaps, but I think this is the most readable incarnation of the same code:

persons
    .filter((person) => person.age >= 18)
    .sortedBy(Person.name)
    .map(Person.email)
    .forEach(print)

In each line that does something person-related, you’re reminded that you’re operating on persons, not ps or its. This may be slightly more work to type, but code is more often read than written and so on.

Which of foo(), bar(), and baz() does it belong to? Still it should be possible to resolve that ambiguity during typechecking.

Yes, but we all know the typechecker is smarter than the programmer. Don’t force me to repeat the typechecker’s reasoning in my head to make sense of this code.

In the end, I personally don’t believe a further abbreviation syntax for this is necessary. We already have static references for th simplest cases (map(Person.email)), and if your lambda is more complicated, I personally don’t think a short (it) => is too much to ask.

gavinking commented 7 years ago

@lucaswerkmeister

Perhaps, but I think this is the most readable incarnation of the same code: ```ceylon persons .filter((person) => person.age >= 18) .sortedBy(Person.name) .map(Person.email) .forEach(print) ```

I agree that Person.name, Person.email is quite good. And it works as long as all I'm doing is using attributes. But the approach quickly breaks down as soon as I need to call a method of Person, and so in practice I've found it sometimes useful, but more often than not I do have to use the longer form with the (p) =>

This may be slightly more work to type, but code is more often read than written and so on.

I'm not motivated by a desire to avoid typing. I'm motivated by how nice I think the code for stream processing looks in some other languages that have this feature.

if your lambda is more complicated, I personally don’t think a short (it) => is too much to ask.

It doesn't have to be very complicated at all: it.longerThan(10), it.sequence(), it==' '.

gavinking commented 7 years ago

I dislike "magical identifiers" like these; there is no indication that they are declared, which might hurt readability.

Well the IDE would of course give a visual indication that it is special. But sure, on reflection, I agree. For consistency with this, super, outer, which also don't really need to be reserved words, it should probably be a keyword too. (Unless we made none of them reserved words.)

gavinking commented 7 years ago

I've pretty much finished the implementation of this. And after trying it out in my example apps, I'm sorry to conclude that I pretty much love it:

    value evenCubes =
            LongStream
                .iterate(1, it+1)
                .parallel()
                .map(it^3)
                .filter(2.divides)
                .mapToObj(it)
                .limit(20)
                .collect(toList<Integer>());
    value map =
            Stream
                .with(*text.split(it==','))
                .parallel()
                .map(it.trimmed)
                .filter(it.shorterThan(10))
                .collect(toMap(String.string, String.size));
    Stream
        .concat(Stream.with(*text.split()),
                Stream.iterate(1, 2.times))
        .parallel()
        .filter(switch (it)
            case (is String) it.longerThan(4)
            case (is Integer) it>4)
        .limit(10)
        .forEachOrdered(print);

Yes, I agree that it sits right on the border the class of "magical implicit stuff" that makes me rather suspicious, and that I've worked very, very hard to keep out of the language.

However:

  1. It's not implicit. The it keyword is absolutely explicit in its intent.
  2. Nor is it ever really ambiguous as far as I can tell. Sure, it borders on ambiguity with stuff like foo(bar(baz(it))), where the "scope" of it can only be resolved after first assigning a type to foo, bar, and baz. But that's also the case with regular identifier resolution.
  3. Finally, by so severely limiting this feature to (a) single-expression anonymous functions with (b) just one parameter, I've made it pretty abuse-proof.

So, we need to decide whether we should go for it and merge this work. Can we live withoutit? Definitely. We've been living without it for years now. Does it make the language nicer? It seems to me that it does. I like the look of the code with it. In particular, it means that Ceylon doesn't lack something that people like in other competing languages.

ghost commented 7 years ago

@gavinking

You've never said what you think about using ":" for indicating which function call it belongs to.

gavinking commented 7 years ago

You've never said what you think about using ":" for indicating which function call it belongs to.

Well, I think it's ugly. The only motivation for adding this feature is that it makes the code look cleaner and more visually pleasing. Plus, the choice of : would be totally adhoc here. There is no other context in the language in which : means "function".

gavinking commented 7 years ago

@Zambonifofex To be clear, we could use => instead, which would make much more sense, given that => already means "function" in the language:

    value evenCubes =
            LongStream
                .iterate(1, => it+1)
                .parallel()
                .map(=> it^3)
                .filter(2.divides)
                .mapToObj(=> it)
                .limit(20)
                .collect(toList<Integer>());

But, again, all of the value of this feature is in terms of reducing visual clutter. And => is still clutter.

jvasileff commented 7 years ago

I was going to agree, until I saw this abomination:

It seems to me that it does.

:)

More seriously, I think this is overall a good feature. There are downsides:

However, it's so concise and convenient, and as often as I have fun((val) => expressionInvolvingVal) in my code, I think it would be worth it. I'd probably say the same even if the only benefit were to save me from having to type (val) => which requires way too much use of the shift key.

gavinking commented 7 years ago

@jvasileff I think it's actually pretty nice that it.name boils down to the same thing as Person.name in the case where both are allowed. By which I mean, that this proposal is actually a strict superset of what @Zambonifofex proposed above.

jvasileff commented 7 years ago

My point is, I'd normally prefer:

persons
    .sortedBy(Person.name)
    .map(Person.email)
    .forEach(print)

but then, if we add the filter, for consistency I might prefer:

persons
    .filter(it.age >= 18)
    .sortedBy(it.name)
    .map(it.email)
    .forEach(print)

It's another decision (another stylistic choice). But overall, less important to me than the added convenience and reduced clutter.

gavinking commented 7 years ago

Alright, so the remaining wrinkle I have here is the following code:

print(["hello", "world"].each(print(it.filter(it!='l'))));

Currently this is accepted and does what is the right thing in this case. However, I'm uncomfortable with having it refer to two different values in the same expression. So I would prefer to outlaw that. What I'm slightly unsure of is what, precisely is the right way to express that restriction?

gavinking commented 7 years ago

P.S. I definitely don't propose to iintroduce:

outer.it

Or:

(it of filter)

:-P

lucono commented 7 years ago

but by making it a keyword ...

this would be only for single-expression anonymous functions, and not for blocks

I think this strikes a good balance for this feature to make a meaningful difference without introducing a lot of messy details.

I'm uncomfortable with having it refer to two different values in the same expression

How about:

While the above rules are probably on the more restrictive side, they would allow a lot of the simple, single expression use cases:

persons
    .filter(it.age >= 18)
    .sortedBy(it.name)
    .map(it.email)
    .forEach(print)

without creating much or any conflict for a potential future lifting of some of those restrictions, or expansion of the use of it.

So, for the original expression, which would be illegal under the above rules:

print(["hello", "world"].each(print(it.filter(it!='l'))));

It would have to be re-written to use it only in the outer each scope:

print(["hello", "world"].each(print(it.filter((str)=>str!='l'))));

Or use it only in the inner filter scope:

print(["hello", "world"].each(print((str)=>str.filter(it!='l'))));

But not both.

gavinking commented 7 years ago

@lucono yes, sorry, I misspoke when I said:

I'm uncomfortable with having it refer to two different values in the same expression.

Of course I'm fine with it referring to different values within nonoverlapping subexpressions. (Otherwise the feature would be pretty much useless.)

jvasileff commented 7 years ago

I'm turning a bit cold on this based on readability issues where it occurs more deeply inside an auto-lambda expression due to a) lack of clarity for what it refers to, and b) lack of obviousness that the outermost arg is auto lambdaized (lambdized?).

    [1,2,3].each(print([3,4,5].filter(it.equals)))

You must 1) scan far to the right to understand what's on the left, and vice-versa, and 2) more carefully consider types, which can already be confusing at times. Disallowing all but the most shallow uses of it would add to the awkwardness of this feature.

(My assumption is that once it is discovered, auto-lambda-ization occurs for the outermost possible argument.)

Another example (not the best, but hopefully communicates the idea) reads ok:

    persons.map((person) =>
        Div {
           person.children
                .filter(it.age >= 18)
                .map(Div(it.name))
        }
    )

and runs with the current code, but it's confusing to me why it works. I could have meant something like:

    persons.map((it) => ((person) =>
        Div {
           person.children
                .filter(it.age >= 18)
                .map(Div(it.name))
        }
    ))

more generally, if it (at any depth) affects the outermost argument to a Callable parameter (which seems reasonable), then this feature becomes unusable in complex builder expressions, which I think are an important part of Ceylon.

Finally, if it is allowed as a regular identifier, there is an ambiguity:

variable value it = 1;
function adder(Integer x)(Integer y) => x+y;
print([1].map(adder(it))); // may print either "{ 2 }" or "{ Integer(Integer) }"
gavinking commented 7 years ago

I'm turning a bit cold on this based on readability issues where it occurs more deeply inside an auto-lambda expression due to a) lack of clarity for what it refers to, and b) lack of obviousness that the outermost arg is auto lambdaized (lambdized?).

Yes, I agree. That's why the example you've just given is one of the things I'm proposing to disallow.

gavinking commented 7 years ago

Finally, if it is allowed as a regular identifier, there is an ambiguity:

We might need to warn against use of it as a regular identifier. However I don't want to make it an error b/c of all the existing code that would break.

jvasileff commented 7 years ago

Ok, perhaps I misunderstood your proposed restriction. But having itbecome unusable within arguments to invocations that occur within the auto-lambda would be strange, I think:

Disallowing all but the most shallow uses of it would add to the awkwardness of this feature.

gavinking commented 7 years ago

@jvasileff No, neither of the proposed fixes would do that:

  • Have all its bind to the outermost function that accepts a lambda, in which case the above code would be rejected because it!='l' is not a well-typed argument to filter()?
  • Initially accept nested implicit lambdas, but require that every it refer to the same value, in which case the above code would be rejected because it refers to two different things?
gavinking commented 7 years ago

To be clear, your example would be disallowed by the first option, allowed by the second option.

jvasileff commented 7 years ago

Note that we know (I assume) that it would be for the outermost argument to a parameter of type Callable, but the reader still has to find where that is.

My argument is that it would be strange to disallow my example (who care's how/where we use it in the expression), but it also creates readability challenges.

Have all its bind to the outermost function that accepts a lambda, in which case the above code would be rejected because it!='l' is not a well-typed argument to filter()?

I don't see how that would disallow:

[1,2,3].each(print([3,4,5].filter(it.equals)))

which is well typed? As in:

[1,2,3].each((Integer it) => print([3,4,5].filter(it.equals)))

FWIW, I'm not saying every will agree that this sort of thing is hard to read, just that it seems that way to me.

gavinking commented 7 years ago

I don't see how that would disallow:

Oh, OK, sorry, I did not read carefully enough. No, fine, I agree that your example would be accepted.

lucaswerkmeister commented 7 years ago

Finally, if it is allowed as a regular identifier, there is an ambiguity:

We might need to warn against use of it as a regular identifier. However I don't want to make it an error b/c of all the existing code that would break.

If we are to have this feature, I’d strongly advocate for making it a keyword. It removes ambiguity and makes the code slightly more readable in an environment with syntax highlighting (having it bold makes it clear that it’s not a regular identifier). (And IDEs are not the only syntax highlighters, so bolding it based on the typechecker’s analysis is not a full replacement for a keyword IMHO.)

On the other hand, I have to admit that at least in the SDK, there seems more code using it as identifier than I expected (as abbreviation for item or iterator). Perhaps we can deprecate it in 1.4 (2.0?) with the firm announcement that we will make it a keyword in 1.5 (2.1)?

gavinking commented 7 years ago

@lucaswerkmeister in the current implementation it is a keyword. However it's a keyword that is allowed in identifier locations for the purpose of backward compatibility.

lucaswerkmeister commented 7 years ago

Ah, that’s an interesting variation. I’m fine with that, then.

FroMage commented 7 years ago

I've never been a fan of things that don't express the laziness of computation visually, including things that look like expressions but are instead lazily evaluated, or in this case, wrapped into a lambda for later evaluation or no evaluation at all.

Now, in the name of following every other language in the race towards shorter more confusing programs, I guess I could say meh, why not follow others, but are we really talking about many languages here?

Is this something so important to add to justify adding yet more "there's more than one way to do it" to Ceylon?

Also, I've never understood the meaning of it, and assumed it was short for iterated which never made sense to me. I think the name is ill-chosen, but on the other hand if that's what everybody else is using it's going to be hard to suggest anything else, even though IMO value springs to mind.

luolong commented 7 years ago

Alright, so the remaining wrinkle I have here is the following code:

print(["hello", "world"].each(print(it.filter(it!='l'))));

Groovy has the same "problem". I think in some pathological cases, it can get kind of difficult to track (pun intended). But usually I have solved this by introducing a new name for the outer scope:

print(["hello", "world"].each(print((word) => word.filter(it!='l'))));
jvasileff commented 7 years ago

I've never been a fan of things that don't express the laziness of computation visually

To throw something out there, which I may actually wind up hating...

So, an option would be to autoclosurize expressions used to define single-element streams when assigned to single argument Callables using a named args invocation.

    persons
        .filter { it.age >= 18 }
        .sortedBy { it.name }
        .map { it.email }
        .forEach(print);

Of course, I'm just spitballing here. I haven't thought through how this would look for more complex examples.

p.s. I suppose persons.filter { true /* it.age >= 18 */ } could also be allowed, which would help when writing and troubleshooting code.

lucono commented 7 years ago

Groovy has the same "problem". I think in some pathological cases, it can get kind of difficult to track (pun intended). But usually I have solved this by introducing a new name for the outer scope:

print(["hello", "world"].each(print((word) => word.filter(it!='l'))));

This is similar to what I'm proposing here, but where you could instead choose to use it in any one of the overlapping lambdas in the expression and be forced by the compiler to use an explicit name ( (word) => ) in all the other overlapping lambdas in that expression.

This way, there could be no confusion as to which of the overlapping lambdas in the expression the it belongs to.

gavinking commented 7 years ago

where you could instead choose to use it in any one of the overlapping lambdas in the expression and be forced by the compiler to use an explicit name ( (word) => ) in all the other overlapping lambdas in that expression.

To be 100% clear, I agree that this is what we should do.

gavinking commented 7 years ago

@jvasileff

So, an option would be to autoclosurize expressions used to define single-element streams when assigned to single argument Callables using a named args invocation.

I don't hate it, honestly it looks quite reasonable. Let me think it over for a bit.

The only initial reactions I have are:

  1. It doesn't completely solve the issue of "what is the scope of it?", since I can still reasonably have nested {} within a lambda, however, I agree that in practice it makes it much less likely that you'll be confused.
  2. It makes named argument lists even more kitchen-sinky.
  3. It makes it a little harder to switch between the it and explicit forms (IDE can help).
  4. The use of { make it look like I could write a block of code in there, but I don't think we can possibly make that work, because it's syntactically ambiguous.
gavinking commented 7 years ago

So let me just throw out a different approach to reducing the clutter of short anonymous functions, one that takes advantage of something a bit special about how Ceylon declares functional parameters. Unlike other languages, we usually have not just a type, but also a name for each of the parameters of a functional parameter. So why do we force you to redeclare the name? We already let you leave the type off.

What if we let you write this:

persons
    .filter(=> element.age >= 18)
    .sort(=> x.lastName <=> y.lastName)
    .map(=> element.name -> element.email)
    .each(print)

Now, this doesn't read as nicely as it, but that's mainly because we didn't give the parameters very good names in the Iterable class, and we can fix that, leaving:

persons
    .filter(=> it.age >= 18)
    .sort(=> x.lastName <=> y.lastName)
    .map(=> it.name -> it.email)
    .each(print)

Or whatever.

Now, a problem that immediately arises with this is:

Where we don't have useful parameter names. But we could at least solve the unary case by just defaulting the parameter name to it.

Anyway, it's just an idea. I'm not sure I really like the look of it. But it's at least very intellectually consistent with the rest of the language.

lucaswerkmeister commented 7 years ago

Hm, interesting… not sure about the look either, but I like the idea behind it very much. It gives a use to those parameter names besides documentation, it’s consistent, it lets you provide better names than it (I think I actually prefer element over it :) ), and it supports functions with multiple parameters (great).

FroMage commented 7 years ago

Frankly, this is much more adequate in my opinion. I also prefer element over it.

Now, the big downside would be that it's not the same as <insert other language name here>. Another difference we have we'd have to justify. BUT this actually works with more than one param, no? If other languages don't support that, it could be a worthy justification for the difference.

gavinking commented 7 years ago

P.S. The => is absolutely needed in this proposal, because we don't have the it keyword as a way of recognizing the programmer really wanted a thunk. Earlier I argued against the clutter of =>, but with the it keyword it was doing very little real work, whereas here it is doing actual work. And of course it's a solution to the problem of not being able to immediately recognize the scope of the implicit thunk.

gavinking commented 7 years ago

BUT this actually works with more than one param, no?

Correct. That's the point.

gavinking commented 7 years ago

It gives a use to those parameter names besides documentation

Right. Well, they're not just used for documentation; they're also used by the IDE to generate nice anonymous function parameter names.

it lets you provide better names than it (I think I actually prefer element over it :)

Funny. I've always thought the it thing was super cutesy.

jvasileff commented 7 years ago

I'm going to be the contrarian here, take this as you will:

I think friendly identifier names are unimportant for this feature. The identifiers will normally be used immediately, as they are in the majority of the examples, and with obvious meaning.

And, in fact, "standard" names like it or $0 (I know there are problems with the latter) actually highlight that the identifier is implicit, so you don't have to read through a bunch of code to figure out which identifiers in an expression (or whatever) are the implicit ones.

So why do we force you to redeclare the name?

The names are not part of the type system, and are often different in method refinements.

Further, names chosen for documentation purposes which make sense for a particular function or method may be very confusing from the perspective of the caller's code.

gavinking commented 7 years ago

@jvasileff Well I agree with all that. All I would say about it is that it's always up to you to write code with understandable labels. If element makes the code less understandable, then add (person) or whatever, to make it clearer. That will always be easy.

gavinking commented 7 years ago

Anyway, it's just an idea. I'm not sure I really like the look of it.

Uff, I'm playing with this now, and it turns out I think I really like it. I think I like it much more than I predicted I would.

gavinking commented 7 years ago

I liked this so much, I've pushed my implementation to master.

gavinking commented 7 years ago

I'm now unreasonably fond of this feature, and frustrated I can't use it with block functions in this spark example:

get("/", => renderTodos(request));

post("/todos",
    (req, res) {
        todoDao.add(Todo(req.queryParams("todo-title")));
        return renderTodos(req);
    });
gavinking commented 7 years ago

Well, I guess you can write:

get("/", => renderTodos(request));

post("/todos",
    function {
        todoDao.add(Todo(request.queryParams("todo-title")));
        return renderTodos(request);
    });