let expressions in comprehensions

chochos commented 12 years ago

Comprehensions are a really cool language feature and they would be even more useful if there was a way to declare and initialize values or variables that were internal to the comprehension. The keyword given can be used to enclose a declaration which can be used from that moment on:

given (i:=0) for (x in xs) something(x,i++)

for (x in xs) given (t=x*x+sqrt(x)+someOtherExpensiveCalculationWith(x)) for (y in ys) x%y

No need to state value or variable since it can be inferred from the declaration: = means value, := means variable. Local type inference is already done.

UPDATE: what we would really do, given all the evolution in the language since this was originally proposed, would be support the following:

let (c=Counter()) for (x in xs) something(x,c.next())

for (x in xs) let (t=x*x+sqrt(x)+someOtherExpensiveCalculationWith(x)) for (y in ys) x%y
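For readers who want to see the intended semantics run, here is a minimal sketch in Python rather than Ceylon. It assumes that `let` binds a value once per iteration of the enclosing `for` and that the binding is visible to the clauses after it. The names `expensive` and `calls` are hypothetical, `math.isqrt` stands in for `sqrt`, and the body uses `t % y` instead of `x%y` so that the bound value is actually consumed.

```python
import itertools
import math

calls = 0  # counts how often the expensive helper runs


def expensive(x):
    """Hypothetical stand-in for someOtherExpensiveCalculationWith(x)."""
    global calls
    calls += 1
    return x * 3


xs = [1, 4, 9]
ys = [2, 3]

# Models: for (x in xs) let (t = ...) for (y in ys) t % y
# The one-element list plays the role of `let`: t is computed once per x
# and then reused for every y.
results = [
    t % y
    for x in xs
    for t in [x * x + math.isqrt(x) + expensive(x)]
    for y in ys
]

# Models: let (c = Counter()) for (x in xs) something(x, c.next())
# The counter is created once, outside the loop, and advanced per element.
c = itertools.count()
pairs = [(x, next(c)) for x in xs]

print(results)  # [1, 2, 0, 0, 1, 0]
print(calls)    # 3
print(pairs)    # [(1, 0), (4, 1), (9, 2)]
```

The point of the sketch is only that the expensive expression is evaluated once per `x` (three calls for three elements), not once per `(x, y)` pair.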

gavinking commented 12 years ago

Rrrm: first big problem: Strings are Lists in Ceylon, and therefore Manys, which can't be right. We would have to rework everything so that Strings are no longer lists of characters: ugh!

gavinking commented 12 years ago

Ooops, note that in my comments, I'm not using Ivo's definition of Sequenced. I'm assuming that Sequenced is just some kind of "many" type. It's Iterable, and a Category.

RossTate commented 12 years ago

Perhaps I'm being dense, but I don't quite follow how aggregate() can't just be a function.

With Ivo's proposal, everything is just syntactic sugar for a function call. If you want to go with that, you can make it extensible. Declare a function as Return<B> foo<A,B>(Covariant<A> container, Callable<B,A> mapper) where Return is also covariant, then call it with foo (a in as) B_body_using_a. The type argument A will be inferred from the principal type of as, and the type argument B will be the principal type of B_body_using_a (assuming a has type A).

I have on several occasions tried really hard to come up with a way to simply eliminate sequenced parameters from the language, but always drawn a blank. As far as I can tell I totally need them in named argument lists, at the very least.

Can you recap your reasons for sequenced parameters really quick?

ikasiuk commented 12 years ago

Perhaps it's just me, but I really hate the asymmetry of if (x) y else z. I guess I might hate it less if else were renamed to : as in if (x) y:z but that's already getting cryptic.

Hehe, interesting: that's the exact opposite of how I think about this. It can't get much more regular and predictable than this:

"if" for statements: if (condition) { statements } else { statements }
"if" for expression: if (condition)   expression   else   expression;

ikasiuk commented 12 years ago

I have on several occasions tried really hard to come up with a way to simply eliminate sequenced parameters from the language, but always drawn a blank. As far as I can tell I totally need them in named argument lists, at the very least.

We could disallow sequenced arguments for positional invocation (but not for named argument syntax) and instead require the user to explicitly provide an Iterable. So you have to write f({1, 2}) instead of f(1, 2). That makes more sense than one might initially think. Note that we can omit the parentheses and write f{1, 2}: that's just named argument syntax. Given two functions

void f(Object... params) {...}
void g(Object param0, Object... params) {...}

that would allow us to write:

f();
f{1, 2};
f(for (i in 1..3) i*i);
f{ params = for (i in 1..3) i*i; };

g(0);
g(0, {1, 2});
g{ param0=0; 1, 2 };
g(0, for (i in 1..3) i*i);
g{ param0=0; params = for (i in 1..3) i*i; };

chochos commented 12 years ago

That doesn't look bad at all. IMO sequenced args were introduced in Java just to avoid the annoying verbosity of f(bla, new Object[]{ ble, bli, blo, blu }). In Ceylon it's just f(bla, { ble, bli, blo, blu}) anyway, and if you look at that call, you don't even have to guess whether part of it is sequenced args or not.

FroMage commented 12 years ago

I don't know, I kinda like sequenced params :(

Hey, why is @tombentley missing all the fun in this discussion?

RossTate commented 12 years ago

Could someone clarify why they're important for named arguments?

P.S. I'm not suggesting they're a bad feature; I'm just trying to understand the big picture.

ikasiuk commented 12 years ago

We could even go one step further and say that there are no sequenced parameters anymore. T... is just syntactic sugar for Iterable<T> and f{1, 2} is just syntactic sugar for f{param={1, 2};} where param is the last parameter of f.

That would also finally allow us to write

class C(values) { Object... values; }

And even this would work:

void f(Integer[] values) {}
f{1, 2, 3};

But... there is also a problem: how do you convert a comprehension to a sequence? {for(i in 1..3) i*i} would be a sequence with one Iterable element in this scenario. So you'd probably have to write something like sequence(for(i in 1..3) i*i) or (for(i in 1..3) i*i).sequence, or introduce some kind of syntactic sugar for that. In a way that's the inverse problem of the one we currently solve with the elements function.

So, this does seem like a good direction to explore but I'm not totally convinced yet.

ikasiuk commented 12 years ago

Perhaps I'm being dense, but I don't quite follow how aggregate() can't just be a function.

It was pointed out that with my proposal nested fors result in an Iterable of Iterables while they currently produce a single Iterable. So I tried to offer something that allows the same effect with a similarly simple syntax. But of course it's true that this could also be done with a function, though probably slightly more verbose. Indeed the use case may not be common enough to justify a separate operator. So I tend to agree that a function should be sufficient.

tombentley commented 12 years ago

Hey, why is @tombentley missing all the fun in this discussion?

This issue has seen a lot of ideas pass through, so I won't comment on all of them.

I find the possibility of dropping sequenced arguments in positional invocations interesting. Those extra braces don't look so awful. It would mean that when declaring a method you'd have to explicitly set the default to empty though.

Fat arrow just doesn't feel like it's very Ceylonic.

Although my initial reaction to having for and if expressions was 'Urgh! That looks awfully confusing', they are growing on me a little bit.

ikasiuk commented 12 years ago

So, I guess we want to provide syntax and semantics for argument lists etc. which are above all intuitive, regular and not too complicated. After all, people deal with this stuff in more or less every second line of code. It seems to me that this can be achieved better with a syntax-based solution than with a type-based solution. Unfortunately my previous suggestion doesn't fulfill that purpose well enough, IMO. So I'd like to suggest a different, somewhat cleaner syntax:

Functions can define sequenced parameters as usual.
Positional argument lists can define sequenced arguments as usual.
Two important changes to named argument lists:
- No sequenced arguments, i.e. you have to provide an Iterable directly. It is however allowed to omit the parameter name for the sequenced parameter (f{x=0, myList}).
- They are comma-separated, just like positional argument lists.
The for operator returns an Iterable and requires no special treatment.
Important change to sequence literals: assuming a function T[] makeSequence(T... ts) we define that
- [a, b, c] means makeSequence(a, b, c), and
- {iter} means makeSequence{iter}.

Here are some examples:

// Sequenced parameters work as expected
void f(Object... values) {}
void g(Object param0, Object... params) {}
f();
f(1, 2, 3);
f(myList...);
g(0);
g(0, 1, 2);
g(0, myList...);

// Named argument lists become a bit more simple
g { param0=0, params=[1, 2] };
f { values = myList };
f { myList }; // equivalent to the above

// New syntax for sequence literals
Integer[] seq = [1, 2, 3];
Empty e = [];

// Comprehensions are typically used with curly braces
f { for (i in 1..3) i*i };
Integer[] seq2 = { for (i in 1..3) i*i };
Iterable<Integer> it = for (i in 1..3) i*i;
g { param0=0, for (i in 1..3) i*i };

The modified syntax for named argument lists is crucial here. So let's look at the examples from the Ceylon introduction, converted to the new syntax (and slightly extended):

value table = Table {
    title = "Squares",
    rows = 5,
    border = Border {
        padding = 2,
        weight = 1
    },
    object objectArg {/*...*/},
    void functionArg() {/*...*/},
    columns = [
        Column {
            heading = "x",
            width = 10,
            String content(Integer row) => row.string
        },
        Column {
            heading = "x**2",
            width = 10,
            String content(Integer row) => (row**2).string
        }
    ]
};

value tests = Suite (
    Test {
        name = "sqrt() function",
        void run() {
            assert(sqrt(1)==1);
            assert(sqrt(4)==2);
            assert(sqrt(9)==3);
        }
    },
    Test {
        name = "sqr() function",
        void run() {
            assert(sqr(1)==1);
            assert(sqr(2)==4);
            assert(sqr(3)==9);
        }
    }
);

This has a couple of advantages over the current syntax:

Currently two arguments in a named argument list can be separated either by semicolon or by comma or by neither, and you have to understand the rules of the syntax to get it right. With the modified syntax it's extremely simple: all the arguments are comma-separated, just like in a positional argument list.
Named argument lists are visually a bit more distinct from statement blocks - they look more like argument lists. I think this enhances the readability of the source code.
Arguments to sequenced parameters are better separated from the rest of the named argument list. Although that's (slightly) more verbose it has two advantages:
- The separation makes sense in terms of code structure because the syntax for the sequenced arguments is actually different than the syntax for the other named arguments (because they don't have individual names). So we don't mix two kinds of elements with different syntax in the same block. That makes the code structure more regular.
- If a method or class initializer expects another sequence of things in one of the other parameters (e.g. f(Object[] s1, Object... s2)) then both sequences are specified in exactly the same way in a named argument list.

RossTate commented 12 years ago

So what about nested fors, since that was the original problem with for returning an Iterable? For example, for (i in nums) for (j in nums) if (i < j) Pair(i,j).
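To make the expected result concrete, here is a small Python sketch (not Ceylon) of the nested comprehension in question. It assumes the behaviour the thread describes as current: nested `for` clauses yield one flat sequence of pairs. The sample `nums` and the use of tuples in place of `Pair` are assumptions for illustration.

```python
nums = [1, 2, 3]

# Flat result: one sequence of pairs, as nested `for` clauses produce today.
flat_pairs = [(i, j) for i in nums for j in nums if i < j]

# What you get if each `for` instead returns its own iterable:
# one inner sequence per outer element.
nested_pairs = [[(i, j) for j in nums if i < j] for i in nums]

print(flat_pairs)    # [(1, 2), (1, 3), (2, 3)]
print(nested_pairs)  # [[(1, 2), (1, 3)], [(2, 3)], []]
```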

ikasiuk commented 12 years ago

So what about nested fors, since that was the original problem with for returning an Iterable? For example, for (i in nums) for (j in nums) if (i < j) Pair(i,j).

See the aggregate operator proposed here: https://github.com/ceylon/ceylon-spec/issues/377#issuecomment-7911131 and the response to Gavin's objection to it: https://github.com/ceylon/ceylon-spec/issues/377#issuecomment-7948583

gavinking commented 12 years ago

No sequenced arguments, i.e. you have to provide an Iterable directly. It is however allowed to omit the parameter name for the sequenced parameter (f{x=0, myList}).

I hate this idea. It seems to really undermine the idea that Ceylon's declarative mode is as good as structured languages like XML. How critical is this to your proposal?

They are comma-separated, just like positional argument lists.

Currently two arguments in a named argument list can be separated either by semicolon or by comma or by neither, and you have to understand the rules of the syntax to get it right. With the modified syntax it's extremely simple: all the arguments are comma-separated, just like in a positional argument list.

I suppose this is not worse, though I'm having trouble warming to it...

The for operator returns an Iterable and requires no special treatment.

This would certainly be an improvement.

Important change to sequence literals: (snip)

This one I don't get. It seems a bit less regular to me, but perhaps I'm missing something. For example, it looks like I would write:

[x, y, z]

for a sequence of things, and

ArrayList(x,y,z)

for a list of things. That seems a lot worse than what we have today.

Named argument lists are visually a bit more distinct from statement blocks - they look more like argument lists. I think this enhances the readability of the source code.

I always considered it a design goal that named argument lists would look visually consistent with statement blocks. I'm definitely not sold on the idea that this was a bad idea.

Arguments to sequenced parameters are better separated from the rest of the named argument list. Although that's (slightly) more verbose it has two advantages:

Well, it certainly has advantages from our point of view, but for the use case of declaring a user interface, or a build script, or whatever, it seems much more awkward to me.

ikasiuk commented 12 years ago

I'll try to explain the reasoning behind the proposed syntax. The goals are:

Sequenced arguments for a function f(Object...) should not be more complicated than just writing f(1, 2, 3),
On the other hand there should be a similarly convenient way to use comprehensions for sequenced parameters, like f { for(i in 1..3) i*i },
The same must be possible for sequence literals (i.e. using a list of values as well as using a comprehension) in a syntactically consistent way.

The named argument syntax plays an important role in this context. A named argument list is currently split into two parts in the following way:

f {
    <named arguments>
    <sequenced arguments>
};

The separation between the two parts is given only by the difference in the syntax of their respective elements. I must admit that I always found that a bit confusing, and it turns out that one way to reach the goals is actually to make the separation more explicit:

f {
    <named arguments>
    seqParam = [ <sequenced arguments> ]
};

where [1, 2, 3] is a sequence literal (see below). Allowing seqParam= to be omitted gives us

f {
    <named arguments>
    [ <sequenced arguments> ]
};

and consequently

f {
    <named arguments>
    <iterable>
};

so that we can automatically write the desired f { for(i in 1..3) i*i }. Now we just have to make sure that the syntax for sequence literals is symmetric to that of parameter lists:

f( <comma-separated list> );
f { <iterable> };

value seq1 = [ <comma-separated list> ];
value seq2 = { <iterable> };

so that we can write:

f(1, 2, 3);
f { for(i in 1..3) i*i };

value seq1 = [1, 2, 3];
value seq2 = { for(i in 1..3) i*i };

Note that [1, 2, 3] replaces the current {1, 2, 3} to avoid ambiguity. I actually find it quite fitting.

That's basically how I arrived at that syntax. @gavinking, you asked a question concerning sequence literals which I didn't quite understand. Does that clarify it a bit?

Well, it certainly has advantages from our point of view, but for the use case of declaring a user interface, or a build script, or whatever, it seems much more awkward to me.

I guess any more explicit separation between the named argument part and the sequenced argument part of a named argument list could achieve the same effect. But neither do I have a better idea at the moment, nor am I convinced yet that this is really a problem. Can you give a code example of where you would find this awkward?

I always considered it a design goal that named argument lists would look visually consistent with statement blocks. I'm definitely not sold on the idea that this was a bad idea.

It's surely possible not to use commas and use semicolons instead, as with the current syntax. But it looks like I must make a confession: Since I've first seen it I've always found the named argument syntax in its current form the only part of the Ceylon syntax that feels somewhat awkward and unintuitive. The idea as such is absolutely great but I find it surprisingly hard to read and write Ceylon code that uses named arguments in a non-trivial way.

The problem is that when I look at such a piece of code then the structure is not obvious: is this part a normal code block or a named argument list? What function does that line belong to? What is this function nested in? Why is this } followed by a comma but not that one over there?

That makes me wish that parts of the code that serve a different purpose also look different and are clearly separated from each other - that the syntax helps me to intuitively understand the structure of the program. Of course that's only my personal impression and so perhaps it's just my fault that I don't "get it".

gavinking commented 12 years ago

It's surely possible not to use commas and use semicolons instead, as with the current syntax. But it looks like I must make a confession: Since I've first seen it I've always found the named argument syntax in its current form the only part of the Ceylon syntax that feels somewhat awkward and unintuitive. The idea as such is absolutely great but I find it surprisingly hard to read and write Ceylon code that uses named arguments in a non-trivial way.

The problem is that when I look at such a piece of code then the structure is not obvious: is this part a normal code block or a named argument list? What function does that line belong to? What is this function nested in? Why is this } followed by a comma but not that one over there?

Well, I certainly don't love the commas in the sequenced parameter list either. But I'm not sure that your solution really improves the situation very much. You swap:

Html { Head { title="title"; }, Body { ... } }

For

Html { [ Head { title="title"; }, Body { ... } ] }

It's like the same shit just inside a rectangle, no?

ikasiuk commented 12 years ago

Oh, I don't mind the commas, I just think the syntax could be a bit more consistent in this respect: that's why I suggested to simply use commas everywhere in argument lists. But we are wandering off the subject a bit. We were trying to make sequenced parameters and generalized comprehensions work together properly. And I think I finally found a good solution to that:

Let's go back to the Sequenced type, with the following definition:

shared abstract class Void() of Object|Sequenced<Void>|Nothing {}
shared class Sequenced<out Element>(elements) extends Void() {
    shared Iterable<Element> elements;
}
shared interface Iterable<out Element> given Element satisfies Object? {...}

This could be used with the following rules:

The static type of an argument x to a sequenced parameter p must be assignable either to Object? or to Sequenced<Void>.
If x is Sequenced then p=x.elements else p={x}.
x... means Sequenced(x) for any Iterable x.
The for operator returns a Sequenced.
The RHS of the for operator behaves like a sequenced parameter (i.e. each iteration can contribute several values if the RHS is a Sequenced).

Thanks to the first rule it can always be decided at compile time whether the argument is a sequenced argument. It might even be a good idea to extend the first rule to "The static type of any argument x must be assignable either to Object? or to Sequenced<Void>" to avoid surprises with values of type Void.
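The rules above can be modelled in a few lines of Python (not Ceylon). This is only a sketch under assumptions: Python has no static types, so the compile-time check of the first rule becomes a runtime `isinstance` test, and the names `spread`, `collect` and `comprehension` are hypothetical helpers, not part of any proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequenced:
    """Hypothetical wrapper marking elements that should be spread."""
    elements: tuple


def spread(iterable):
    """Models `x...`: wrap any iterable as a Sequenced."""
    return Sequenced(tuple(iterable))


def collect(*args):
    """Models binding arguments to a sequenced parameter p.

    A Sequenced argument contributes its elements (p = x.elements);
    any other argument contributes itself (p = {x}).
    """
    out = []
    for x in args:
        if isinstance(x, Sequenced):
            out.extend(x.elements)
        else:
            out.append(x)
    return out


def comprehension(source, body):
    """Models the `for` operator: it returns a Sequenced, and its
    right-hand side behaves like a sequenced parameter."""
    return Sequenced(tuple(collect(*(body(item) for item in source))))


squares = comprehension(range(1, 4), lambda i: i * i)
print(collect(1, 2, 3))           # [1, 2, 3]
print(collect(spread([1, 2, 3]))) # [1, 2, 3]
print(collect(0, squares))        # [0, 1, 4, 9]

# Nested `for` flattens, because the inner Sequenced is spread.
flat = comprehension(
    range(1, 3),
    lambda i: comprehension(range(1, 3), lambda j: i * 10 + j),
)
print(flat.elements)              # (11, 12, 21, 22)

# Wrapping the inner comprehension in a sequence keeps the nesting.
nested = comprehension(
    range(1, 3),
    lambda i: collect(comprehension(range(1, 3), lambda j: i * 10 + j)),
)
print(nested.elements)            # ([11, 12], [21, 22])
```

In this model the difference between a flat and a nested result comes down to whether the inner value is still a `Sequenced` when the outer `for` sees it.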

With these rules the following examples all work as expected:

void f(Object... values) {}
void g(Object obj, Object... values) {}
f(1, 2, 3);
f(myIterable...);
f(for(i in 1..3) i*i);
f { 1, 2, 3 };
f { for(i in 1..3) i*i };
g(0, 1, 2);
g(0, for(i in 1..3) i*i);
g { obj=0; 1, 2 };
g { obj=0; for(i in 1..3) i*i };

Integer[] seq1 = { 1, 2, 3 };
Integer[] seq2 = { for(i in 1..3) i*i };
Iterable<Integer> it = elements { for(i in 1..3) i*i };

value s1 = { for(i in 1..3) i*i }; // {1, 4, 9}
value s2 = { for(i in 1..2) for(j in 1..2) i*10+j }; // {11, 12, 21, 22}
value s3 = { for(i in 1..2) { for(j in 1..2) i*10+j } }; // {{11, 12}, {21, 22}}

Sequenced values can also be used directly:

Sequenced<Integer> x = for(i in 1..3) i*i;
f(x); // sequenced argument
g(0, x); // sequenced argument
print({x});
Iterable<Integer> it = x.elements;

But f(x) is not allowed if x is of type Void. In that case the type has to be narrowed first.

An interesting possible extension is to apply the p=x.elements conversion not just to sequenced arguments but to any argument with type Iterable:

void h(Iterable<Integer> ints, String... strs) {}
h(for(i in 1..3) i*i, "cat", "dog");
h {
    ints = for(i in 1..3) i*i;
    strs = for(i in 1..3) "nr "i"";
};

This is always unambiguous because of the first rule and the definition of Iterable.

RossTate commented 12 years ago

Here's something that doesn't work with that:

Iterable<B> map<A,B>(B to(A a))(Iterable<A> as) {
  return {for (a in as) to(a)};
}

It doesn't even type check according to your proposal.

Note that this also doesn't work in any proposal saying that for (...) e should skip e if e is null. With those it'll type check, but it won't do what people want it to do.

ikasiuk commented 12 years ago

It doesn't even type check according to your proposal.

You mean because of the constraint of the type parameter of Iterable? Well, that can easily be solved with a given clause. It's not great that it's necessary to specify a type constraint, but that's not a new problem: it's not uncommon that you have to specify something like given T satisfies Object to satisfy the type constraints of a parameter or return type.

Note that this also doesn't work in any proposal saying that for (...) e should skip e if e is null. With those it'll type check, but it won't do what people want it to do.

That depends on how the behavior of that map function is defined. It's true that it skips resulting null values if it is implemented with a comprehension. But that's not necessarily wrong. I think you'd have three options: change the return type to Iterable<B&Object>, restrict type parameter B to Object or choose a different, null-preserving implementation.
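The difference between the two behaviours can be shown with a short Python sketch (not Ceylon), with `None` standing in for null. The function names and the sample lookup are assumptions for illustration; the first function models a `for` that skips null results, the second a null-preserving implementation.

```python
def map_skipping_null(to, items):
    """Models a comprehension whose `for` silently drops null results."""
    return [b for b in (to(a) for a in items) if b is not None]


def map_preserving_null(to, items):
    """Models the map most programmers expect: one result per input."""
    return [to(a) for a in items]


# Hypothetical mapping function that returns None for unknown keys.
lookup = {"a": 1, "c": 3}.get
keys = ["a", "b", "c"]

skipped = map_skipping_null(lookup, keys)
preserved = map_preserving_null(lookup, keys)

print(skipped)    # [1, 3]
print(preserved)  # [1, None, 3]
```

Note that the skipping variant returns fewer elements than it was given, and nothing at the call site signals that.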

RossTate commented 12 years ago

So say I'm a programmer who has heard about Ceylon and particularly that it has cool features like generics, first-class functions, and list comprehensions. map might be one of the first programs I'd write to play with these features. If I have a Java, C#, Scala, C++, ML, or Haskell background, I would most likely try implementing map with the above code. Your two solutions mean either I would immediately be faced with the subtleties of Ceylon's type system (i.e. "Why do I have to explicitly declare a type parameter is a subtype of Object?? Isn't that obvious?!") or I would unknowingly be writing code that doesn't actually work the way I want it to. The latter is especially frightening, and gets worse when you consider the fact that even once I noticed it doesn't work correctly I would have absolutely no idea why, since there's nothing in the code at all that reveals this behavior with nulls (even ones not visible in the code's types). So, if Ceylon is supposed to be easy to transfer to, this certainly seems contradictory to that goal, which is why I'm pushing for a solution where I can write map as above and it works how programmers already familiar with map from some other language would expect.

ikasiuk commented 12 years ago

[...] either I would immediately be faced with the subtleties of Ceylon's type system (i.e. "Why do I have to explicitly declare a type parameter is a subtype of Object?? Isn't that obvious?!") [...]

Yes. But as I said: that's a separate problem which is not specific to this particular situation. It's something that can generally occur in Ceylon wherever you use generic types.

[...] or I would unknowingly be writing code that doesn't actually work the way I want it to. The latter is especially frightening, and gets worse when you consider the fact that even once I noticed it doesn't work correctly I would have absolutely no idea why, since there's nothing in the code at all that reveals this behavior with nulls (even ones not visible in the code's types).

Not sure if I agree completely, but maybe you are right. The solution is obvious: the RHS type of the for operator must be assignable to Object. I guess that's a reasonable restriction. In your example that would force you to either restrict B to type Object or to insert a check if (exists b=to(a)).

RossTate commented 12 years ago

Or make it so that type variables by default can only represent class types.
