Sequence comprehensions

CeylonMigrationBot commented 12 years ago

[@gavinking] So, it appears to me that we've moved away from the idea of Smalltalk-style arguments for invocation of higher-order functions, and that everyone seems to want a much more conventional syntax for anonymous functions inside positional argument lists (see #3160). I can live with that. But if that's the case, then I'm going to have to insist that we at least need a specialized syntax for comprehensions. There's many many possible variations of this syntax, but the two possibilities that I think would make most sense are the following:

{ (x**2+y**2)**0.5: x in 0..100 && x%10==0, y in -10..+10 && y%2==0 }

or

{ (x**2+y**2)**0.5: x in 0..100, x%10==0; y in -10..+10, y%2==0 }

I don't have an especially strong preference between these two options. I think the first is more consistent with the rest of our notation, and the second is perhaps slightly easier on the eyes. The second option is also a bit better from a grammar point of view.

Of course, there is also an argument for something like the following syntax:

{ (x**2+y**2)**0.5 for (x in 0..100) if (x%10==0) for (y in -10..+10) if (y%2==0) }

But in this case, I don't think the extra verbosity contributes to readability.

Note that since this is probably something I can desugar in the frontend, we could probably have this feature almost immediately.

Thoughts?

UPDATE:

Note that this syntax would be allowed to appear in two places:

where sequence enumerations appear, which would instantiate an ArraySequence, and
in place of a named argument list, producing an Iterable (or perhaps some kind of lazy sequence).

So you would be able to write:

HashMap { p.name->p.address: p in people }

[Migrated from ceylon/ceylon-spec#82] [Closed at 2012-05-04 01:51:59]

CeylonMigrationBot commented 12 years ago

[@gavinking] For completeness, here's a simpler example:

{ p.firstName + " " + p.lastName: p in people }

CeylonMigrationBot commented 12 years ago

[@FroMage] Well, I had no clue what your two first examples were about and even the third one got me confused with two for and if and no delimiter.

In fact I only managed to understand it given JavaScript's syntax on the Wikipedia page:

var s = [2*i for (i in range(0,100)) if (i*i>3)]

(even though I'm not a fan of this syntax). So it looks like this means:

(0..100).filter(function(Natural i){ i * i > 3; }).map(function(Natural i){ 2*i; });

Not sure we need this sugar, but if we want it, it has to be clear what it does.

CeylonMigrationBot commented 12 years ago

[@gavinking] Stef this is a really common notation that comes originally from mathematics. Popular languages with this feature include Python, Haskell, OCaml, and Erlang. I was always a huge fan of it in Python. C# has a related feature but they try to make it look more like SQL.

CeylonMigrationBot commented 12 years ago

[@gavinking] Note that this syntax would be allowed to appear in two places:

where sequence enumerations appear, which would instantiate an ArraySequence, and
in place of a named argument list, producing an Iterable (or perhaps some kind of lazy sequence).

So you would be able to write:

HashMap { p.name->p.address: p in people }

CeylonMigrationBot commented 12 years ago

[@FroMage] I can agree that Python is popular, but not the rest ;)

Besides I've never had the use for this, ever. But I don't do Math programs, so I can accept that it's useful, but I'm not really not sure about the syntax you propose. Not that I have much better myself to propose.

CeylonMigrationBot commented 12 years ago

[@FroMage] HashMap { p.name->p.address: p in people }

I've no idea what that means. Intuitively it would pass two parameters named p.name and p.address to the constructor of HashMap but that can't be it.

CeylonMigrationBot commented 12 years ago

[@gavinking] @FroMage trust me, if you had it, you would use it all the time.

Note that in degenerate cases it's an abbreviation for filter() or map(). This is filter():

{ p: p in people, p.age>18 }

This is map():

{ p.name: p in people }

To my eyes, these are much better than:

people.filter(function (Person p) p.age>18)
people.map(Person.name)

CeylonMigrationBot commented 12 years ago

[@gavinking] @FroMage, this is how we would declare HashMap:

class HashMap<Key,Item>(Entry<Key,Item>... entries) { .... }

So, using syntax we have today, you would be able to write:

HashMap { "Him"->25, "Her"->28 }

This proposal would extend this syntax to let you use a comprehension instead of an explicit list of entries. So then

HashMap { p.name->p.address: p in people }

means "a hash map with an entry mapping name to address for each person".

CeylonMigrationBot commented 12 years ago

[@FroMage] Yes I know it's a filter/map but I think it's a matter of personal taste which is more readable. In my opinion { p: p in people, p.age>18 } is a big WTF?

CeylonMigrationBot commented 12 years ago

[@gavinking] @FroMage I admit that I look at this notation with mathematician eyes. :-) Still, I understand it to be a beloved feature of Python, and I don't think of Python folks as being especially mathematically inclined.

CeylonMigrationBot commented 12 years ago

[@FroMage] HashMap { "Him"->25, "Her"->28 }

Isn't that HashMap ( "Him"->25, "Her"->28 ) ? I don't see why the named-argument syntax.

Well you know my opinion of hard-to-spot lambdas, so I guess this is probably my biggest issue here as with lambdas.

{ p.name->p.address for p in people where p.age > 18} would be a tiny bit more readable to me but still not convinced. We should definitely ask other people.

CeylonMigrationBot commented 12 years ago

[@FroMage]

Still, I understand it to be a beloved feature of Python, and I don't think of Python folks as being especially mathematically inclined.

No, but remember that Python has a widely different syntax than Ceylon.

CeylonMigrationBot commented 12 years ago

[@gavinking]

I don't see why the named-argument syntax.

You can write it either way. Both positional and named argument lists have sequenced arguments.

{ p.name->p.address for p in people where p.age > 18} would be a tiny bit more readable to me but still not convinced. We should definitely ask other people.

It would be if, not where, to save introducing an extra keyword.

CeylonMigrationBot commented 12 years ago

[@FroMage]

You can write it either way. Both positional and named argument lists have sequenced arguments.

Ah, good, because the {} here were confusing the hell out of me.

if is fine for me.

CeylonMigrationBot commented 12 years ago

[@FroMage] Though I still think this is the procedural equivalent of filter/map and not the other way around. In terms of it being sugar around a more primitive feature.

CeylonMigrationBot commented 12 years ago

[@gavinking]

In terms of it being sugar around a more primitive feature.

Yes, it is certainly a sugar around anonymous functions. And we would probably desugar it to that in the compiler.

CeylonMigrationBot commented 12 years ago

[@quintesse] I must say I agree this is a bit hard to spot { p: p in people, p.age>18 }. If you miss the colon you might just mistake it for the separation between two arguments.

Which brings me to: HashMap { "Him"->25, "Her"->28 } , I knew named arguments also had sequenced arguments, but what I find surprising is that we keep the commas, so this is the proper syntax: Foo { aap=1; noot;=2; "mies", "wim", "zus" } ?

CeylonMigrationBot commented 12 years ago

[@gavinking]

I knew named arguments also had sequenced arguments, but what I find surprising is that we keep the commas

There was a good reason for this that I don't quite remember right now.

CeylonMigrationBot commented 12 years ago

[@gavinking]

If you miss the colon you might just mistake it for the separation between two arguments.

I probably made a mistake with my formatting. I think it would be more conventional, and visually easier to parse, if the standard was to add an extra space:

{ p : p in people, p.age>18 }

Surprisingly, with just that little change, I think it stands out a lot more.

CeylonMigrationBot commented 12 years ago

[@gavinking] Interestingly, this would also maybe let us remove the ugly "spread" syntax ... for passing a sequence to a sequenced parameter. If we just let you use a comprehension everywhere we currently let you list the values of a sequenced parameter, then instead of:

print(words...)

which is somewhat cryptic, we could let you write:

print(w : w in words);

which to me is nicer even if more verbose.

CeylonMigrationBot commented 12 years ago

[@FroMage] It's not obvious to me that the second syntax would pass every param in the varargs rather than a list of values as the first element of the vararg array.

CeylonMigrationBot commented 12 years ago

[@quintesse] Exactly, better yet, if you want both options to be possibly you'd need a way of marking that difference and in comes "..." again...

CeylonMigrationBot commented 12 years ago

[@gavinking] So here's a strong argument for going with Stef's favored { p for (p in person) } syntax. Consider the following examples:

Table table { 
    title = "People"; 
    Column { 
        header="name"; 
        Cell { p.name }
            : p in people
    },
    Column { 
        header="age"; 
        Cell { p.age }
            : p in people
    }
}

vs:

Table table { 
    title = "People"; 
    Column { 
        header="name"; 
        Cell { p.name } 
            for (p in people)
    },
    Column { 
        header="age"; 
        Cell { p.age }
            for (p in people)
    }
}

I think the second one is significantly better. But then the following variation would definitely be in play:

{ for (p in people) if (p.age>18) p.name }

Which for this example would look like:

Table table { 
    title = "People"; 
    Column { 
        header="name"; 
        for (p in people) Cell { p.name }
    },
    Column { 
        header="age"; 
        for (p in people) Cell { p.age }
    }
}

CeylonMigrationBot commented 12 years ago

[@gavinking] @FroMage

It's not obvious to me that the second syntax would pass every param in the varargs rather than a list of values as the first element of the vararg array.

How about if it were:

print(for (w in words) w);

CeylonMigrationBot commented 12 years ago

[@gavinking]

Exactly, better yet, if you want both options to be possibly you'd need a way of marking that difference and in comes "..." again...

Nope, you have both options already:

print(w : w in words);  //pass multiple values to a sequenced parameter
print({w : w in words});  //pass a sequence to a parameter

CeylonMigrationBot commented 12 years ago

[@quintesse] But we will still need the ... notation for when we get passed a sequence, right?

print(gimmeSequence()...);

which would mean that

print(w : w in words);

would be a shortcut for:

print({w : w in words}...);

right?

CeylonMigrationBot commented 12 years ago

[@gavinking]

But we will still need the ... notation for when we get passed a sequence, right?

No. Consider:

print(s : s in gimmeSequence())    //means print(gimmeSequence()...)
print(gimmeSequence())

They don't mean the same thing.

Remember, the syntax for getting a sequence back from your comprehension is { s : s in gimmeSequence() } in braces, not just s : s in gimmeSequence(), which is simply a way of specifying sequenced arguments.

CeylonMigrationBot commented 12 years ago

[@quintesse] Ok understood :)

CeylonMigrationBot commented 12 years ago

[@RossTate] So one thing I find weird about s : s in gimmeSequence() is it's the only case where the binding of the variable is after the use of the variable.

I rather like for (s in blah) expr because I can think of for as a generator of expression lists just like it is a generator of sequence lists. Similarly if has the same role (although syntactically it doesn't seem to extend to else or switch well).

Here's a way to maybe unify sequences, nulls, and tuples. Have a notion of expression lists:

EXPRLIST ::= EXPRS [, EXPRS]*
EXPRS
 ::= EXPR
   | for (PATTERN in EXPR) EXPRS
   | if (EXPR) EXPRS
   | if (PATTERN = EXPR) EXPRS
   | EXPR...

Using type information you can easily figure out whether an expression list is

fixed length, in which case {EXPRLIST} is a tuple
empty or singleton, in which case {EXPRLIST} is a X?
one or more, in which case {EXPRLIST} is a X+
other, in which case {EXPRLIST} is a X*

This fits principal types if we have Tuple<X> <: X? <: X* and X+ <: X*.

You can then extend ?. to handle all these other cases (maybe rename it to *.).

CeylonMigrationBot commented 12 years ago

[@RossTate] It might also be cool to add a ?? meta-operator. This is like ... but the expression list it generates can only be captured by {} and not by (). Then you could write {distance(pts1??, pts2??)} to get all the distances between the two lists of points. In particular, you could write {plus((a/b)??, (c/d)??)} to get a Float?, making our use of nulls instead of exceptions less hasslesome.

CeylonMigrationBot commented 12 years ago

[@gavinking]

So one thing I find weird about s : s in gimmeSequence() is it's the only case where the binding of the variable is after the use of the variable.

Yes, I agree. On the other hand, in the case where you have an expression like p.name or x**2, it's often this expression that you want to emphasize.

CeylonMigrationBot commented 12 years ago

[@emmanuelbernard] Coming after the fight but I am like Stef, I much prefer the keyword version than the ascii version of this proposal. And I think non polyglot Java developers will as well.

CeylonMigrationBot commented 12 years ago

[@gavinking] So, FTR, the syntax we're settling on is:

{ for (x in 0..100) if (x%10==0) for (y in -10..+10) if (y%2==0) (x**2+y**2)**0.5 }

Or, if I manage to sell you guys on the new syntax proposed in #3205, this:

{ for (x<=100) if (x%10==0) for (-10<=y<=+10) if (y%2==0) (x**2+y**2)**0.5 }

CeylonMigrationBot commented 12 years ago

[@FroMage] I can see myself getting used to the first syntax, but for the second I'm not too sure yet, since <= has different meaning depending on where it is. Also beware of the Perl operator opportunity effect…

CeylonMigrationBot commented 12 years ago

[@gavinking]

<= has different meaning depending on where it is.

Yes, I can see how you could argue that, but if you accept that then so does in today:

if (i<length)
if (x in xs)

And:

for (i<length)
for (x in xs)

To me it's pretty consistent. I definitely don't think there's scope for misunderstanding.

CeylonMigrationBot commented 12 years ago

[@quintesse] @FroMage well the possibility of using the same construct in an if-statement seems pretty nice, but I must admit that for (x in 0.100) is easier on the eye and in the end the other syntax doesn't give us any other advantages. It seemed pretty cool in the beginning but I'm beginning to doubt.

CeylonMigrationBot commented 12 years ago

[@gavinking] @quintesse the advantage it gives is being able to express both inclusive and exclusive ranges with an immediately readable syntax that can always be directly translated to a C-style for at the Java level We don't need to introduce some slightly-cryptic second syntactic form (start:length) for ranges defined by a possibly-zero length.

It's:

for (i<sequence.size)

vs.

for (i in 0:sequence.size)

To me it's really clear which of those is preferable.

CeylonMigrationBot commented 12 years ago

[@gavinking] OK, here's a new suggestion. I was thinking it would be nice to have a comprehension-like syntax for folds. Now, I still need to investigate the parsing issues, but the following doesn't seem like an unreasonable syntax:

value total = 0.0 for (item in order.items) out += item.produce.price*item.quantity;

where out is a special value representing the accumulated value of the fold. The above would be desugared to:

 value total = items.fold(0.0, (variable Float \iout, Item item) \iout+=item.produce.price*item.quantity);

Not sure if there is a better keyword than out that we could use.

WDYT?

CeylonMigrationBot commented 12 years ago

[@gavinking] Note that if we want to make this look more "functional", the following syntax would be the way to go:

value total = 0.0 for (item in order.items) out + item.produce.price*item.quantity;

CeylonMigrationBot commented 12 years ago

[@quintesse] And what about something more common to other languages, the "yield" keyword?

Would serve the same function but be more familiar maybe.

But in general I think it's a good idea

CeylonMigrationBot commented 12 years ago

[@gavinking] What would yield be for? What we need is a keyword to refer to the accumulated value.

CeylonMigrationBot commented 12 years ago

[@quintesse] The syntax would be:

value total = for (item in order.items) yield item.produce.price*item.quantity;

or maybe we could turn it into a special toplevel method (that doesn't really exist probably, in the same way that the out variable would):

value total = for (item in order.items) yield(item.produce.price*item.quantity);

A think it's better explained here: http://en.wikipedia.org/wiki/Generator_(computer_programming)

CeylonMigrationBot commented 12 years ago

[@gavinking] How is that a fold? i.e. where do you specify what operation is used to accumulate the values?

CeylonMigrationBot commented 12 years ago

[@quintesse] Ah sorry, I thought this was still about the sequence comprehensions. The syntax seemed so similar. Forget it.

CeylonMigrationBot commented 12 years ago

[@RossTate] None of your examples are well defined; you need an initial value for out.

Now, just for comparison, suppose Iterable<Element> had the following method:

T fold<T>(T init, T next(T,Element))

Then we could write your example as (approximately):

order.items.foldl(0) { next(Integer prev, Item item) = prev + item.produce.price*item.quantity }

Note that, while the type argument for foldl can always be inferred, the type of prev must be made explicit. In particular, with your examples, not only do you need to initialize out, you also need to declare its type.

I'm not opposed to any specialized syntax for folding over lists. I'm just pointing out the challenges and the existing solutions.

CeylonMigrationBot commented 12 years ago

[@gavinking]

None of your examples are well defined; you need an initial value for out.

Sorry, I definitely meant to have a slot for the initial value. I'll edit the examples to show what I really meant to type :/

CeylonMigrationBot commented 12 years ago

[@gavinking]

with your examples, not only do you need to initialize out, you also need to declare its type.

I definitely need to initialize it. But why on earth would I need to declare its type?

CeylonMigrationBot commented 12 years ago

[@gavinking] Perhaps this is a better syntax:

value total = for (y:=0.0; x in xs) y += x;

Or even:

value total = fold (y:=0.0) for (x in xs) y += x;

where fold becomes a keyword.

CeylonMigrationBot commented 12 years ago

[@ikasiuk]

value total = for (y:=0.0; x in xs) y += x;

Looks much better than the out version to me, and we don't need a new keyword.

CeylonMigrationBot commented 12 years ago

[@RossTate]

But why on earth would I need to declare its type?

Suppose we had

interface InfTree extends Sequence<InfTree> {
   actual formal InfTree rest;
}

and tree had type InfTree. Then try to come up with a nice algorithm that accepts the following:

value result = for (t := tree; i in 1..10) if (nonempty t) t := {t.rest}

without crashing on the following:

value result = for (t := tree; i in 1..10) t := {t}

eclipse-archived / ceylon

Sequence comprehensions #3188