eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
398 stars 62 forks source link

semantics of comprehensions and sequenced parameters #3390

Closed CeylonMigrationBot closed 8 years ago

CeylonMigrationBot commented 12 years ago

[@gavinking] Overview

This proposal defines uniform semantics for comprehensions and sequenced arguments to sequenced parameters and sequence enumerations. This is based upon the observation that the following syntax (a sequence enumeration)

{ ..... }

is at some level just an abbreviation for this (a named argument invocation)

ArraySequence { ..... }

even if that's not precisely how it's defined in the spec and treated in the type checker.

Therefore this proposal allows either of these constructs to specify the "argument" to a sequenced parameter in one of three different ways:

  1. providing a list of values, for example 1, 2, 3,
  2. using the "spread" syntax, for example people..., or
  3. via a comprehension, for example for (p in people) p.name.

The proposal comes in two parts.

Change to the semantics of sequenced parameters

Currently a sequenced parameter T... is considered to have the type T[]. This is somewhat convenient, but it doesn't work out great for processing large streams of values which we might want to access lazily. Therefore I propose to say that a sequenced parameter has the type Iterable<T>. To allow convenient access to the sequenced parameter as a sequence, we can just support the spread syntax for sequence enumerations, for example:

void f<X>(X... xs) {
    if (nonempty xseq = {xs...}) { 
        ...... 
    }
}

Of course, we also need to redefine the spread syntax to accept Iterable<T> instead of T[]. This anyway works out much better for transforming between different collection types, for example:

value map1 = HashMap(1->"one", 2->"two", 3->"three");
value map2 = HashMap(entrySet...);

Semantics of comprehensions

We can now define the following syntax:

for (x in xs) if (s(x)) p(x)

as an abbreviation for:

project(select(xs, (X x) s(x)), (X x) p(x))...

or perhaps:

xs.select((X x) s(x)).project((X x) p(x))...

if we define these functions as methods on Iterable.

Now we can write:

value sequence = { for (p in people) p.name };
value map = HashMap { for (p in people) if (p.age>=18) p.name->p.age };

[Migrated from ceylon/ceylon-spec#284] [Closed at 2012-06-24 17:14:36]

CeylonMigrationBot commented 12 years ago

[@RossTate] I've gotta get to bed unfortunately, so for now I'll just copy what I said to Gavin a little earlier:

There are three fairly orthogonal aspects to this:

  1. Eagerness - rather than choosing one evaluation strategy and imposing it everywhere, I think we should have three options: { ... } evaluates contents eagerly {| ... |} evaluates contents as they are accessed and caches the result {|| ... ||} evaluates contents as they are accessed and reevaluates per access
  2. Quantity - there are four useful cases here that we can naively determine statically T? - empty or singleton T1_..._Tn - known-length tuple (possibly Empty) T[] - possibly empty list T+ - nonempty list
  3. Subinterfaces - Suppose the Iterable we comprehend over is also Sized, then the result should be Sized as well. More generally, I think there's an interesting opportunity for method extensions (not to be confused with extension methods, and which I'm making up): if the input of a method satisfies certain additional properties then an extension adds interface implementations to the output of the method.
CeylonMigrationBot commented 12 years ago

[@chochos] I like it. So basically for can now accept a single expression? And then { for (p in people) p.name } would be just sugar for p.project((Person p) p.name)? The second one is a little more complicated, since you'll have to detect the if to know that it's a filter and transform the whole thing into p.select((Person p) p.age>=18).project((Person p) p.name->p.age) or something like that.

CeylonMigrationBot commented 12 years ago

[@soheilhy] Just a nitpick about "semantics of comprehensions".

Consider for (x in xs) if (s(x)) p(x) for example. The syntax is a bit ambiguous. To me, it looks like you can support not only selection/projection but any type of expressions such as:

for (x in xs) if (s(x)) p(x) else q(x)

Although, It would be great to support it, I think a Python-like syntax for this would be a bit more clear for selection/projection:

p(x) for (x in xs) if (s(x))

or

value map = HashMap { p.name->p.age for (p in people) if (p.age>=18) };

CeylonMigrationBot commented 12 years ago

[@gavinking] @soheilhassasyeganeh Well, we did consider that over in #3188, and in certain ways I find it more readable, but the argument goes that throughout the rest of the language you have to declare a variable before you can reference it, and that we should stick with the same convention here.

CeylonMigrationBot commented 12 years ago

[@soheilhy] @gavinking sorry I didn't read #3188. The convention makes sense.

CeylonMigrationBot commented 12 years ago

[@gavinking] I suppose select() and project() should be instance methods of Iterable with default implementations, allowing customized refinements that avoid creating an Iterator.

CeylonMigrationBot commented 12 years ago

[@RossTate] In case you don't already know this, the standard names for those operations (at least that I've always seen) are filter and map. Of course, I've mostly seen these in the functional-languages community, so there may be different names in the OO community. Actually, looking online, C# uses Where and Select. Oy, naming conventions...

CeylonMigrationBot commented 12 years ago

[@gavinking] Ross, I'm using the standard terminology for these operations from relational algebra (yes, we'll also need a product/join operation).

CeylonMigrationBot commented 12 years ago

[@RossTate] I think project is only used in that context for projecting components of tuples, not for applying arbitrary functions. That is, it's a specialization of map applied to the project function for tuples.

CeylonMigrationBot commented 12 years ago

[@gavinking]

I think project is only used in that context for projecting components of tuples, not for applying arbitrary functions.

That might be strictly true, but in informal industry usage I've always seen it used pretty much synonymously with "stuff you can do in the SELECT clause" ;-)

CeylonMigrationBot commented 12 years ago

[@gavinking] FTR, in SmallTalk they're called select: and collect:. I'm also quite a fan of those names. Definitely prefer them to map() / reduce() or map() / filter().

CeylonMigrationBot commented 12 years ago

[@FroMage] Regarding the question on naming, 50 years of functional programming has etched the terms map, fold, filter into stone such that even languages like JavaScript, Perl or Java are using it. Using anything else is trying to make it harder for people to get into Ceylon for no good reason.

I am not a big fan of the syntax for comprehensions because:

I have the feeling that for comprehensions are just a Perl-ism, trying to make things shorter using ad-hoc syntax, when there's already another way to do it.

CeylonMigrationBot commented 12 years ago

[@FroMage] +1 on putting operations like filter and map (and a shitload of others) on Iterable, like Java 8.

CeylonMigrationBot commented 12 years ago

[@quintesse] Agree with @FroMage here that map, fold and filter are used in many languages and are the terms that people use when talking about these things, on the other hand almost no Java Enterprise Programmer will know SmallTalk (if they even heard of it).

CeylonMigrationBot commented 12 years ago

[@quintesse] Also agree with the library calls. Building these on Iterable so we can make them lazy and chainable will give us all we need. If the syntax is not "short" enough we might need to take a look at the anonymous function expressions instead.

CeylonMigrationBot commented 12 years ago

[@gavinking] Totally disagree. Look how many times you had to declare what p is in:

people.filter((Person p) p.age>=18).map((Person p) p.name->p.age)

Compared to:

{ from (p in people) if (p.age>18) p.name->p.age }

Now imagine that the type of p is something more complex that Person. Say, Entry<String,List<Integer>>.

Calling an ancient construct from mathematics a "Perl-ism" is really quite a stretch.

CeylonMigrationBot commented 12 years ago

[@FroMage] What I called a Perl-ism is "trying to make things shorter using ad-hoc syntax, when there's already another way to do it."

CeylonMigrationBot commented 12 years ago

[@quintesse] Well the declaration of p is exactly what could be said is a disadvantage of the current anonymous function syntax. If the type of p could have been determined from its use than it would be shorter, something like:

people.filter((p) p.age>=18).map((p) p.name->p.age)

It's a pity that it's not possible right now, you're sure that it's just not doable in Ceylon, right?

CeylonMigrationBot commented 12 years ago

[@gavinking]

Well the declaration of p is exactly what could be said is a disadvantage of the current anonymous function syntax

This has nothing to do with syntax it has to do with having a well-defined typesystem with rules that are easy to understand.

It's a pity that it's not possible right now, you're sure that it's just not doable in Ceylon, right?

Oh all kinds of things are doable. Indeed you can easily hack in just about any kind of ad hoc type inference rules you like into the compiler. The question is whether it's desirable to do so.

CeylonMigrationBot commented 12 years ago

[@RossTate] Heh, I should really start typing funky inference examples into C#, Java, and Scala just to see what happens. I wonder if I can still make any of them crash...

CeylonMigrationBot commented 12 years ago

[@gavinking] OK, support for comprehensions is now implemented in the comprehensions branch of the typechecker and language module. Check it out - it's actually cool as hell!

CeylonMigrationBot commented 12 years ago

[@gavinking] So it looks like the idea of just naively trying to map comprehensions onto map() and filter() doesn't work out after all. The problem is that products like this don't map naturally to a higher-order function:

for (x in 1..12) for (y in 1..12) x*y

Now, I think with sequenced type parameters, it is possible to write down the signature of product() but I seriously don't want to go there. So I think a comprehension needs to be desugared to an inner class that satisfies Iterable. I just need to work out the details of this mapping.

CeylonMigrationBot commented 12 years ago

[@gavinking] Given a comprehension like this:

for (y in ys) for (x in xs) if (x>y) x*y

I think we'll have to generate Java code like this:

new Iterable<Integer>() {
    TupleIterator $i = new TupleIterator(ys, xs);
    @Override 
    java.lang.Object next() {
        while ($i.hasNext()) {
            Object[] $a = $i.next();
            Integer x = $a[0];
            Integer y = $a[1];
            if (x>y) return x*y;
        }
        return exhausted;
    }
}

where TupleIterator is a class that knows how to form the product of iterators.

CeylonMigrationBot commented 12 years ago

[@gavinking] Here's a much more efficient version:

new Iterable<Integer>() {

    Iterable<Integer> $ys = ys.getIterator();
    Iterable<Integer> $xs = xs.getIterator();
    Integer y;
    Integer x;

    boolean $nexty() {
        y = $ys.next();
        if (y instanceof Finished) {
            return false;
        }
        return true;
    }

    boolean $nextx() {
        x = $xs.next();
        if (x instanceof Finished) {
            if ($nexty()) {
                $xs = xs.getIterator();
                x = $xs.next();
                return !(x instanceof Finished);
            }
            return false;
        }
        return true;
    }

    @Override 
    java.lang.Object next() {
        while ($nextx()) {
            if (x>y) return x*y;
        }
        return exhausted;
    }
}
CeylonMigrationBot commented 12 years ago

[@RossTate] Your implementation generalizes into this structure:

new Iterator<T>() {
    // Iterators
    // Context
    ...

    T|Finished next() {
        if ($nextContext()) {
            return $evalExpression();
        } else {
            return exhausted;
        }
    }
}

Note that I incorporate the if filter into $nextContext. Essentially each nested for and if corresponds to a nesting within the implementation of $nextContext. This alteration enables for (x in 1..12) if (x % 2 == 0) for (y in 1..12) x*y.

Also, the reason that map and filter are not enough is that you have nested fors, which requires monadic structure to resolve, either via bind or flatten. Nonetheless, the above structure should be more efficient.

CeylonMigrationBot commented 12 years ago

[@gavinking] Closing, this was implemented in m3.