ceylon / ceylon-spec

DEPRECATED
Apache License 2.0
108 stars 34 forks source link

semantics of comprehensions and sequenced parameters #284

Closed gavinking closed 12 years ago

gavinking commented 12 years ago

Overview

This proposal defines uniform semantics for comprehensions and sequenced arguments to sequenced parameters and sequence enumerations. This is based upon the observation that the following syntax (a sequence enumeration)

{ ..... }

is at some level just an abbreviation for this (a named argument invocation)

ArraySequence { ..... }

even if that's not precisely how it's defined in the spec and treated in the type checker.

Therefore this proposal allows either of these constructs to specify the "argument" to a sequenced parameter in one of three different ways:

  1. providing a list of values, for example 1, 2, 3,
  2. using the "spread" syntax, for example people..., or
  3. via a comprehension, for example for (p in people) p.name.

The proposal comes in two parts.

Change to the semantics of sequenced parameters

Currently a sequenced parameter T... is considered to have the type T[]. This is somewhat convenient, but it doesn't work out great for processing large streams of values which we might want to access lazily. Therefore I propose to say that a sequenced parameter has the type Iterable<T>. To allow convenient access to the sequenced parameter as a sequence, we can just support the spread syntax for sequence enumerations, for example:

void f<X>(X... xs) {
    if (nonempty xseq = {xs...}) { 
        ...... 
    }
}

Of course, we also need to redefine the spread syntax to accept Iterable<T> instead of T[]. This anyway works out much better for transforming between different collection types, for example:

value map1 = HashMap(1->"one", 2->"two", 3->"three");
value map2 = HashMap(entrySet...);

Semantics of comprehensions

We can now define the following syntax:

for (x in xs) if (s(x)) p(x)

as an abbreviation for:

project(select(xs, (X x) s(x)), (X x) p(x))...

or perhaps:

xs.select((X x) s(x)).project((X x) p(x))...

if we define these functions as methods on Iterable.

Now we can write:

value sequence = { for (p in people) p.name };
value map = HashMap { for (p in people) if (p.age>=18) p.name->p.age };
RossTate commented 12 years ago

I've gotta get to bed unfortunately, so for now I'll just copy what I said to Gavin a little earlier:

There are three fairly orthogonal aspects to this:

  1. Eagerness - rather than choosing one evaluation strategy and imposing it everywhere, I think we should have three options: { ... } evaluates contents eagerly {| ... |} evaluates contents as they are accessed and caches the result {|| ... ||} evaluates contents as they are accessed and reevaluates per access
  2. Quantity - there are four useful cases here that we can naively determine statically T? - empty or singleton T1_..._Tn - known-length tuple (possibly Empty) T[] - possibly empty list T+ - nonempty list
  3. Subinterfaces - Suppose the Iterable we comprehend over is also Sized, then the result should be Sized as well. More generally, I think there's an interesting opportunity for method extensions (not to be confused with extension methods, and which I'm making up): if the input of a method satisfies certain additional properties then an extension adds interface implementations to the output of the method.
chochos commented 12 years ago

I like it. So basically for can now accept a single expression? And then { for (p in people) p.name } would be just sugar for p.project((Person p) p.name)? The second one is a little more complicated, since you'll have to detect the if to know that it's a filter and transform the whole thing into p.select((Person p) p.age>=18).project((Person p) p.name->p.age) or something like that.

soheilhy commented 12 years ago

Just a nitpick about "semantics of comprehensions".

Consider for (x in xs) if (s(x)) p(x) for example. The syntax is a bit ambiguous. To me, it looks like you can support not only selection/projection but any type of expressions such as:

for (x in xs) if (s(x)) p(x) else q(x)

Although, It would be great to support it, I think a Python-like syntax for this would be a bit more clear for selection/projection:

p(x) for (x in xs) if (s(x))

or

value map = HashMap { p.name->p.age for (p in people) if (p.age>=18) };

gavinking commented 12 years ago

@soheilhassasyeganeh Well, we did consider that over in #82, and in certain ways I find it more readable, but the argument goes that throughout the rest of the language you have to declare a variable before you can reference it, and that we should stick with the same convention here.

soheilhy commented 12 years ago

@gavinking sorry I didn't read #82. The convention makes sense.

gavinking commented 12 years ago

I suppose select() and project() should be instance methods of Iterable with default implementations, allowing customized refinements that avoid creating an Iterator.

RossTate commented 12 years ago

In case you don't already know this, the standard names for those operations (at least that I've always seen) are filter and map. Of course, I've mostly seen these in the functional-languages community, so there may be different names in the OO community. Actually, looking online, C# uses Where and Select. Oy, naming conventions...

gavinking commented 12 years ago

Ross, I'm using the standard terminology for these operations from relational algebra (yes, we'll also need a product/join operation).

RossTate commented 12 years ago

I think project is only used in that context for projecting components of tuples, not for applying arbitrary functions. That is, it's a specialization of map applied to the project function for tuples.

gavinking commented 12 years ago

I think project is only used in that context for projecting components of tuples, not for applying arbitrary functions.

That might be strictly true, but in informal industry usage I've always seen it used pretty much synonymously with "stuff you can do in the SELECT clause" ;-)

gavinking commented 12 years ago

FTR, in SmallTalk they're called select: and collect:. I'm also quite a fan of those names. Definitely prefer them to map() / reduce() or map() / filter().

FroMage commented 12 years ago

Regarding the question on naming, 50 years of functional programming has etched the terms map, fold, filter into stone such that even languages like JavaScript, Perl or Java are using it. Using anything else is trying to make it harder for people to get into Ceylon for no good reason.

I am not a big fan of the syntax for comprehensions because:

I have the feeling that for comprehensions are just a Perl-ism, trying to make things shorter using ad-hoc syntax, when there's already another way to do it.

FroMage commented 12 years ago

+1 on putting operations like filter and map (and a shitload of others) on Iterable, like Java 8.

quintesse commented 12 years ago

Agree with @FroMage here that map, fold and filter are used in many languages and are the terms that people use when talking about these things, on the other hand almost no Java Enterprise Programmer will know SmallTalk (if they even heard of it).

quintesse commented 12 years ago

Also agree with the library calls. Building these on Iterable so we can make them lazy and chainable will give us all we need. If the syntax is not "short" enough we might need to take a look at the anonymous function expressions instead.

gavinking commented 12 years ago

Totally disagree. Look how many times you had to declare what p is in:

people.filter((Person p) p.age>=18).map((Person p) p.name->p.age)

Compared to:

{ from (p in people) if (p.age>18) p.name->p.age }

Now imagine that the type of p is something more complex that Person. Say, Entry<String,List<Integer>>.

Calling an ancient construct from mathematics a "Perl-ism" is really quite a stretch.

FroMage commented 12 years ago

What I called a Perl-ism is "trying to make things shorter using ad-hoc syntax, when there's already another way to do it."

quintesse commented 12 years ago

Well the declaration of p is exactly what could be said is a disadvantage of the current anonymous function syntax. If the type of p could have been determined from its use than it would be shorter, something like:

people.filter((p) p.age>=18).map((p) p.name->p.age)

It's a pity that it's not possible right now, you're sure that it's just not doable in Ceylon, right?

gavinking commented 12 years ago

Well the declaration of p is exactly what could be said is a disadvantage of the current anonymous function syntax

This has nothing to do with syntax it has to do with having a well-defined typesystem with rules that are easy to understand.

It's a pity that it's not possible right now, you're sure that it's just not doable in Ceylon, right?

Oh all kinds of things are doable. Indeed you can easily hack in just about any kind of ad hoc type inference rules you like into the compiler. The question is whether it's desirable to do so.

RossTate commented 12 years ago

Heh, I should really start typing funky inference examples into C#, Java, and Scala just to see what happens. I wonder if I can still make any of them crash...

gavinking commented 12 years ago

OK, support for comprehensions is now implemented in the comprehensions branch of the typechecker and language module. Check it out - it's actually cool as hell!

gavinking commented 12 years ago

So it looks like the idea of just naively trying to map comprehensions onto map() and filter() doesn't work out after all. The problem is that products like this don't map naturally to a higher-order function:

for (x in 1..12) for (y in 1..12) x*y

Now, I think with sequenced type parameters, it is possible to write down the signature of product() but I seriously don't want to go there. So I think a comprehension needs to be desugared to an inner class that satisfies Iterable. I just need to work out the details of this mapping.

gavinking commented 12 years ago

Given a comprehension like this:

for (y in ys) for (x in xs) if (x>y) x*y

I think we'll have to generate Java code like this:

new Iterable<Integer>() {
    TupleIterator $i = new TupleIterator(ys, xs);
    @Override 
    java.lang.Object next() {
        while ($i.hasNext()) {
            Object[] $a = $i.next();
            Integer x = $a[0];
            Integer y = $a[1];
            if (x>y) return x*y;
        }
        return exhausted;
    }
}

where TupleIterator is a class that knows how to form the product of iterators.

gavinking commented 12 years ago

Here's a much more efficient version:

new Iterable<Integer>() {

    Iterable<Integer> $ys = ys.getIterator();
    Iterable<Integer> $xs = xs.getIterator();
    Integer y;
    Integer x;

    boolean $nexty() {
        y = $ys.next();
        if (y instanceof Finished) {
            return false;
        }
        return true;
    }

    boolean $nextx() {
        x = $xs.next();
        if (x instanceof Finished) {
            if ($nexty()) {
                $xs = xs.getIterator();
                x = $xs.next();
                return !(x instanceof Finished);
            }
            return false;
        }
        return true;
    }

    @Override 
    java.lang.Object next() {
        while ($nextx()) {
            if (x>y) return x*y;
        }
        return exhausted;
    }
}
RossTate commented 12 years ago

Your implementation generalizes into this structure:

new Iterator<T>() {
    // Iterators
    // Context
    ...

    T|Finished next() {
        if ($nextContext()) {
            return $evalExpression();
        } else {
            return exhausted;
        }
    }
}

Note that I incorporate the if filter into $nextContext. Essentially each nested for and if corresponds to a nesting within the implementation of $nextContext. This alteration enables for (x in 1..12) if (x % 2 == 0) for (y in 1..12) x*y.

Also, the reason that map and filter are not enough is that you have nested fors, which requires monadic structure to resolve, either via bind or flatten. Nonetheless, the above structure should be more efficient.

gavinking commented 12 years ago

Closing, this was implemented in m3.