Allow certain statements as expressions

As a follow-up to #377 and our recent discussion, I'd like to start this discussion in its own GitHub issue so we can all keep track of it.

I'm proposing that we allow the following statements in expression contexts:

if (t) e which evaluates to e if t is true, and otherwise to null or possibly to a singleton called notThere or something.
if (t) e1 else e2 which evaluates to e1 if t is true, and e2 otherwise
while (t) e which evaluates to an Iterable containing each value of e while t is true. To be useful in comprehensions, we would make while ignore each value of e which is equal to notThere as returned by if.
for (i in v) e which evaluates to an Iterable containing each value of e while iterating each element of v in a new binding i. To be useful in comprehensions, we would make for ignore each value of e which is equal to notThere as returned by if.
try e which evaluates to e
try e catch (T x) ec which evaluates to e unless an exception of type T is thrown, in which case it evaluates to ec
try e finally ef which evaluates to e
try e catch (T x) ec finally ef which evaluates to e unless an exception of type T is thrown, in which case it evaluates to ec
switch(e) case (c1) e1 ... case (cn) en default ed evaluates to the first ei clause whose test ci is true, and otherwise to ed
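To make the filtering rule for if inside for concrete, here is a minimal sketch. It is written in Python rather than Ceylon, because the syntax above is only proposed and cannot be compiled. NOT_THERE, if_expr and for_expr are hypothetical names invented for this illustration, and the sketch assumes the notThere variant rather than null.

```python
# Hypothetical model of the proposed semantics; none of these names exist in Ceylon.
NOT_THERE = object()  # stand-in for the proposed `notThere` singleton


def if_expr(test, then):
    """Model `if (t) e`: evaluate `then` only when the test holds."""
    return then() if test else NOT_THERE


def for_expr(values, body):
    """Model `for (i in v) e`: lazily yield each body value, skipping NOT_THERE."""
    for item in values:
        result = body(item)
        if result is not NOT_THERE:
            yield result


# Models: for (c in "Hello") if (c.lowercase) c.uppercased
shouted = for_expr("Hello", lambda c: if_expr(c.islower(), lambda: c.upper()))
print(list(shouted))  # ['E', 'L', 'L', 'O']
```

The identity check against the sentinel is what lets a plain if act as a filter without any special comprehension syntax. Using null instead would work the same way, but would also drop elements that are legitimately null.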

Naturally I'd also allow each of these expressions to be blocks of statements if they end with an expression (but this part is optional because readability may be an issue):

{stmt...; e} evaluates to e

That would go with function expression bodies too.

That would imply changes with respect to comprehensions, such as:

Iterable<Character> chars = for (c in "Hello") c;
Character[] charsSequence = (for (c in "Hello") c).sequenced;

void takesIterable(Iterable<Character> chars){}

takesIterable( for (c in "Hello") c );
takesIterable{ chars = for (c in "Hello") c ; };

void takesSequence(Character[] chars){}

takesSequence( (for (c in "Hello") c).sequenced );
takesSequence{ chars = (for (c in "Hello") c).sequenced ; };

void takesVarargs(Character... chars){}

takesVarargs( for (c in "Hello") c ... );
takesVarargs{ for (c in "Hello") c ... };

As opposed to the current:

Iterable<Character> chars = elements { for (c in "Hello") c };
Character[] charsSequence = { for (c in "Hello") c };

void takesIterable(Iterable<Character> chars){}

takesIterable( { for (c in "Hello") c } ); // forced eager
takesIterable{ chars = { for (c in "Hello") c}; }; // forced eager
// or alternatively:
takesIterable( elements { for (c in "Hello") c } ); // lazy
takesIterable{ chars = elements { for (c in "Hello") c}; }; // lazy

void takesSequence(Character[] chars){}

takesSequence( { for (c in "Hello") c } );
takesSequence{ chars = { for (c in "Hello") c }; };

void takesVarargs(Character... chars){}

takesVarargs( for (c in "Hello") c );
takesVarargs{ for (c in "Hello") c };

But it would also allow things like:

Iterable<Character> chars = for (c in "Hello") if (c.uppercase) c.lowercased else c.uppercased;
Integer count = try db.count("Items") catch (Exception x) 0;

variable Float x := 1.0;
Iterable<Float> allSquareRootApproximations = while (!goodEnough(x)) x := betterApproximation(x);
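As a rough sketch of what the try and while expressions above would evaluate to, here is a Python model (Python only because the proposed syntax is not runnable). try_expr, while_expr, count_items, good_enough and better_approximation are all hypothetical helpers made up for this example, and the Newton step for the square root of 2 is just one possible betterApproximation.

```python
def try_expr(body, handler, exc_type=Exception):
    """Model `try e catch (T x) ec`: the value of e, or of ec if T is thrown."""
    try:
        return body()
    except exc_type:
        return handler()


def while_expr(test, body):
    """Model `while (t) e`: lazily yield each value of e while t holds."""
    while test():
        yield body()


def count_items():
    # Hypothetical stand-in for db.count("Items") when the database is down.
    raise RuntimeError("database unavailable")


count = try_expr(count_items, lambda: 0)
print(count)  # 0

state = {"x": 1.0}  # models `variable Float x := 1.0`


def good_enough():
    return abs(state["x"] ** 2 - 2.0) < 1e-9


def better_approximation():
    # One Newton step towards sqrt(2); also models the assignment `x := ...`.
    state["x"] = (state["x"] + 2.0 / state["x"]) / 2.0
    return state["x"]


approximations = list(while_expr(lambda: not good_enough(), better_approximation))
print(approximations[-1])  # close to 1.41421356...
```

Note that the while model only works because the body has a side effect on x, which is part of why the while form is the most contentious of the three.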

So, it's a tradeoff because we've currently specified comprehensions as neither expressions nor statements to make certain special use cases easier. As a result comprehensions are just that: special. We can't refactor them around without special magic such as placing them in a sequence literal (eager) or passing them as a parameter to the elements method, which turns them into a lazy Iterable. Those restrictions are hard to understand, even if they are justified by readability.

Turning comprehensions into expressions, along with a few other statements:

improves refactoring,
makes passing comprehensions to methods that accept Iterable easier,
makes comprehensions more powerful,
makes comprehensions more regular, and
makes comprehensions easier to understand.

On the other hand, it also makes

turning them into a Sequence more verbose (but easy to understand why), and
passing them as values to variadic methods slightly more verbose because we have to spread them with ....

Now, let's discuss that.

Good ideas here, but let me remind you of one thing. If you turn comprehensions into expressions, then for (x in xs) for (y in ys) x + y is an Iterable<Iterable<Float>> rather than an Iterable<Float>. To get an Iterable<Float> you'd have to do for (x in xs) (for (y in ys) x + y)...; the parentheses are required to prevent ambiguity in the case where the body of the inner comprehension is itself an iterable.
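The difference between the two readings can be sketched in Python, whose comprehensions happen to support both shapes. This is only an analogy for the typing issue, not a claim about how Ceylon would implement it; xs, ys, nested_sums and flat_sums are made-up names.

```python
xs = [1.0, 2.0]
ys = [10.0, 20.0]


def nested_sums(xs, ys):
    """Inner `for` treated as an ordinary expression: one iterable per x."""
    return [[x + y for y in ys] for x in xs]


def flat_sums(xs, ys):
    """Inner `for` treated as a comprehension clause: a single flat iterable."""
    return [x + y for x in xs for y in ys]


print(nested_sums(xs, ys))  # [[11.0, 21.0], [12.0, 22.0]]
print(flat_sums(xs, ys))    # [11.0, 21.0, 12.0, 22.0]
```

The first shape corresponds to Iterable<Iterable<Float>>, the second to Iterable<Float>.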

I still really like the idea of generalizing comprehensions and turning them into expressions. It's so much more flexible and regular that way! But I prefer using null (as suggested in #377) instead of a separate notThere value.

Allowing statement blocks as expressions ({statements; expr}) is problematic IMO. That's what Scala does and it has the nice effect that they don't need a separate if/while/for statement and if/while/for expression because the two are identical. But the downside is that the code structure becomes more irregular and less readable. Especially the "trick" of taking the last expression in the block as the result often doesn't look nice. Ceylon code currently looks pleasantly regular and readable and I think that's a real advantage of the language.

The most difficult problems are still the same as in #377: how can comprehensions be used as sequenced arguments with a convenient syntax, and how do we solve the "nested fors" problem pointed out by Ross? It seems that these problems are related because the most promising approach for the "nested fors" is apparently to interpret the RHS of the for as a sequenced parameter (suggested solutions in #377 are based on this approach).

So it looks like this issue somewhat depends on #416, which redefines how sequenced parameters work. I must admit that I still haven't understood completely Gavin's latest proposal in #416: there is apparently an important difference between f(a, b) and f{a, b} but I'm not sure how exactly it works. I guess it's pretty clear that the semantics of comprehensions and of invocation arguments have to be considered together.

So a couple of problems with this:

if if (x) y is an expression, then to recapture the filtering semantics of if in comprehensions, we would have to return some strange special value like notThere or whatever, and that's super-ugly
as pointed out by @RossTate, if for (x) y is an expression, then we would need a special syntax for comprehension joins/products
if for (x) y is an expression, then we need a special syntax for "spreading" a comprehension, { for (x) y ... }, ArrayList { for (x) y ... }, etc, since { for (x) y, for (z) w } would be a sequence of two Iterables
the same observation applies to tuple instantiation / sequenced parameters: (1, for (x) y) would be a <Integer,Iterable<Y>>, not a <Integer,Y...>.

Now, it's true that we could probably solve all of these problems with some special "spread" operator, e.g.

for (p in people) * if (p.age>=18) p.name
for (o in orgs) * for (p in o.members) p.name
{ * for (p in people) p.name }
(* for (p in people) p.name)

But that to me significantly obfuscates our nice clean comprehensions syntax. And all you're doing is introducing a way to turn a comprehension into a not-expression.
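For comparison, Python's * unpacking shows the distinction the spread operator would have to carry. This is a loose analogy under invented sample data, not a proposal for Ceylon syntax; people and names are hypothetical.

```python
# Hypothetical sample data; the names and ages are invented for illustration.
people = [("Ann", 34), ("Bob", 12), ("Cy", 19)]


def names():
    """Model `for (p in people) p.name` as a lazy iterable."""
    return (name for name, _age in people)


# Without a spread, the comprehension is a single element of the literal.
unspread = [names()]
# With a spread, its elements are spliced into the enclosing literal or tuple.
spread_list = [*names()]
spread_tuple = (1, *names())

print(len(unspread))  # 1
print(spread_list)    # ['Ann', 'Bob', 'Cy']
print(spread_tuple)   # (1, 'Ann', 'Bob', 'Cy')
```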

I would much prefer to leave for and if like they are now (i.e. not expressions) and provide a way to turn them into an expression. Today we provide { ... } and ( ... ) as a way to turn a for into a sequence or tuple. If we really need a way to get an Iterable from a for then let's provide a way to do that directly (i.e. something less difficult to explain than elements()). But I don't really believe that this is something that we do need.

On the larger issue of "why not just make all control structures be expressions", this is something I have wrestled with many times over the last couple of years. Each time I realized that, even though it's easy to say it like that and have people sorta know what you mean, to actually rigorously define the real semantics requires basically sitting down and defining a totally new language construct for each of the control structures, with a set of semantics and restrictions that are in fact totally different to the control structure whose name it shares. I.e. you wind up not simplifying the language by making "everything an expression"; on the contrary, you wind up introducing a bunch of new kinds of expressions, while keeping all the complexity you already had in the definition of the control structures. This is complexity we simply don't need.

And yes, I do know this is a superficially appealing thing to do, and I also know that some other languages do it. But those languages don't usually have stuff like break, continue, and non-local return. And they certainly don't have such a powerful comprehensions facility.

P.S. The only language I know of that makes this stuff really work convincingly is Smalltalk. But Smalltalk is a very different language syntactically.

I really like this. But maybe we should use a different syntax:

value foo = if(bar) => baz; // foo is of type "Baz?"

I'm also thinking about a keyword to "return" an expression from inside a block... Maybe give...

value foo = if(bar)
{
    // do stuff
    give baz;
}

When a statement gives an expression, then it is said to also be an expression. Another cool idea is to allow these statements to give comprehensions. Here is an example:

if(foo)
{
    give for(value bar in baz) => bar.qux;
}

Or, more concisely:

if(foo) => for(value bar in baz) => bar.qux;

In this case, the if is a comprehension, and can be used as such, for example, in stream literals. I'd say this is pretty sweet! If a control flow expression uses a block that does not contain a give statement, then it is not an expression, nor a comprehension, but rather, just a simple statement.

value foo = if(bar){baz.qux();} // Oh noes, compile error!!! Run!!1
value quux = {if(qux){quuux();}}; // Oh my gosh, not another one! Run again!!!

Similarly, loops that use a block, and do not contain a give statement, are not comprehensions. We say that if a statement gives an expression, then it is an expression, and if it gives a comprehension, then it is a comprehension. This statement, if it is an expression, is said to return the value that the expression it gives returns. Loops give comprehensions. A comprehension is said to give multiple expressions. If a statement gives a comprehension, then this statement is said to give all the expressions that the comprehension gives. Expressions are also comprehensions, and are said to give only themselves.

Now, that's cool and all, but we come to a dilemma: things that give expressions are expressions, things that give comprehensions are comprehensions. Loops can convert expressions into comprehensions, by repeatedly evaluating them. But what if you put a comprehension in the loop's give statement? You get a comprehension of comprehensions?

That becomes clearer if you see comprehensions for what they really are in Ceylon: a comma-separated list of expressions that gets generated at runtime, rather than being explicit in the source code. So, anywhere you can write for(Integer i in 1..3)=>i, you can also write 1, 2, 3 and it will accomplish the exact same thing. Similarly, anywhere you can write for(Integer i in 1..3)=>foo(i) you can also write foo(1), foo(2), foo(3). A comprehension that gives a comprehension, then, is a comma-separated list of comprehensions.

for(value stream in [1..3, 4..6, 7..9]) => for(Integer i in stream) => i

Would be the equivalent of writing:

for(Integer i in 1..3) => i, for(Integer i in 4..6) => i, for(Integer i in 7..9) => i

We already established that for(Integer i in 1..3) => i is equivalent to 1, 2, 3. Using a similar logic, we can conclude that the other fors are respectively, 4, 5, 6 and 7, 8, 9. If we replace the fors we get:

1, 2, 3, 4, 5, 6, 7, 8, 9

We can then say that a comprehension that gives a group of comprehensions G is simply a comprehension that gives all the expressions in all the comprehensions in G. Now, if we want to have a comprehension that gives multiple statements that return a stream of Integers, we can simply do this:

for(value stream in [1..3, 4..6, 7..9]) => {for(Integer i in stream) => i}
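One way to picture these two cases is with Python generators, where yield plays roughly the role of give for a single expression and yield from plays the role of giving a whole comprehension. This is only an analogy I find helpful, not how the proposal would have to be implemented; inner, flattening and wrapping are made-up names.

```python
def inner(stream):
    """Models `for(Integer i in stream) => i`."""
    for i in stream:
        yield i  # models giving a single expression


def flattening(streams):
    """Models a loop that gives a comprehension: all its expressions are given."""
    for stream in streams:
        yield from inner(stream)


def wrapping(streams):
    """Models a loop that gives `{...}`: one stream value per iteration."""
    for stream in streams:
        yield list(inner(stream))


streams = [range(1, 4), range(4, 7), range(7, 10)]
print(list(flattening(streams)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(wrapping(streams)))    # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```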

The spread operator simply converts a stream into a comprehension. *stream could also be written as for(value e in stream) => e. (Duh) But we still have a huuuge deal breaker! What if we want chained control flow structures that use blocks: which of the control flow structures will the give statement be associated with? Well, I really can't think of a good-looking way to control this. The prettiest thing I can think of is using ^. For example:

for(/***/)
{
    while(/***/)
    {
        if(/***/)
        {
            /***/
            ^give foo; // The while loop gives foo.
            /***/
            give foo; // The if statement gives foo.
            /***/
            ^^give foo; // The for loop gives foo.
            /***/
        }
    }
}

Similarly, you can exit a control flow structure that is neither an expression nor a comprehension (does not contain a give statement) by using breaks.

for(/***/)
{
    while(/***/)
    {
        if(/***/)
        {
            /***/
            ^break; // Exits the while loop.
            /***/
            break; // Exits the if statement.
            /***/
            ^^break; // Exits the for loop.
            /***/
        }
    }
}

Just for the record, here is how all the control flow structures would look using the arrow, in code that would be valid under this proposal:

Foo?  qux = if(bool) => foo;

Foo|Bar quux = if(bool) => foo;
               else => bar;

{Foo*} qUx = {for(Foo foo in fooStream) => foo};

{Foo*} qUux = {while(bool) => foo};

Foo qUuux = try => baz();
            catch(Foo e) => e;

Foo qUUx = switch(foo)
           case(one) => foo;
           case(two) => foo;
           case(three) => foo;
           else => foo;

Foo qUUux = try => foo;
            finally => foo; // The finally statement may contain a block that may or may not give (all other blocks in control flow structures must either not give, or definitely give). If the finally gives, then the try/finally statement gives what is given by finally, otherwise, what is given by try. Unless the try block doesn't give, in which case, the finally block must either not give, or definitely give, and the try/finally statement gives what is given by the finally block, if it does.

Foo qUUuux = try => baz();
             catch(Foo e) => foo;
             finally => foo; // Works similarly to the above.

And as a bonus, I propose a do statement. It simply gives its unchanged expression/comprehension. This can be useful to call getters, and to declare variables in comprehensions.

do => foo.bar;
{Object*} bar = {do
{
    value baz = //...
    give for(/***/)=>//...
}};

If you put a comprehension in a sequence literal, for example, what it does is evaluate all the expressions the comprehension gives. If you put it into a stream literal, it does the same thing, but lazily. Oh, and just to be cool: you can put multiple comma-separated comprehensions in a give statement:

{Object*} stream = {do{give true, for(/***/)=>/***/, while(/***/)=>/***/, 1, foo, "Oatmeal?", "Are you crazy?"}}

I've also organized my thoughts in a couple of simple logical rules:

Statements do something.
Statements are expressions if and only if they return a value.
Statements are comprehensions if and only if they give one or more comprehensions.
Comprehensions give themselves.
Expressions are comprehensions.
Comprehensions are expressions if and only if they only give one expression, in which case, they return the value returned by the expression they give.
Comprehensions give only all the comprehensions given by the comprehensions they give.

The second-to-last rule is pretty interesting, since it's the only one the compiler can't check for sure, because the number of expressions a comprehension gives might change at runtime.

{Object*} stream = // ...
value v = for(value o in stream)=>o; // The compiler just assumes that loops aren't expressions, because for it, they aren't, since it can't know the stream size. It could treat them as an expression if stream were a tuple of size one, but it's better to just treat loops as if they are not expressions, in my opinion. If you know for sure the loop will only run once, then you don't even need a loop!

Just one more thing: I'm kinda still a noob in computer science. I have only been doing it as a hobby, for no more than four years, so I might have said some things that are just plain stupid. If that's the case, I'm really sorry, please feel free to correct me. I have also never tried interacting with anyone on GitHub (this is my first post, yay o/), so hopefully I'm not doing anything terribly wrong. I hope you guys like my idea. At least I had fun coming up with it :-) Cheers!

@Zambonifofex Note that if and switch expressions were already implemented in Ceylon 1.2. So some of the stuff mentioned in this issue description is considered "already done".

ceylon / ceylon-spec

Allow certain statements as expressions #457