ceylon / ceylon-spec

DEPRECATED
Apache License 2.0
108 stars 34 forks source link

comprehensions as lazy expressions #488

Closed gavinking closed 11 years ago

gavinking commented 11 years ago

The following proposal is an extension of the discussion in #460, and seeks to:

That is, this proposal is an adjustment to the language that seeks to solve several related problems that have been swirling around for a while, especially since the introduction of tuples.

I hope I'm not missing anything here!

First:

  1. Parens ( ... ) are used (only) for grouping and function application at the value level.
  2. Angle brackets < ... > are used (only) for grouping and function application at the type level.

Second:

Third:

The following are all expressions of type {String...}:

{}                               //type []
{ for (p in people) p.name }     //type {String...} 
{ "hello", "world", "goodbye" }  //type [String, String, String]
{ "hello", sequence... }         //type [String, String...]
{ "hello", iterable... }         //type {String...}

The following are all expressions of type String[]:

[]                               //type []
[ for (p in people) p.name ]     //type [String...]
[ "hello", "world", "goodbye" ]  //type [String, String, String]
[ "hello", sequence... ]         //type [String, String...]
[ "hello", iterable... ]         //type [String, String...]

That is, the difference between { ... } and [ ... ] is that [ ... ] forces immediate eager evaluation of its individual elements. If the elements are already (provably) evaluated, the two constructs are equivalent.

Finally:

Thoughts?

ikasiuk commented 11 years ago

First impression: this looks really good!

I guess f(for (p in people) p.name) means f.invoke([for (p in people) p.name]) and f{for (p in people) p.name} means f.invoke([{for (p in people) p.name}]) (and is equivalent to f({for (p in people) p.name}))?

gavinking commented 11 years ago

@ikasiuk yes, that's right! A comprehension directly inside a positional argument invocation gets "spread" out over a sequenced parameter. A comprehension directly inside a named argument invocation gets passed directly to a parameter of type Iterable.

chochos commented 11 years ago

Is there absolutely no way to confuse [] for empty with the spread operator?

Looks good, but I don't know if in practice it will be confusing to decide when to invoke named args and when to invoke positional args, for comprehensions and sequenced args; and also to decide if a method should receive a Sequential or an Iterable (when designing APIs etc)

RossTate commented 11 years ago

Yay!!! So many things I have been hoping for are happening. So now for some fine tuning.

First, if you have the shorthand T[], then you should also have the shorthand T{}. Alternative you could have neither shorthand.

I recommend adding at least the type {T} (as a subtype of {T...}) so that you can pass lazily evaluated expressions easily. One nice thing about only doing {T} and {T...} (and not {String, Integer, Float...} say), is that it's clear what a memoization semantics should be and it can be done efficiently.

Lastly (though I swear I forgot something), I still prefer parens for type-level grouping. Especially now that you've made everything else correspond between types and expressions!

gavinking commented 11 years ago

So it looks like this proposal is working out. I have implemented the syntactic side of it in @f46175da3b06091f9f7b806f8f6641fc7aecee99.

The only thing I'm a little dubious of is that I had to disallow ... and comprehensions inside annotation named argument lists, in order to accommodate the new type abbreviation {T...}. I don't think it's a practical problem, since annotation argument lists are going to come with all sorts of other restrictions anyway, once we get around to defining how they really work. But still, it's a loss of regularity in the grammar that I don't love.

Everything else worked out just fine, at least grammatically.

gavinking commented 11 years ago

An idea I'm kinda toying with, that we definitely would not have to implement now, but that we could do later on is let you call a function using any of the three kinds of delimiters, with named arguments in any of the three options.

There's an appealing symmetry to this, though the usefulness of it is debatable.

Perhaps it just gives you too many options to learn and choose between. Right now I can't think of a compelling need for f[a=x; b=y; z, w].

On the other hand, several people have asked my why they can't have named arguments in a positional argument list. Would help when you have lots of defaulted parameters and just want to specify one or two of them explicitly.

FroMage commented 11 years ago

I guess I like it, though people are going to complain about not using parenthesis for grouping of types.

I don't understand the distinction between f(a,b,c) and f{a,b,c} when f is variadic. In both cases the variadic param is eager, no?

The one thing that I am starting to think is that we have too much syntax to do similar things that are subtle differences. [], () <> and {} are starting to define very subtle rules that people will have a hard time remembering, which reminds me a bit too much of Perl :( I'm starting to regret having tuples in the first place (though I understand it was needed to replace variadic type parameters).

FroMage commented 11 years ago

{ "hello", iterable... } //type {String...}

That means that the sequence we construct should take an Iterable param to not evaluate the spread iterable in the example until we're asked for it (ie. NOT at sequence literal instantiation), right?

FroMage commented 11 years ago
  • f{a=x; b=y; z, w} would mean f.invoke([x,y,{z, w}])
  • f[a=x; b=y; z, w] would mean f.invoke([x,y,[z, w]])
  • f(a=x; b=y; z, w) would mean f.invoke([x,y,z,w])

Can you tell me the type of f? I don't understand why can invoke it in two different ways on the right.

FroMage commented 11 years ago

The type Sequential<T> may be written [T...] or T[]

So now sequences are tuples?

RossTate commented 11 years ago

Isn't f[a=x; b=y; z, w] syntax ambiguous when there is only w, i.e. f[w], since that could also be an index access?

quintesse commented 11 years ago

I think I'm okay with this overall as well, but I must say I do agree with the thing @chochos brought up. The idea that an argument can mean something different depending if it's used inside a positional arg list or a named one seems confusing.

gavinking commented 11 years ago

I guess I like it, though people are going to complain about not using parenthesis for grouping of types.

When they do, I'm going to complain that I don't have a pony.

I don't understand the distinction between f(a,b,c) and f{a,b,c} when f is variadic. In both cases the variadic param is eager, no?

The syntax f{a,b,c} is not (anymore) used to send multiple arguments to a sequenced parameter. It is just to send an instance of Iterable to a parameter of type Iterable.

I am starting to think is that we have too much syntax to do similar things that are subtle differences.

Yes well. Fuck ASCII. Can I have my pony now?

[], () <> and {} are starting to define very subtle rules that people will have a hard time remembering

According to this proposal:

And notice that this proposal totally cleans up the previously-existing subtle differences between (x), f(x), and (x,y), and between T<X,Y> and <X,Y>. I think overall it's a net gain in clarity, and is certainly a net gain in expressiveness.

What I'm more concerned about is that certain syntactic constructs look very similar, but their semantics depend upon whether they occur as a type or in an expression. For example:

I believe that in all cases the meaning is very clear from the context. I don't think we're going to make people's eyes hurt with this stuff, but I can't say I have 100% confidence in this. The above problems could be substantially ameliorated by dropping some type abbreviations, but at this point I think that would harm readability, not improve it.

Still waiting for my pony.

gavinking commented 11 years ago

That means that the sequence we construct should take an Iterable param to not evaluate the spread iterable in the example until we're asked for it

Yes. You can only pass lazily to parameters of type Iterable. That's a consequence of the relationship between tuple types and sequenced parameters that we specified a couple of weeks ago.

Can you tell me the type of f? I don't understand why can invoke it in two different ways on the right.

Those are three different fs. Not the same function.

So now sequences are tuples?

Sorta. They're a degenerate case of the more general definition of a tuple type. They're not instances of Tuple though.

gavinking commented 11 years ago

Isn't f[a=x; b=y; z, w] syntax ambiguous when there is only w, i.e. f[w], since that could also be an index access?

Yes, you're right, sorry. Duh.

OK, screw that idea.

gavinking commented 11 years ago

I don't know if in practice it will be confusing to decide when to invoke named args and when to invoke positional args, for comprehensions and sequenced args; and also to decide if a method should receive a Sequential or an Iterable (when designing APIs etc)

The advice would be that general-purpose for processing collections should accept Iterable. In fact, a major goal of all this redesign work has been to make that possible. Previously it was very hard to decide between a sequenced parameter and an Iterable parameter, but that problem is now solved. You can now write:

HashSet(names)
HashSet { names... }
HashSet { "Java", "Ceylon" }
HashSet { for (n in names) n.uppercased }

Assuming class HashSet<Element>({Element...} elements) { ... }.

Sequenced parameters are now a much more special-purpose thing for cases where you don't expect to ever be receiving collections. Things like log(), format(), assertTrue(), etc.

Does that make sense?

FroMage commented 11 years ago

Assuming class HashSet<Element>({Element...} elements) { ... }.

Now, that's more readable as class HashSet<Element>(Iterable<Element> elements) { ... }. But have we lost variadic params then? With such a definition, I can't create an empty hash set with HashSet(), right?

FroMage commented 11 years ago

BTW, the pony dept is somewhere else, I'm not sure why you confused this place with a stable.

FroMage commented 11 years ago

Those are three different fs. Not the same function.

Ah OK.

Sorta. They're a degenerate case of the more general definition of a tuple type. They're not instances of Tuple though.

But I thought [T...] was a tuple?

gavinking commented 11 years ago

Now, that's more readable as class HashSet<Element>(Iterable<Element> elements) { ... }.

Well yeah, grrr, what I though as I was typing it. I just have this totally irrational feeling that we should have an abbreviation for Iterable<T>.

But have we lost variadic params then?

No, we still have them, but you have to call them like this:

printf("Decimals: %d %3.2f", 1977, 123.34);

This falls out from the definition of the callable type of a variadic function. The function

void printf(String format, Object... values)

has the callable type Callable<Void,[String,Object...]> which of course is usually written as Void(String, Object...).

With such a definition, I can't create an empty hash set with HashSet(), right?

Correct, you can create it like this:

HashSet{}
gavinking commented 11 years ago

But I thought [T...] was a tuple?

It is a degenerate case of a tuple type, synonymous with T[].

Tuple types are defined recursively:

Does that make sense? It's extremely elegant once you understand it...

gavinking commented 11 years ago

BTW, the pony dept is somewhere else, I'm not sure why you confused this place with a stable.

What stable? This is a magic pony. It can sleep in my car.

FroMage commented 11 years ago

Correct, you can create it like this: HashSet{}

See, I think people will stick to variadic params just to be able to stay in positional invocation land.

What were the reasons again for making variadic params different from Iterable<T> arg = {}?

FroMage commented 11 years ago

Does that make sense? It's extremely elegant once you understand it...

You left out [], it's Empty right?

gavinking commented 11 years ago

@FroMage I added it in an edit ;-)

gavinking commented 11 years ago

See, I think people will stick to variadic params just to be able to stay in positional invocation land.

Well, if they want laziness they won't be able to.

What were the reasons again for making variadic params different from Iterable<T> arg = {}?

Because when you pass a single argument of type Iterable<T>, it's ambiguous.

gavinking commented 11 years ago

Well, if they want laziness they won't be able to.

That is, unless:

etc.

But I dunno, "lazy" tuples kinda give me the heeby-jeebies.

P.S. I suppose that's what @RossTate is getting at with his {X, Y...} types. They would be the lazy counterpart of the tuple types [X, Y...].

FroMage commented 11 years ago

Right, but I mean, the only pb is that variadic params are sequences (eager) rather than iterables (lazy). What was the reason for that again? Because if they were lazy then nobody would write HashSet<T>(Iterable<T> values) and all would be well, no?

gavinking commented 11 years ago

No that's what we had before and it didn't really work out.

Sent from my iPhone

On 05/12/2012, at 5:34 PM, Stéphane Épardaud notifications@github.com wrote:

Right, but I mean, the only pb is that variadic params are sequences (eager) rather than iterables (lazy). What was the reason for that again? Because if they were lazy then nobody would write HashSet(Iterable values) and all would be well, no?

— Reply to this email directly or view it on GitHub.

RossTate commented 11 years ago

Right, but I mean, the only pb is that variadic params are sequences (eager) rather than iterables (lazy). What was the reason for that again? Because if they were lazy then nobody would write HashSet<T>(Iterable<T> values) and all would be well, no?

I like them to be strict from a readability standpoint. If variadic arguments are strict and I see f(x.getValue(), x.getValue()) I know that the two arguments of f are the same regardless of whether f takes 2 normal arguments, 1 normal argument and a variadic argument, or only a variadic argument. So I can easily reason about how my program will execute without knowing anything about f. If variadic arguments were lazy, then I'd always have to look of the declaration of f to figure out what the rest of my program is going to do, which would suck.

P.S. I suppose that's what @RossTate is getting at with his {X, Y...} types. They would be the lazy counterpart of the tuple types [X, Y...].

Yes, though the one complication with a type of {String, Integer, Float...} is what happens if they look at the Integer before the String? Do we always evaluate them in order regardless? Or do we only evaluate an element when it's explicitly retrieved? Also, do we memoize things? Even with just a {String...}, we have to decide whether we should evaluate an element that just gets skipped, and we have to decide whether we memoize results as we evaluate them. Laziness at the language level is not so clear, unfortunately.

FroMage commented 11 years ago

If variadic arguments were lazy, then I'd always have to look of the declaration of f to figure out what the rest of my program is going to do, which would suck.

True, but we don't have a way of creating a lazy iterator literal that wouldn't evaluate its expressions, except for comprehensions. {e1, e2} would still evaluate its expressions before creating the Iterable, if I understand it correctly.

So for most things (anything not already an user-made implementation of Iterable or a comprehension) they would be eager no matter what.

gavinking commented 11 years ago

If the type of printf is Callable<Void,[String,Object...]>, that is, Callable<Void,Tuple<Object,String,Object[]>, it would be very strange to me if the type of values were something other than Object[]. I mean, the callable type encodes the type of values as an eager type expression, so where would a lazy type suddenly come from?

Now, sure, if we had such a construct as {String,Object...} then we could define Void(String,Object...) to mean Callable<Void,{String,Object...}>, that is Callable<Void,LazyTuple<Object,String,{Object...}>>, and then the type of values would naturally be {Object...}.

The question is:

I would say "no", since one of the goals I started out with here was to either eliminate sequenced parameters or at least relegate them to a secondary role. I don't want to use them for processing collections or streams of values anymore, because I feel that that simply didn't work out very well. For general-purpose collection/stream processing functions, it's an absolute shit to have to type the ..., for example:

sort(names...)

Since the case where you pass a pre-existing collection or stream reference is actually the most common case.

Much better to give you the choice of:

sort(names)
sort { "Stef" }
sort { "Stef", "Tako" }
sort { for (p in person) p.name }

This optimizes the most common case, and leaves the more exotic cases looking just as reasonable as before.

ikasiuk commented 11 years ago

Still waiting for my pony.

<\__~
//  \\
tombentley commented 11 years ago

@ikasiuk you've inspired me to write the ceylon pony command, which prints out an ASCII art pony, just for @gavinking

FroMage commented 11 years ago

+1000

gavinking commented 11 years ago

@tombentley. What color is the pony? I don't want a black pony like Ivo's. I want a white pony.

chochos commented 11 years ago

ceylon pony --color white

FroMage commented 11 years ago

This optimizes the most common case, and leaves the more exotic cases looking just as reasonable as before.

We have to take your word that creating collections with elements is the most common case. I've seen plenty of examples where we want to allocate collections before we get any data to stuff in it. Caches, or sets of things we want to avoid reprocessing, for example.

Declaring the initialiser with a default value of {} helps a bit, but default values are forgotten when the initialiser is used as a function (see #378), so we´ll get a function that needs initial data, which is equally limiting as a function that can't take initial data.

gavinking commented 11 years ago

@FroMage I don't get your point. If you don't have any values, it's as simple as:

sort {}
HashMap {}

sort { order=ascending; }
HashMap { initialCapacity=16; loadFactor=0.75; }

Or:

sort({})
HashMap({})

sort({}, ascending)
HashMap({}, 16, 0.75)

I can't see what's missing in that.

gavinking commented 11 years ago

Hell, using a defaulted parameter for the initial elements, I get it down to:

HashMap()

Indeed, I think this is much clearer what we had previously where it was difficult to distinguish switches from elements. Before we had:

HashMap(16, 0.75, 1->"hello", 2->"world")

Where it's difficult to see which are switches and which are elements, and if you specify any elements you have to specify all switches. Now we can write:

HashMap( {1->"hello", 2->"world"}, 16, 0.75 )
HashMap( {1->"hello", 2->"world"}, 16 )
HashMap( {1->"hello", 2->"world"} )
HashMap()

Or:

HashMap { initalCapacity=16; loadFactor=0.75; 1->"hello", 2->"world" }
HashMap { initalCapacity=16; 1->"hello", 2->"world" }
HashMap { loadFactor=0.75; 1->"hello", 2->"world" }
HashMap { 1->"hello", 2->"world" }
HashMap {}

This is much better, AFAICT.

gavinking commented 11 years ago

I just have this totally irrational feeling that we should have an abbreviation for Iterable<T>.

So when I first started writing/reading code with {Element&Object...} in it, I had my doubts. But now that my eyes have had a little time to get used to it, I think I really like it. Even if it's sort of a little cryptic at first, it definitely helps reduce the verbosity of all these function signatures that process Iterables. (And we're going to have many more of them now!) It definitely looks a lot better in the outline view in the IDE. I especially like that:

That is, the type declaration tells me exactly how to pass something to the function. That's very nice I think.

gavinking commented 11 years ago

@RossTate Sorry for not responding sooner to the following:

First, if you have the shorthand T[], then you should also have the shorthand T{}. Alternative you could have neither shorthand.

So honestly at this point I sorta feel like maybe [T...] is more regular and perhaps even the more "ceylonic" way to write it. But deep tradition in C-like languages strongly urges that we support the variant T[], I think.

And note that the postfix [] doesn't refer to instantiation with [x, y, z], rather it refers to the fact that you can obtain an indexed element using seq[i]. From that point of view, T{} is meaningless: there's no iter{i} operator for Iterable. So, sure, it would be damn convenient to be able to write Iterable<T> as just T{}, but I don't think it's a very intuitive syntax.

Now, there is one other possibility worth raising here. I've often over the past two years wondered if having T[] mean Sequential<T> is actually incorrect, and that T[] really should mean List<T>. There's four reasons for thinking this:

To this list of reasons we can now add:

On the other hand:

These are considerations for people learning the language. They're of course very unlikely to trip up people who know Ceylon well.

Oh, and, FTR, a further obsolete consideration was that:

So I'm uncertain what to make of this. There are really good arguments either way. Of course T[] meaning a sequence type is pretty ingrained in our heads by now, but I bet we would pretty quickly adjust to it meaning List. (Updating all existing Ceylon code and docs would be a major PITA, obviously, but that's not an argument to not do it.)

FroMage commented 11 years ago

Hell, using a defaulted parameter for the initial elements, I get it down to HashMap()

Yes but not if you use the initialiser as a method reference. Suddenly it becomes this mess of a method that wants all these pesky arguments…

The indexing operation applies to any List, not just Sequentials.

Hum, but… the indexing operation actually applies to Correspondence, no?

gavinking commented 11 years ago

I recommend adding at least the type {T} (as a subtype of {T...}) so that you can pass lazily evaluated expressions easily.

@RossTate That's a very interesting idea. But, damn, I really wanted to avoid biting off the whole subject of lazy evaluation at this point. (We need to deliver a 1.0 release.)

But it does raise a great question about the runtime semantics of { 1+1, 2**2 } (vs [ 1+1, 2**2 ]). We know what its type is, according to the original proposal above, but if { ... } means "lazy", then should { 1+1, 2**2 } evaluate its argument expressions lazily?

Honestly there are some really great reasons for thinking that it should. It might solve some problems that have been bugging me for ages. I'm going to have to think about this some more.

gavinking commented 11 years ago

Yes but not if you use the initialiser as a method reference.

Sure, but in light of #378, this is a solvable problem (using a special marker type like Default).

Hum, but… the indexing operation actually applies to Correspondence, no?

Indexing by an Integer, I mean.

Yeah, sure, it would be intellectually consistent to say:

but since List and Map occur much more commonly in ordinary code, it would be much more useful to say:

FroMage commented 11 years ago

but since List and Map occur much more commonly in ordinary code, it would be much more useful to say

but that means any implementation of Correspondence that is neither List nor Map is excluded.

FroMage commented 11 years ago

Sure, but in light of #378, this is a solvable problem (using a special marker type like Default).

If we solve it then count me as happy.

gavinking commented 11 years ago

I'm liking @RossTate's idea more and more. It's totally natural that things inside braces are lazily evaluated, since that's exactly what happens to things inside a normal code block. And among the problems that would be solved by this idea are:

We would totally solve the problem of string interpolation: there would no longer be any need for lazy interpolation, because you could just wrap the interpolated string in { "braces" } to get an instance of {String}.

This would be a simple, explicit solution to a problem that has been dogging us for a long time...

RossTate commented 11 years ago

I like Item[Key] being shorthand for Correspondence, with Integer being the default Key. I prefer that over Map and List because it's consistent with what we have so far: the shorthand for any type corresponds to the special syntax we have for that type.

gavinking commented 11 years ago

@RossTate The thing is that the type Correspondence is not really intended to ever appear in ordinary code. It's more of a technical interface that forms the glue between the language definition and library-provided types. Providing a special syntax for types that rarely occur is a counterproductive thing to do, since it becomes just an obscure thing that trips people up when they rarely come across it.

Where type abbreviations make sense is for types that commonly occur and that are otherwise quite verbose to write down and distracting to read. Things like function types are tuple types are perfect examples because they're otherwise extremely annoyingly verbose. Iterable and Entry are worth abbreviating not because they are so verbose but because they occur very often in our APIs and in user-written code. List and Map could also be great candidates for exactly the same reason.