eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0

Tuple syntax #3539

Closed CeylonMigrationBot closed 8 years ago

CeylonMigrationBot commented 12 years ago

[@gavinking] The current syntax in the tuples prototype is the following:

Note that these are simply syntax sugar:

(i.e. there are no shitty Tuple0, Tuple1, Tuple3, etc, types hiding away.)

Tuple types

I think this syntax is reasonable, after expending a whole lot of thought on the issue over the last few days. I guess I might have preferred to be able to go with the syntax (Integer,Float,String) for a tuple type, but as @chochos noticed, this results in ambiguities when considered in combination with our annotation syntax.

An alternative solution would be to go with a more traditional-mathematics-like syntax, and write tuple types as Integer*Float*String and function types like Integer*Float*String=>Foo. If anyone thinks we should go down that path, please speak up now!

Destructuring tuples

The remaining issue is what the syntax should be for destructuring a tuple (i.e. parallel assignment). I think the following syntax, reflective of the syntax we use for parameter lists, is most natural:

(Integer i, Float f, String s) = triple;

This syntax would also very naturally extend to pattern matching, if we ever add that. Imagine something like:

if ((Integer i, Float f, String s) = triple) { .... }

The downside is that it's a bit verbose when you just want to use type inference, which is probably the common case:

(value i, value f, value s) = triple;

I have to repeat value 3 times!

An alternative would be to move the value outside the parens:

value (i, f, s) = triple;

Then if you wanted to explicitly specify the types, you would write:

<Integer, Float, String> (i, f, s) = triple;

That's OK, I suppose, but it would not extend very nicely to pattern matching:

if (<Integer, Float, String> (i, f, s) = triple) { .... }

So I think the parameter-list-like syntax is a better fit to the rest of the language.

Accessing a tuple element

What if, instead of destructuring the tuple, you just want to directly access elements of the tuple? Right now, you can write:

value first = triple.first;
value second = triple.rest.first;
value third = triple.rest.rest.first;

We could write some helper methods to access the 1st, 2nd, and 3rd elements of a tuple, without loss of typesafety:

value first = first(triple);
value second = second(triple);
value third = third(triple);
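
A minimal sketch of the first/rest idea, written in TypeScript rather than Ceylon so that it can be run as-is. The names Cons, Empty, cons and the three helper functions are assumptions made for illustration and are not the prototype's actual declarations. It suggests how helpers like first, second, and third could stay typesafe: each one only accepts a tuple that is long enough.

```typescript
// Hypothetical cons-style encoding: a tuple is a first element plus the rest.
interface Empty {
  readonly kind: "empty";
}
interface Cons<First, Rest> {
  readonly kind: "cons";
  readonly first: First;
  readonly rest: Rest;
}

const empty: Empty = { kind: "empty" };

function cons<First, Rest>(first: First, rest: Rest): Cons<First, Rest> {
  return { kind: "cons", first, rest };
}

// Typed helpers: the parameter type demands at least 1, 2, or 3 elements.
function first<A>(t: Cons<A, unknown>): A {
  return t.first;
}
function second<B>(t: Cons<unknown, Cons<B, unknown>>): B {
  return t.rest.first;
}
function third<C>(t: Cons<unknown, Cons<unknown, Cons<C, unknown>>>): C {
  return t.rest.rest.first;
}

// Inferred as Cons<number, Cons<number, Cons<string, Empty>>>.
const triple = cons(1, cons(0.5, cons("hello", empty)));
const s: string = third(triple); // element type is recovered statically
```

Calling third on a two-element value such as cons(1, cons(0.5, empty)) does not type check in this sketch, because Empty has no first or rest.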

Alternatively, we could build something into the language, letting you write:

value first = triple.1;
value second = triple.2;
value third = triple.3;

Or whatever. My guess is we probably don't need this last feature.

Thoughts?

[Migrated from ceylon/ceylon-spec#433] [Closed at 2012-11-16 20:50:07]

CeylonMigrationBot commented 11 years ago

[@chochos] Actually Integer*Float*String doesn't look bad at all. But function types like Integer*String=>Foo look too Scala-ish. If using * for tuples means using => for function types then I'd rather stick with Callable<ReturnType,<Param1,Param2,Param3>>.

As for destructuring: would it be possible to do a mix so we could write value(a,b,c)=triplet for type inference, and use (A a, B b, C c)=triplet for specifying types? And obviously to use a mix you'd have to use value individually, i.e. (A a, value b, C c)=triplet.

And for elements, maybe Tuple could simply implement item(Integer index) so you can do

value first = tuple[1];
value second = tuple[2];
value third = tuple[3];

(we'd just need to decide whether the first element has index 0 or 1).

CeylonMigrationBot commented 11 years ago

[@gavinking]

Actually Integer*Float*String doesn't look bad at all.

Nope, it's not bad. But it feels a little foreign to the notation used in the rest of the language.

But function types like Integer*String=>Foo look too Scala-ish.

Well, I dunno about "Scala-ish", but I agree that it's a bit foreign, and perhaps even cryptic. Potentially we could go with

Foo(Integer*String)

The problem with that is it's neither here nor there: it's not the traditional syntax in mathematics, nor is it really very closely reflective of the syntax we use for declaring functions. It would be very internally consistent, however. Perhaps I could get used to it.

I do recognize that the following is more visually consistent:

Than this:

CeylonMigrationBot commented 11 years ago

[@gavinking]

As for destructuring: would it be possible to do a mix [snip]

I would really like to avoid that

And for elements, maybe Tuple could simply implement item(Integer index) so you can do [snip]

Well the idea is that this is supposed to be typesafe. i.e. writing triple.5 is a compile-time error.

CeylonMigrationBot commented 11 years ago

[@gavinking] FTR, to guide my intuition about tuple type vs function type notation, I'm thinking about functions as sets of tuples, i.e. Foo(Integer,String) is a shorthand for Set<<<Integer,String>,Foo>>.

CeylonMigrationBot commented 11 years ago

[@gavinking] Well, which do you guys prefer:

shared <T[],T[]> partition(Boolean by(T t), T... ts) { ... }

Or:

shared T[]*T[] partition(Boolean by(T t), T... ts) { ... }

I honestly could live with either.

CeylonMigrationBot commented 11 years ago

[@gavinking] I think, upon reflection, that if we go with the syntax Integer*Float*String for tuple types, there's really no strong reason we can't continue to use Foo(Integer,String) as the abbreviation for Callable<Foo,Integer*String>. I think I'm fussing over nothing. The truth is that the type Integer*Float*String is the return type of the invocation (1, 0.0, "hello")—it's not actually directly analogous to the parameter types of a function reference. So there's no reason why parameter type lists have to look visually similar to tuple types.

CeylonMigrationBot commented 11 years ago

[@quintesse] I definitely prefer the more traditional <A, B> approach. Especially if we can make the type of a tuple use () then at least have something that looks visually similar: (1, "foo", 3.0) with a type of <Integer, String, Float> instead of Integer*String*Float.

CeylonMigrationBot commented 11 years ago

[@chochos] A*B*C is more in line with union and intersection types, but unfortunately there wouldn't be any sugar for Unit, whereas with <A,B,C> you can simply type <>.

CeylonMigrationBot commented 11 years ago

[@quintesse] I'm not so sure about your reasoning, A|B|C and A&B&C still refer to a single type, the end result is a single type, A*B*C are 3 separate types, not something that is very obvious from looking at the *. In fact people might assume that like unions and intersections it refers to yet another type operator that results in a single type. With , we're used to seeing an enumeration and that's what this is IMO.

CeylonMigrationBot commented 11 years ago

[@gavinking]

I definitely prefer the more traditional <A, B> approach.

FTR, <A,B> is definitely not a traditional notation. The traditional notation is A×B, which is written as A*B in Standard ML, OCaml, and other languages. The more recent languages Haskell and I think Scala use (A,B), but that is not an option which is open to us, unfortunately.

CeylonMigrationBot commented 11 years ago

[@gavinking]

unfortunately there wouldn't be any sugar for Unit, whereas with <A,B,C> you can simply type <>.

I don't see this as an issue, since AFAICT there is basically no reason to ever write down the unit type in Ceylon. Unlike ML, Haskell, Scala, etc, our void functions aren't of type Unit, they are of type Void.

It might be a bigger issue that there is no way to write down a singleton type using the A*B*C notation—you can't write <A>. But again, there aren't many uses for singleton types.

Note that our constructor syntax suffers from the same problem: (a) does not construct a singleton.

CeylonMigrationBot commented 11 years ago

[@gavinking]

I'm not so sure about your reasoning, A|B|C and A&B&C still refer to a single type, the end result is a single type, A*B*C are 3 separate types

Huh? What can you possibly mean by this?

In fact people might assume that like unions and intersections it refers to yet another type operator that results in a single type.

Um, that's exactly what it is...

CeylonMigrationBot commented 11 years ago

[@ikasiuk] Regardless of tradition, <A, B> does seem more intuitive in the context of the existing Ceylon syntax. And if a tuple looks like a function argument list then why shouldn't a tuple type look like a type argument list? Apart from that I have no strong objections against A*B either.

CeylonMigrationBot commented 11 years ago

[@gavinking]

And if a tuple looks like a function argument list then why shouldn't a tuple type look like a type argument list?

Well, yes, I found this intuitive initially, but then after waaay overthinking it, I realized that there is a difference between the type of a tuple and a tuple of types. Nevertheless, I don't imagine that anyone will find <A,B,C> unnatural.

CeylonMigrationBot commented 11 years ago

[@gavinking]

I don't see this as an issue, since AFAICT there is basically no reason to ever write down the unit type in Ceylon.

Ah, that's nonsense. Metamodel.

We need to be able to write stuff like Class<Person,<Name>> and Class<Timer,<>>. I admit that Class<Person,Singleton<Name>> and Class<Timer,Unit> are significantly worse.

So that's a really strong argument against A*B.

CeylonMigrationBot commented 11 years ago

[@quintesse]

FTR, <A,B> is definitely not a traditional notation.

Traditional from our POV, the Ceylon language, which in a lot of its syntax is somewhat similar to the Java language. That tradition.

Huh? What can you possibly mean by this?

I must be speaking Chinese then. Try and understand my layman's thought processes please. What I mean is that regardless of A, or A|B or A&B you've got this idea of a single "thing" that either has a fixed type "A", or it can be "A or B", or in the intersection it has to be both "A and B", but in all these cases we're still referring to this singular "thing".

But in the case of the tuple we're talking about a sequence of types. I know that of course it's a single type in our type system, but it can be deconstructed into its component types. And when instantiated there will be component objects with the corresponding types. And in Ceylon sequences of things tend to be delimited by , and not by *.

So I'm saying that this "traditional" syntax will likely resonate better with people who have only basic knowledge about type systems and know nothing of ML or OCaml because it suggests an enumeration of something.

CeylonMigrationBot commented 11 years ago

[@RossTate] I definitely think <...> is the way to go over * both because it handles the empty and singleton cases well and because it bypasses associativity and precedence ambiguities.

R(A,B,C) should be shorthand for Callable<R,Tuple<A,B,C>> since a R foo(A a, B b, C c) has the latter form and the former type.

You should definitely not do first and rest or first and second approaches. The former leaks implementation details, and honestly I don't think that's a good way to implement them. The latter will just suck when you get to larger and larger tuples, though there's no harm in having it in addition (though I think as methods would be better).

You should do .0 and .1. I start with .0 for reasons I'll get to in a sec.

I think tuples should be a special case of lists. That is, a Tuple<A,B,C> should implement List<A|B|C>. Then .0 is like [0] except that it doesn't type check if the index is too large and it has the more precise type for that index. Also, {} is a <>, {"hello"} is a <String> (which solves the singleton-constructor problem), and so on.

CeylonMigrationBot commented 11 years ago

[@gavinking]

and because it bypasses associativity and precedence ambiguities.

Yeah, it's true that * is not truly associative.

R(A,B,C) should be shorthand for Callable<R,Tuple<A,B,C>> since a R foo(A a, B b, C c) has the latter form and the former type.

Right, that's the idea.

The latter will just suck when you get to larger and larger tuples

Hopefully we don't get people using "larger and larger" tuples. If we do, then I'll definitely regret adding tuples to the language :-/

honestly I don't think that's a good way to implement them.

I think it's a perfectly fine approach, as long as people don't start using "larger and larger" tuples.

I start with .0 for reasons I'll get to in a sec.

Sorry, that was a mistake above. Of course we should start from 0.

I think tuples should be a special case of lists.

I would like that too, but so far I'm having problems making it work out within the type system, for the same reason mentioned in #3540: I can't write down the type "a tuple of Strings" in Ceylon. Therefore I can't express the relationship that "a tuple of Ts" is a List<T>. This is a problem that needs solving.

CeylonMigrationBot commented 11 years ago

[@chochos] But isn't R(A,B,C) calling the constructor of R with those types as parameters?

CeylonMigrationBot commented 11 years ago

[@gavinking] @chochos No, because this is a type abbreviation that can only occur in certain places grammatically. It can't occur in an expression.

CeylonMigrationBot commented 11 years ago

[@gavinking]

No, because this is a type abbreviation that can only occur in certain places grammatically. It can't occur in an expression.

Of course, that could still be confusing to the human reader, if not to the parser. When we discussed this previously, we decided to go with this syntax tentatively, and see whether people found it confusing. So far, it does not seem to have caused problems, but it's not something I'm totally in love with.

CeylonMigrationBot commented 11 years ago

[@RossTate]

I would like that too, but so far I'm having problems making it work out within the type system, for the same reason mentioned in #3540: I can't write down the type "a tuple of Strings" in Ceylon. Therefore I can't express the relationship that "a tuple of Ts" is a List<T>. This is a problem that needs solving.

What do you mean by "you can't express the relationship that a tuple of Ts is a List"? Surely the type system can have a rule that a Tuple<A,B,C...> extends List<A|B|C...>. If you mean an arbitrary programmer can't express this, that's a different problem, but that's cuz tuples are a bit of a special case already in that they take an arbitrary list of type arguments.

CeylonMigrationBot commented 11 years ago

[@gavinking] Tipples aren't a special case at all, since I've encoded them into the type system using recursion. Unfortunately I don't have machinery for expressing recursive type constraints :(

CeylonMigrationBot commented 11 years ago

[@RossTate] Do you mean using something like:

class TupleOf<Element>
      of Empty | Cons<Element,Element,TupleOf<Element>>
      extends List<Element> {...}
class Empty
      extends TupleOf<Bottom> {...}
class Cons<Element,First,Rest>
      extends TupleOf<Element>
      given First extends Element, Rest extends TupleOf<Element> {...}

CeylonMigrationBot commented 11 years ago

[@chochos] Tipples? So if we go with the A<T<X>> syntax, we'll have pointy tipples?

CeylonMigrationBot commented 11 years ago

[@FroMage] Can we get Nipples instead?

CeylonMigrationBot commented 11 years ago

[@luolong]

(Integer i, Float f, String s) = triple;

This syntax would also very naturally extend to pattern matching, if we ever add that. Imagine something like:

if ((Integer i, Float f, String s) = triple) { .... }

I definitely like this syntax a lot, as it would be much more regular than the alternatives.

The downside is that it's a bit verbose when you just want to use type inference, which is probably the common case:

(value i, value f, value s) = triple;

I have to repeat value 3 times!

I see no problem there. That is how I define values now and it extends easily and naturally into the tuple syntax.

If the repetition seems excessive, you could have a shortcut syntax like this:

(value i, f, s)

Or even a syntax like this:

(String name, surname, Integer age)

CeylonMigrationBot commented 11 years ago

[@FroMage] Java and C allow 'String c = "foo", d = "bar"' so it does not shock me

CeylonMigrationBot commented 11 years ago

[@gavinking] So, thinking about the third issue, now that tuples are sequences, I think that we should simply make the typechecker smart enough to recognize the following:

<Float,Decimal> xy = .... ;
Float x = xy[0];
Decimal y = xy[1];

i.e. for tuple types, if I have an index operator with a literal integer index, then it will use the tuple element type for that index, rather than the sequence element type. This is pretty easy to implement, and totally intuitive.
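
The same idea can be tried out in TypeScript, whose tuple types already behave this way. This is a sketch by analogy only, not Ceylon: number and string stand in for Float and Decimal, and the variable names are assumptions made for illustration.

```typescript
// A two-element tuple; number and string stand in for Float and Decimal.
const xy: [number, string] = [1.5, "2.25"];

// Literal indices pick out the element type at that position.
const x: number = xy[0];
const y: string = xy[1];

// Viewed as a plain sequence, every index only yields the union of the element types.
const asSequence: Array<number | string> = xy;
const element: number | string | undefined = asSequence[0];
```

The declarations of x and y type check only because the literal index selects the per-position type; through asSequence the same element is known only as number | string.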

CeylonMigrationBot commented 11 years ago

[@quintesse] Very nice, I'm liking this more and more!

-Tako

CeylonMigrationBot commented 11 years ago

[@chochos] That would kick ass.

CeylonMigrationBot commented 11 years ago

[@gavinking] There's one little subtlety with that idea. Consider:

<Float,Decimal> xy = .... ;
value z = xy[2];

I suppose the inferred type of z should be Nothing. Well, that's okay I suppose, since at least the following would result in an error:

<Float,Decimal> xy = .... ;
Float z = xy[2];

But I must admit I would sort-of like it if xy[2] were an error—but that's not consistent with the behavior of the operator for non-literal indices. Well, not a really big problem I guess...

CeylonMigrationBot commented 11 years ago

[@quintesse] I'd rather go with the error, after all you can statically prove that it's wrong.

-Tako

CeylonMigrationBot commented 11 years ago

[@RossTate] The problem with a compiler error is that you'll get inconsistent behavior. For example:

Integer i = 2;
Float z = xy[i];

would be valid, but if you make an optimization pass or something, then the optimized version

Float z = xy[2];

would be rejected by the compiler. This is especially concerning because transforming the program in a good way, meaning simpler and more informative, leads to a bad change. So, I don't like using an error, though a warning would be fine.
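
For comparison, TypeScript takes the error route for its own tuple types, so the asymmetry described here can be observed directly. This is a sketch in TypeScript, not Ceylon, and the names pair, i, viaVariable, and viaLiteral are assumptions made for illustration.

```typescript
const pair: [number, string] = [1.5, "two"];

// A non-literal index is accepted: the checker only knows that i is a number.
const i: number = 2;
const viaVariable: number | string | undefined = pair[i];

// Inlining the constant turns the same access into a compile-time error.
// @ts-expect-error a literal index past the end of the tuple is rejected
const viaLiteral = pair[2];
```

At run time both accesses behave identically; only the static check differs, which is exactly the inconsistency being discussed.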

CeylonMigrationBot commented 11 years ago

[@chochos] But only constant values should be allowed, after all it might turn out to just be sugar for calling rest n-1 times then calling first...

CeylonMigrationBot commented 11 years ago

[@ikasiuk] If I understand correctly then we can't allow only constant values because a tuple is a sequence, isn't it? So you have to be able to use it like a sequence.

CeylonMigrationBot commented 11 years ago

[@gavinking] Right, what Ross said.

CeylonMigrationBot commented 11 years ago

[@gavinking] I have implemented the suggested approach to tuple element access, i.e.

Integer i = ("hello", 0, 1.3)[1];

CeylonMigrationBot commented 11 years ago

[@tombentley] Would it be possible that we could do the same thing for ranges? In other words in

value y = ("hello", 0, 1.3)[0..1];

we infer the type of y to be <String, Integer>, rather than Empty|Sequence<String|Integer|Float>. People might find it counter intuitive that the typechecker can figure it out for single elements, but not for ranges.
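
A rough TypeScript analogy of the difference between the two inferred types (not Ceylon, and not how the Ceylon typechecker is implemented). firstTwo is a hypothetical helper standing in for the range expression [0..1], and the variable names are assumptions made for illustration.

```typescript
// Hypothetical helper: keeps the first two element types of any tuple
// that has at least two elements.
function firstTwo<A, B, Rest extends unknown[]>(t: [A, B, ...Rest]): [A, B] {
  return [t[0], t[1]];
}

const t3: [string, number, number] = ["hello", 0, 1.3];

// Precise result: a tuple of the first two element types.
const precise: [string, number] = firstTwo(t3);

// Generic slicing forgets the positions and only keeps the union of element types.
const loose: Array<string | number> = t3.slice(0, 2);
```

Here precise[0] is statically a string, whereas loose[0] is only known to be string | number.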

CeylonMigrationBot commented 11 years ago

[@gavinking] Tom, this is on my todo list.

CeylonMigrationBot commented 11 years ago

[@gavinking] How would you guys feel about not addressing tuple destructuring in Ceylon 1.0, and leaving it for later?

CeylonMigrationBot commented 11 years ago

[@tombentley] Personally I think I could live with that, since tuples support indexing.

CeylonMigrationBot commented 11 years ago

[@quintesse] No problem

CeylonMigrationBot commented 11 years ago

[@gavinking] Then I will implement support for tuple subranges and then close this.

CeylonMigrationBot commented 11 years ago

[@gavinking] I have substantially implemented tuple ranges in @a3f70ad. I will open a new issue for my remaining questions about that.