Tuples - Githubissues

gavinking commented 12 years ago

I'm raising this issue here because it keeps coming up, and we need a permanent location to keep the discussion, instead of me having to re-explain my thinking on this issue every time. Nevertheless, this is an issue that we want to hear feedback on, even though I'm pretty sure I already know what the feedback will be.

Currently the #1 feature request for the language is support for tuples. I'm personally a skeptic, though I recognize that this is an argument I will likely lose. Let me explain why I'm a skeptic.

The thing is, almost every usecase people show for why tuples are sooo convenient, fall into one or more of the following possible categories:

a union type is much more appropriate, or
an intersection type or mixin inheritance is more appropriate, or
it's anyway a tuple of length 2 (a Pair), or
it's a 3-d coordinate, a tuple of length 3.

We don't need tuples to abstract over pairs. Indeed, we already have Entry, and the spec proposes a destructuring syntax for it. It would be very easy to slightly generalize Entry to Pair. Nor do we really need tuple support to represent coordinates in 3-dimensional space.

And a whole class of usecases for tuples, example here, vanishes now we have union types.

Finally, when the usecase doesn't fit into one of these categories, it is usually an attempt to work around the nominal type system and get a poor mans structural typing. Personally, I think creating a language like Ceylon based on structural typing with aliases rather than nominal typing would be a really interesting exercise, but that's not this language.

On the other hand, arguing against myself, as usual, there is one usecase that I have run into:

access to data in a relational database, or some other kind of information system, with a "foreign" type system.

This usecase, in and of itself, might be sufficient to justify tuples, I'm not sure.

Finally, it's worth noting that because Ceylon has sequenced type parameters, representing tuples within our type system is actually really quite straightforward. What bothers me is not tuples themselves, but all the extra syntax sugar you need to introduce to make them actually useful:

a nice way to construct them (name, age, address),
a nice way to write down their type (String, Natural, Address), and
destructuring syntax value name, age, address = ......

Anyway, whatever we decide, I insist that this should be something we address after Ceylon 1.0.

RossTate commented 12 years ago

Don't forget pattern matching and iteration! ceylon/ceylon-spec#65

gavinking commented 12 years ago

@RossTate, I'm calling pattern matching "destructuring". Yes, I should have mentioned iteration as in:

for ((name, age, address) in tuples) { ... }

or perhaps simply:

for (name, age, address in tuples) { ... }

RossTate commented 12 years ago

I meant as a way to do basic iteration.

quintesse commented 12 years ago

Related discussion on the mailing list: https://groups.google.com/group/ceylon-dev/browse_thread/thread/a6be9ba599eb7d1b

FroMage commented 12 years ago

I thought we had already discussed this a while ago and proposed to generalise Entry to n-tuple. Even without performance boosts, this would be useful, as I've occasionally needed to return more than one value from a method, and required a dumb C-structure just to group the two as more type-safe alternative than Object[]. Now admittedly this happens what, 2 times per year?

quintesse commented 12 years ago

Well it happened to me yesterday that I wanted a function to return both a generated expression and it's type (JCExpression, ProducedType). Useful in that particular case, but not really worth it to make a special class for it and Object[] is icky. So you just do it another way, no problem, but it sure would be nice to have :)

-Tako

2011/12/2 Stéphane Épardaud < reply@reply.github.com

I thought we had already discussed this a while ago and proposed to generalise Entry to n-tuple. Even without performance boosts, this would be useful, as I've occasionally needed to return more than one value from a method, and required a dumb C-structure just to group the two as more type-safe alternative than Object[]. Now admittedly this happens what, 2 times per year?

Reply to this email directly or view it on GitHub: https://github.com/ceylon/ceylon-spec/issues/81#issuecomment-2988332

gavinking commented 12 years ago

@FroMage

Now admittedly this happens what, 2 times per year?

@quintesse

So you just do it another way, no problem, but it sure would be nice to have :)

The question is: is this really where we want to spend our "complexity budget"?

My answer would be no.

FroMage commented 12 years ago

Indeed no IMO, and we can add it later.

quintesse commented 12 years ago

On Fri, Dec 2, 2011 at 15:26, Gavin King < reply@reply.github.com

wrote:

Now admittedly this happens what, 2 times per year?

So you just do it another way, no problem, but it sure would be nice to have :)

The question is: is this really where we want to spend our "complexity budget"? My answer would be no.

Well if you put it that way I would need to know what the budget is and what other features are competing :)

-Tako

luxspes commented 12 years ago

I would love to see support for relvars as a basis for tuples, so that Ceylon could have "first class relational algebra support" as the one defined in the Third Manifesto ( http://www.thethirdmanifesto.com/ ) but I guess I am asking for too much...

RossTate commented 12 years ago

@luxspes

Relational algebra is usually phrased in terms of records (a variant of tuples where all components are named and typically unordered). From what I could tell, a relvar (short for relational variable) is a variable representing a (database) relation (which is a finite set of instances of some record type, sometimes with repeats). If I understand relvars correctly, in OO languages these are typically represented by something like an Iterator or Collection of a record type, and often handled with stuff like sequence comprehensions (#82). (Actually, come to think of it, @gavinking would know more about this since this sounds like Hibernate stuff.)

So, I think what you're asking for more specifically is first support for (anonymous) records as opposed to the unnamed ordered tuples we're discussing here (or named records which are simply classes). Second you'd want polymorphic constructs for working with record types, such as select, project, and join. Is that right?

I can say that type inference for join will be difficult since record-type intersection is not very compatible with the usual record-type subtyping. Nonetheless, as you suggest, this kind of tuple is also worth investigating.

luxspes commented 12 years ago

@RossTate

Mainly I'd want to constructs for working with "record types", such as select, project, and join.

I would love to be able to do something like this:

RELATION {
        TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName  'Friend'},
        TUPLE {SourceUserName 'Jane', TargetUserName 'Alfred', RelationshipName  'Enemy'}
 } where SourceUserName = 'Joseph'

and get:

RELATION {
        TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelatinshipName  'Friend'},
 }

And be able to assing this kind of relational expressions to "relvars":

variable Relvar(SourceUserName:String , TargetUserName:String , RelationshipName:String)  r  = RELATION {
        TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName  'Friend'},
 }

And the be able to do things like:

variable Relvar(Name:String)  users =  RELATION {
        TUPLE {Name 'Juan'},
        TUPLE {Name 'Pedro'},
        TUPLE {Name 'Alberto'},
        TUPLE {Name 'Jose'},
        TUPLE {Name 'Alfredo'},
        TUPLE {Name 'Roberto'}
 }

variable Relvar(SourceUserName:String , TargetUserName:String , RelationshipName:String)  relationships
RELATION {
        TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName  'Friend'},
        TUPLE {SourceUserName 'Jane', TargetUserName 'Alfred', RelationshipName  'Enemy'}
 }

And then do something like:

variable Relvar(SourceUserName:String , TargetUserName:String , RelationshipName:String)  usersWithRelationships  =  users RENAME (Name AS SourceUserName) JOIN relationships

RossTate commented 12 years ago

Given some of the proposals going around, here's how I imagine your example might go.

First, let's suppose we add anonymous record types such as Record<SourceUserName:String,TargetUserName:String,RelationshipName:String> possibly with specialized syntax {SourceUserName:String,TargetUserName:String,RelationshipName:String}, constructed with the syntax {SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"}.

Then using the other proposals already in place we could do your example with:

value rel
    = {for rec in {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"},
                   {SourceUserName="Jane", TargetUserName="Alfred", RelationshipName="Enemy"}}
       if rec.SourceUserName == "Joseph"
       rec};

and rel would have inferred type {SourceUserName:String,TargetUserName:String,RelationshipName:String}[] and essentially have value {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"}}.

Does that seem satisfactory?

luxspes commented 12 years ago

@RossTate

Lots of really interesting stuff that are currently really hard to do with SQL and its derivates (like JPAQL) or even its "antagonists" (such as UnQL) could get simplified with real support for relational operations in a language.

For example, in SQL-like languages if one has a "table" with a,b,c,d,e,f,g,h,i, fields and one only wants to rename "a" as "X" in SQL like languages one has to write this:

SELECT DISTINCT  a as X,b,c,d,e,f,g,h,i FROM someTable

while with true relational support it would be possible to simply write:

someTable RENAME ( a AS x)

Even modern stuff such as C# LINQ forces you to have to write:

var query = from s in db.SomeTable
            select new
            {
                X = u.a,
                c = u.c,
                d = u.d,
                e = u.e,
                f = u.f,
                g = u.g,
                h = u.h,
                i = u.i,
            };

The solution proposed by TheThirdManifesto is at a glance far more elegant and more expressive.

Another example, if we want to rename column "a" as "X", remove column "b" y and add a new column "z" with the value g+h we in SQL-like languages we would write:

SELECT DISTINCT a as X,c,d,e,f,g,h,i, g+h as z from someTable

While in a language with true relational support we could write:

someTable RENAME (a AS X) REMOVE (b) ADD (g+h z)

Which prevents us from rewriting those columns we do not wish to affect, and makes it explicit that we are removing the "b" column.

luxspes commented 12 years ago

@RossTate

I really like your example with value rel = {for rec in... that gives us something like relational selection... now... how can we add projection, column renames, joins, and so on?

The one thing I do not like about using the "for" keyword for this, is that it makes me think that the selection operation is going to happen sequentially when in fact this kind of operation could happen in parallel.

And finally if we could access the AST of this relational expressions and send them to JDBC for processing.... we would have transparent integration with relational databases... the only thing missing would be stored procedures and user defined functions (and even those could, if annotated to be compiled as such, using the AST, be translated to PL/SQL or TSQL or PgSQL and sent for storage inside the database....

RossTate commented 12 years ago

Ah, those are good examples. Again, though, they more specifically have to do with anonymous records than with relational algebra in general. In particular one can add rec.-name and rec.+name=value to get the functionality you're looking for.

For example, {for row in someTable (row.+X=row.a).-a.-b.+z=row.g+row.h} would implement your last example. The syntax is a bit wordier than your example, but that's because your example is extremely context sensitive with a lot of implicit capturing. For example, if there were some previously defined variable g, then it's not clear whether the g+h expression should reference the previously defined g or the column g. I don't want to go into this here, but choosing the latter (as your example implicitly does) makes the code fragile to external changes.

Unfortunately, these operations make polymorphism of records rather difficult, so we'd have to consider them carefully.

As for the for keyword, I don't think we're worrying too much about parallelization in these early stages, but I don't think it would be too hard to address when its time comes up.

To summarize, it would be good to consider record types (as well as tuple types), especially due to its importance in databases. Nonetheless, we should move this discussion elsewhere (and elsewhen?). I'll leave @gavinking to choose when and where, since I've already made enough of a mess of these forums/issues.

luxspes commented 12 years ago

Interesting solution this {for row in someTable (row.+X=row.a).-a.-b.+z=row.g+row.h)}... I somewhat feel right now that operators make it more concise but might have a negative impact on legibility...

BTW, really enjoyed this discussion... but I agree, this is not the right when/where for this discussion... good night...

gavinking commented 11 years ago

FTR, I'm now inclined to think that if we ever did get convinced that we need tuples—and I'm definitely still unconvinced of this—then a more convenient/consistent syntax would be to eliminate the parens, like this:

Float, Float polar(Float x, Float y) {
    return sqrt(sqr(x)+sqr(y)), atan(y/x);
}

Float r, Float theta = polar(x, y);

instead of this:

(Float, Float) polar(Float x, Float y) {
    return (sqrt(sqr(x)+sqr(y)), atan(y/x));
}

Float r, Float theta = polar(x, y);

The parens don't add any real value, they don't even help make the syntax less ambiguous.

RossTate commented 11 years ago

You do need the parens for disambiguation, especially if we add forms of overloading such as either variatic generics or default type arguments. For example, if we have class C<X, Y=X> extends Map<X,Y>, then is C<Float,Float> a map from floats to floats or from pairs of floats to pairs of floats?

gavinking commented 11 years ago

@RossTate No, you just need to have the interpretation that X,Y in Map<X,Y> actually is a pair, just like you already need to have the interpretation that x,y in foo(x,y) is a pair. Which is fine, because from a mathematical point of view a function with multiple parameters is just a function which accepts a tuple.

RossTate commented 11 years ago

I think you misunderstood me; I just chose Map because it has two type parameters. I meant that, with your syntax, it's not clear if C<Float,Float> is C fully instantiated or C partially instantiated (i.e. C<(Float,Float)>) and then the second type argument defaulting to the first, i.e. (Float,Float).

gavinking commented 11 years ago

I understood you. What I'm saying is that it means the first, just like foo(1.0, 2.0) would mean the first interpretation in the analogous situation.

RossTate commented 11 years ago

There is no analogous situation. Our syntax for method calls is unambiguous. You are just arbitrarily saying the first interpretation is the correct interpretation. Suppose I wrote Iterable<Float, Float>. Given the semantics of Iterable, it's pretty clear that this is supposed to be an iterable of pairs of floats, yet given your arbitrary choice you'd get an iterable of floats whose next method returns Float|Float. In fact, many users of Iterable probably won't even be very aware of the extra defaulted arguments, just like users of C++'s stdlib. So, I don't think Float, Float should be the syntax for a type. If you mean that we can drop the parens in the special case of returning a tuple, then that's a different matter though.

gavinking commented 11 years ago

@RossTate you're not listening to me.

Iterable<Float,Float> would be a type error, just like print(1.0, 2.0) would be. If you wanted to have an iterable object containing pairs, you would write Iterable<<Float,Float>> using the < and > for grouping, just like you have to write print((1.0, 2.0)) to pass a pair to print(). It's precisely analogous.

There is always an ambiguity between do the parens mean grouping or do they mean tuple instantiation. The best way to resolve that ambiguity, IMO, is to say that parens always mean grouping, and sticking commas between expressions always means tuple instantiation at the conceptual level (even when it's a function call). Likewise, at the type level, angle brackets always mean grouping, and commas are used to form tuple types.

This is totally conceptually clean and regular and grammatically unambiguous.

RossTate commented 11 years ago

Now I am confused. You just said to get rid of the parens for tuple types. Now you're saying not to. Does this mean that you're saying we can drop the parens for tuple types only in the special case of a method's return type?

gavinking commented 11 years ago

No. That's not what I mean.

value x = 1, 2;

means x is a pair.

foo(1,2);

means pass the pair 1, 2 to the method foo(Integer, Integer).

Of course you would also be able to write:

foo(x);

To pass the pair x to foo().

i.e. every method always has exactly one parameter, of type Tuple.

RossTate commented 11 years ago

First, to clarify, how do you denote the "type" of x, and is it a full-fledged type?

Also, are you saying "first, second", without parentheses, is how one constructs a pair of values in general?

gavinking commented 11 years ago

First, to clarify, how do you denote the "type" of x

Integer, Integer

Which is an abbreviation of Tuple<Integer,Integer>.

Is it a full-fledged type?

Yes, of course.

Also, are you saying "first, second", without parentheses, is how one constructs a pair of values in general?

Right. But of course this is at a highly conceptual level. We wouldn't really model all methods as methods with one parameter. That's just how we think about it in terms of designing the syntax.

gavinking commented 11 years ago

i.e. Integer,Integer is an abbreviation of Tuple<Integer,Integer> just like 1,2 is an abbreviation of Tuple(1,2).

RossTate commented 11 years ago

Okay, so from your example it sounds like you expect foo(1, 2); and value x = 1, 2; foo(x); to do the same thing. You also say that Integer in Iterable<Float,Integer> should override the default. Then consider the following definition of foo:

Boolean foo(Object a, Object b=a) { return a === b; }

If we're consistent, then 2 in foo(1, 2) should override the default. Thus foo(1, 2) returns false whereas foo(x) returns true, since it's shorthand for foo(x, x), which is contradictory to your expectations.

gavinking commented 11 years ago

I now have a working prototype for tuple support, in the tuples branch of ceylon-spec and ceylon.language. In this prototype, tuples are just syntax sugar for the perfectly ordinary types Unit and Tuple in ceylon.language.

Therefore, I'm closing this issue, and moving discussion to #433 and #434.

ceylon / ceylon-spec

Tuples #81