Closed gavinking closed 11 years ago
Don't forget pattern matching and iteration! ceylon/ceylon-spec#65
@RossTate, I'm calling pattern matching "destructuring". Yes, I should have mentioned iteration as in:
for ((name, age, address) in tuples) { ... }
or perhaps simply:
for (name, age, address in tuples) { ... }
I meant as a way to do basic iteration.
Related discussion on the mailing list: https://groups.google.com/group/ceylon-dev/browse_thread/thread/a6be9ba599eb7d1b
I thought we had already discussed this a while ago and proposed to generalise Entry
to n-tuple. Even without performance boosts, this would be useful, as I've occasionally needed to return more than one value from a method, and required a dumb C-structure
just to group the two as more type-safe alternative than Object[]
. Now admittedly this happens what, 2 times per year?
Well it happened to me yesterday that I wanted a function to return both a generated expression and it's type (JCExpression, ProducedType). Useful in that particular case, but not really worth it to make a special class for it and Object[] is icky. So you just do it another way, no problem, but it sure would be nice to have :)
-Tako
2011/12/2 Stéphane Épardaud < reply@reply.github.com
I thought we had already discussed this a while ago and proposed to generalise
Entry
to n-tuple. Even without performance boosts, this would be useful, as I've occasionally needed to return more than one value from a method, and required a dumbC-structure
just to group the two as more type-safe alternative thanObject[]
. Now admittedly this happens what, 2 times per year?
Reply to this email directly or view it on GitHub: https://github.com/ceylon/ceylon-spec/issues/81#issuecomment-2988332
@FroMage
Now admittedly this happens what, 2 times per year?
@quintesse
So you just do it another way, no problem, but it sure would be nice to have :)
The question is: is this really where we want to spend our "complexity budget"?
My answer would be no.
Indeed no IMO, and we can add it later.
On Fri, Dec 2, 2011 at 15:26, Gavin King < reply@reply.github.com
wrote:
Now admittedly this happens what, 2 times per year?
So you just do it another way, no problem, but it sure would be nice to have :)
The question is: is this really where we want to spend our "complexity budget"? My answer would be no.
Well if you put it that way I would need to know what the budget is and what other features are competing :)
-Tako
I would love to see support for relvars as a basis for tuples, so that Ceylon could have "first class relational algebra support" as the one defined in the Third Manifesto ( http://www.thethirdmanifesto.com/ ) but I guess I am asking for too much...
@luxspes
Relational algebra is usually phrased in terms of records (a variant of tuples where all components are named and typically unordered). From what I could tell, a relvar (short for relational variable) is a variable representing a (database) relation (which is a finite set of instances of some record type, sometimes with repeats). If I understand relvars correctly, in OO languages these are typically represented by something like an Iterator or Collection of a record type, and often handled with stuff like sequence comprehensions (#82). (Actually, come to think of it, @gavinking would know more about this since this sounds like Hibernate stuff.)
So, I think what you're asking for more specifically is first support for (anonymous) records as opposed to the unnamed ordered tuples we're discussing here (or named records which are simply classes). Second you'd want polymorphic constructs for working with record types, such as select, project, and join. Is that right?
I can say that type inference for join will be difficult since record-type intersection is not very compatible with the usual record-type subtyping. Nonetheless, as you suggest, this kind of tuple is also worth investigating.
@RossTate
Mainly I'd want to constructs for working with "record types", such as select, project, and join.
I would love to be able to do something like this:
RELATION {
TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName 'Friend'},
TUPLE {SourceUserName 'Jane', TargetUserName 'Alfred', RelationshipName 'Enemy'}
} where SourceUserName = 'Joseph'
and get:
RELATION {
TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelatinshipName 'Friend'},
}
And be able to assing this kind of relational expressions to "relvars":
variable Relvar(SourceUserName:String , TargetUserName:String , RelationshipName:String) r = RELATION {
TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName 'Friend'},
}
And the be able to do things like:
variable Relvar(Name:String) users = RELATION {
TUPLE {Name 'Juan'},
TUPLE {Name 'Pedro'},
TUPLE {Name 'Alberto'},
TUPLE {Name 'Jose'},
TUPLE {Name 'Alfredo'},
TUPLE {Name 'Roberto'}
}
variable Relvar(SourceUserName:String , TargetUserName:String , RelationshipName:String) relationships
RELATION {
TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName 'Friend'},
TUPLE {SourceUserName 'Jane', TargetUserName 'Alfred', RelationshipName 'Enemy'}
}
And then do something like:
variable Relvar(SourceUserName:String , TargetUserName:String , RelationshipName:String) usersWithRelationships = users RENAME (Name AS SourceUserName) JOIN relationships
Given some of the proposals going around, here's how I imagine your example might go.
First, let's suppose we add anonymous record types such as Record<SourceUserName:String,TargetUserName:String,RelationshipName:String>
possibly with specialized syntax {SourceUserName:String,TargetUserName:String,RelationshipName:String}
, constructed with the syntax {SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"}
.
Then using the other proposals already in place we could do your example with:
value rel
= {for rec in {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"},
{SourceUserName="Jane", TargetUserName="Alfred", RelationshipName="Enemy"}}
if rec.SourceUserName == "Joseph"
rec};
and rel
would have inferred type {SourceUserName:String,TargetUserName:String,RelationshipName:String}[]
and essentially have value {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"}}
.
Does that seem satisfactory?
@RossTate
Lots of really interesting stuff that are currently really hard to do with SQL and its derivates (like JPAQL) or even its "antagonists" (such as UnQL) could get simplified with real support for relational operations in a language.
For example, in SQL-like languages if one has a "table" with a,b,c,d,e,f,g,h,i, fields and one only wants to rename "a" as "X" in SQL like languages one has to write this:
SELECT DISTINCT a as X,b,c,d,e,f,g,h,i FROM someTable
while with true relational support it would be possible to simply write:
someTable RENAME ( a AS x)
Even modern stuff such as C# LINQ forces you to have to write:
var query = from s in db.SomeTable
select new
{
X = u.a,
c = u.c,
d = u.d,
e = u.e,
f = u.f,
g = u.g,
h = u.h,
i = u.i,
};
The solution proposed by TheThirdManifesto is at a glance far more elegant and more expressive.
Another example, if we want to rename column "a" as "X", remove column "b" y and add a new column "z" with the value g+h we in SQL-like languages we would write:
SELECT DISTINCT a as X,c,d,e,f,g,h,i, g+h as z from someTable
While in a language with true relational support we could write:
someTable RENAME (a AS X) REMOVE (b) ADD (g+h z)
Which prevents us from rewriting those columns we do not wish to affect, and makes it explicit that we are removing the "b" column.
@RossTate
I really like your example with value rel = {for rec in...
that gives us something like relational selection... now... how can we add projection, column renames, joins, and so on?
The one thing I do not like about using the "for" keyword for this, is that it makes me think that the selection operation is going to happen sequentially when in fact this kind of operation could happen in parallel.
And finally if we could access the AST of this relational expressions and send them to JDBC for processing.... we would have transparent integration with relational databases... the only thing missing would be stored procedures and user defined functions (and even those could, if annotated to be compiled as such, using the AST, be translated to PL/SQL or TSQL or PgSQL and sent for storage inside the database....
Ah, those are good examples. Again, though, they more specifically have to do with anonymous records than with relational algebra in general. In particular one can add rec.-name
and rec.+name=value
to get the functionality you're looking for.
For example, {for row in someTable (row.+X=row.a).-a.-b.+z=row.g+row.h}
would implement your last example. The syntax is a bit wordier than your example, but that's because your example is extremely context sensitive with a lot of implicit capturing. For example, if there were some previously defined variable g
, then it's not clear whether the g+h
expression should reference the previously defined g
or the column g
. I don't want to go into this here, but choosing the latter (as your example implicitly does) makes the code fragile to external changes.
Unfortunately, these operations make polymorphism of records rather difficult, so we'd have to consider them carefully.
As for the for
keyword, I don't think we're worrying too much about parallelization in these early stages, but I don't think it would be too hard to address when its time comes up.
To summarize, it would be good to consider record types (as well as tuple types), especially due to its importance in databases. Nonetheless, we should move this discussion elsewhere (and elsewhen?). I'll leave @gavinking to choose when and where, since I've already made enough of a mess of these forums/issues.
Interesting solution this {for row in someTable (row.+X=row.a).-a.-b.+z=row.g+row.h)}
... I somewhat feel right now that operators make it more concise but might have a negative impact on legibility...
BTW, really enjoyed this discussion... but I agree, this is not the right when/where for this discussion... good night...
FTR, I'm now inclined to think that if we ever did get convinced that we need tuples—and I'm definitely still unconvinced of this—then a more convenient/consistent syntax would be to eliminate the parens, like this:
Float, Float polar(Float x, Float y) {
return sqrt(sqr(x)+sqr(y)), atan(y/x);
}
Float r, Float theta = polar(x, y);
instead of this:
(Float, Float) polar(Float x, Float y) {
return (sqrt(sqr(x)+sqr(y)), atan(y/x));
}
Float r, Float theta = polar(x, y);
The parens don't add any real value, they don't even help make the syntax less ambiguous.
You do need the parens for disambiguation, especially if we add forms of overloading such as either variatic generics or default type arguments. For example, if we have class C<X, Y=X> extends Map<X,Y>
, then is C<Float,Float>
a map from floats to floats or from pairs of floats to pairs of floats?
@RossTate No, you just need to have the interpretation that X,Y
in Map<X,Y>
actually is a pair, just like you already need to have the interpretation that x,y
in foo(x,y)
is a pair. Which is fine, because from a mathematical point of view a function with multiple parameters is just a function which accepts a tuple.
I think you misunderstood me; I just chose Map
because it has two type parameters. I meant that, with your syntax, it's not clear if C<Float,Float>
is C
fully instantiated or C
partially instantiated (i.e. C<(Float,Float)>
) and then the second type argument defaulting to the first, i.e. (Float,Float)
.
I understood you. What I'm saying is that it means the first, just like foo(1.0, 2.0)
would mean the first interpretation in the analogous situation.
There is no analogous situation. Our syntax for method calls is unambiguous. You are just arbitrarily saying the first interpretation is the correct interpretation. Suppose I wrote Iterable<Float, Float>
. Given the semantics of Iterable
, it's pretty clear that this is supposed to be an iterable of pairs of floats, yet given your arbitrary choice you'd get an iterable of floats whose next
method returns Float|Float
. In fact, many users of Iterable
probably won't even be very aware of the extra defaulted arguments, just like users of C++'s stdlib. So, I don't think Float, Float
should be the syntax for a type. If you mean that we can drop the parens in the special case of returning a tuple, then that's a different matter though.
@RossTate you're not listening to me.
Iterable<Float,Float>
would be a type error, just like print(1.0, 2.0)
would be. If you wanted to have an iterable object containing pairs, you would write Iterable<<Float,Float>>
using the <
and >
for grouping, just like you have to write print((1.0, 2.0))
to pass a pair to print()
. It's precisely analogous.
There is always an ambiguity between do the parens mean grouping or do they mean tuple instantiation. The best way to resolve that ambiguity, IMO, is to say that parens always mean grouping, and sticking commas between expressions always means tuple instantiation at the conceptual level (even when it's a function call). Likewise, at the type level, angle brackets always mean grouping, and commas are used to form tuple types.
This is totally conceptually clean and regular and grammatically unambiguous.
Now I am confused. You just said to get rid of the parens for tuple types. Now you're saying not to. Does this mean that you're saying we can drop the parens for tuple types only in the special case of a method's return type?
No. That's not what I mean.
value x = 1, 2;
means x
is a pair.
foo(1,2);
means pass the pair 1, 2
to the method foo(Integer, Integer)
.
Of course you would also be able to write:
foo(x);
To pass the pair x
to foo()
.
i.e. every method always has exactly one parameter, of type Tuple
.
First, to clarify, how do you denote the "type" of x, and is it a full-fledged type?
Also, are you saying "first, second", without parentheses, is how one constructs a pair of values in general?
First, to clarify, how do you denote the "type" of x
Integer, Integer
Which is an abbreviation of Tuple<Integer,Integer>
.
Is it a full-fledged type?
Yes, of course.
Also, are you saying "first, second", without parentheses, is how one constructs a pair of values in general?
Right. But of course this is at a highly conceptual level. We wouldn't really model all methods as methods with one parameter. That's just how we think about it in terms of designing the syntax.
i.e. Integer,Integer
is an abbreviation of Tuple<Integer,Integer>
just like 1,2
is an abbreviation of Tuple(1,2)
.
Okay, so from your example it sounds like you expect foo(1, 2);
and value x = 1, 2; foo(x);
to do the same thing. You also say that Integer
in Iterable<Float,Integer>
should override the default. Then consider the following definition of foo
:
Boolean foo(Object a, Object b=a) { return a === b; }
If we're consistent, then 2
in foo(1, 2)
should override the default. Thus foo(1, 2)
returns false
whereas foo(x)
returns true
, since it's shorthand for foo(x, x)
, which is contradictory to your expectations.
I now have a working prototype for tuple support, in the tuples
branch of ceylon-spec
and ceylon.language
. In this prototype, tuples are just syntax sugar for the perfectly ordinary types Unit
and Tuple
in ceylon.language
.
Therefore, I'm closing this issue, and moving discussion to #433 and #434.
I'm raising this issue here because it keeps coming up, and we need a permanent location to keep the discussion, instead of me having to re-explain my thinking on this issue every time. Nevertheless, this is an issue that we want to hear feedback on, even though I'm pretty sure I already know what the feedback will be.
Currently the #1 feature request for the language is support for tuples. I'm personally a skeptic, though I recognize that this is an argument I will likely lose. Let me explain why I'm a skeptic.
The thing is, almost every usecase people show for why tuples are sooo convenient, fall into one or more of the following possible categories:
Pair
), orWe don't need tuples to abstract over pairs. Indeed, we already have
Entry
, and the spec proposes a destructuring syntax for it. It would be very easy to slightly generalizeEntry
toPair
. Nor do we really need tuple support to represent coordinates in 3-dimensional space.And a whole class of usecases for tuples, example here, vanishes now we have union types.
Finally, when the usecase doesn't fit into one of these categories, it is usually an attempt to work around the nominal type system and get a poor mans structural typing. Personally, I think creating a language like Ceylon based on structural typing with aliases rather than nominal typing would be a really interesting exercise, but that's not this language.
On the other hand, arguing against myself, as usual, there is one usecase that I have run into:
This usecase, in and of itself, might be sufficient to justify tuples, I'm not sure.
Finally, it's worth noting that because Ceylon has sequenced type parameters, representing tuples within our type system is actually really quite straightforward. What bothers me is not tuples themselves, but all the extra syntax sugar you need to introduce to make them actually useful:
(name, age, address)
,(String, Natural, Address)
, andvalue name, age, address = .....
.Anyway, whatever we decide, I insist that this should be something we address after Ceylon 1.0.