ceylon / ceylon-spec


Record Types #120

Open RossTate opened 12 years ago

RossTate commented 12 years ago

In the discussion for Tuples (#81), @luxspes brought up the issue of record types, a variant of tuples which are particularly important to databases and relational algebra.

One database example given was:

RELATION {
        TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName  'Friend'},
        TUPLE {SourceUserName 'Jane', TargetUserName 'Alfred', RelationshipName  'Enemy'}
 } where SourceUserName = 'Joseph'

resulting in RELATION { TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName 'Friend'} }.

If we use the syntax {SourceUserName:String,TargetUserName:String,RelationshipName:String} to denote a record type in Ceylon, and the syntax {SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"} to denote a record instance in Ceylon, then we can encode that example using comprehensions (#82):

value rel
    = {for rec in {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"},
                   {SourceUserName="Jane", TargetUserName="Alfred", RelationshipName="Enemy"}}
       if rec.SourceUserName == "Joseph"
       rec};

and rel would have inferred type {SourceUserName:String,TargetUserName:String,RelationshipName:String}* and essentially have value {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"}}.
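
For comparison, here is a rough sketch of the same query in TypeScript, whose object types already behave as structural record types; the type alias Relationship and the filter helper are just assumptions for the illustration:

// Structural record type, analogous to {SourceUserName:String,TargetUserName:String,RelationshipName:String}.
type Relationship = { SourceUserName: string, TargetUserName: string, RelationshipName: string };

const relation: Relationship[] = [
    { SourceUserName: "Joseph", TargetUserName: "Peter", RelationshipName: "Friend" },
    { SourceUserName: "Jane", TargetUserName: "Alfred", RelationshipName: "Enemy" },
];

// Comprehension-style restriction: keep only rows whose SourceUserName is "Joseph".
const rel = relation.filter(rec => rec.SourceUserName === "Joseph");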

Another example was, rather than

SELECT DISTINCT a as X,c,d,e,f,g,h,i, g+h as z from someTable

where you have to list all the fields you're not even messing with, we should be able to do

someTable RENAME (a AS X) REMOVE (b) ADD (g+h z)

So, if we extend the syntax for records with rec.+field=value for adding a field and rec.-field for removing one, then we can do this in Ceylon with the following:

{for row in someTable (row.+X=row.a).-a.-b.+z=row.g+row.h}

The syntax is a bit wordier than the database example, but that's because the database example is extremely context sensitive with a lot of implicit capturing. For example, if there were some previously defined variable g, then it's not clear whether the g+h expression should reference the previously defined g or the column g.
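
For a rough sense of what the rename/remove/add operations amount to, here is a sketch in TypeScript using destructuring and spread; the Row shape and the name transform are assumptions for the illustration:

// Hypothetical input row shape, assumed for illustration.
type Row = { a: number, b: number, c: number, g: number, h: number };

// (row.+X=row.a).-a.-b.+z=row.g+row.h, spelled out: rename a to X, drop b, add z = g + h.
function transform(row: Row) {
    const { a, b, ...rest } = row;                // remove a and b from the record
    return { ...rest, X: a, z: row.g + row.h };   // add X (the renamed a) and z
}
// transform({ a: 1, b: 2, c: 3, g: 4, h: 5 }) has type { c: number, g: number, h: number, X: number, z: number }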

These thoughts open up a number of issues:

  1. Do we want record types?
  2. How should subtyping for record types work?
  3. How should polymorphism (generics) for record types work?
  4. How do we handle database joins?
  5. How do we optimize comprehensions into database queries?
quintesse commented 12 years ago

Am I correct in understanding that {SourceUserName:String,TargetUserName:String,RelationshipName:String} is basically a shorthand for:

class TUPLE(String sourceUserName, String targetUserName, String relationshipName) {
    shared String sourceUserName = sourceUserName;
    shared String targetUserName = targetUserName;
    shared String relationshipName = relationshipName;
}

where TUPLE might actually be something explicit, anonymous, or defined somehow by the typechecker; I haven't thought that through.

I'm not sure I see much of an advantage; with the proposed shared initializer parameters, the above could already be shortened to:

class TUPLE(shared String sourceUserName, shared String targetUserName, shared String relationshipName) { }

and with the builder pattern intrinsic to Ceylon we could use this to make something like:

RELATION(
    TUPLE("foo", "bar", "baz"),
    TUPLE("wim", "zus", "jet")
)

and I'm sure with some clever thinking we could make a library that does what @luxspes wants, right?

RossTate commented 12 years ago

So, if I understand you correctly, no it can't be shortened to that. I forgot to mention, these are anonymous record types. We don't want to have to specify a new class each time, because we want to be able to modify the types. For example, the rec.+field=value construct is implicitly creating a new class which has all the same fields as the type of rec plus the field field. When joining two records, one makes a new type out of the union of their fields (as well as checking for equal values in the intersection of their fields).

As for a library, the challenge is making one that works for all anonymous record types. The only approaches I know of that can address features like .+ and .- involve type classes (such as the many proposals for Haskell, which I've been meaning to read in more detail) or type-level computation (such as Adam Chlipala's Ur), both of which I think we're currently avoiding. I'll explain why this is such a challenge eventually (although it's looking like it'll probably be after the new year, unfortunately).

We might have some differences in our goals that we can take advantage of, but it'll take some investigating. Or, we might just have to come to some sort of compromise. That is, of course, if we even choose to have record types.

quintesse commented 12 years ago

Ok, I understand, but that means we either statically determine those anonymous classes, generating them wherever necessary (which would be nice, but since queries are very often built dynamically from user input I don't know if that's actually the way to go), or we allow dynamic building of new types, or even just transform it all to plain collections of values. But then we're basically putting the Hibernate query builder with some kind of in-memory database into the language ^^


gavinking commented 12 years ago

@quintesse I think at the actual Java level a record is just a tuple (you add support for tuples and later layer support for records on top of it). The names would be something that exists only at the typechecker level and is eliminated during lowering.
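
A minimal sketch of that lowering, written in TypeScript notation and assuming an arbitrary field order chosen by the typechecker:

// Typechecker-level view: a record type with named fields.
type PersonRecord = { name: string, age: number };

// What exists after lowering: a positional tuple; the names are gone.
type PersonTuple = [string, number];

// An access like rec.age lowers to a positional access fixed at compile time.
function lowerAccessAge(tuple: PersonTuple): number {
    return tuple[1];
}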

quintesse commented 12 years ago

But can that handle the dynamic situation? Because normally with tuples you know at compile time what "size" of tuple you're dealing with. Isn't that a possible problem?


gavinking commented 12 years ago

Because normally with tuples you know at compile time what "size" of tuple you're dealing with.

Same is true of records. Really, they're just considered syntax sugar for a tuple in the languages I've looked at.

quintesse commented 12 years ago

Which then removes a large part of its usefulness IMO

RossTate commented 12 years ago

You raise good points. Interestingly, the implementation concerns suggest an approach for the type problem as well.

Implementation (for dynamic situations)

All record types erase to the same implementation, say an associative array (we could do a hashmap, but record types are typically small, so an associative array would be faster). If an instance has type {foo:String,bar:Integer}, then that guarantees the associative array will have the key foo with a corresponding value of type String, and similarly for bar.
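
A sketch of this erasure in TypeScript notation; ErasedRecord and getField are illustrative names, not a proposed API:

// Every record erases to the same untyped associative representation.
type ErasedRecord = Map<string, unknown>;

// The static record type is the typechecker's guarantee about which keys exist
// and what type each value has, so a checked access can safely cast.
function getField<T>(rec: ErasedRecord, key: string): T {
    return rec.get(key) as T;
}

// An instance of {foo:String,bar:Integer} after erasure:
const erased: ErasedRecord = new Map<string, unknown>([["foo", "hello"], ["bar", 42]]);
const foo = getField<string>(erased, "foo");  // "hello"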

Type Approach (suggested by implementation)

Let {foo:String,bar:Integer} be a subtype of both {foo:String} and {bar:Integer}. So if you want to write something polymorphic over records with a foo field of type String, you can write <R extends {foo:String}> blah(R* records).
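
TypeScript's structural object types already have exactly this width subtyping, so a sketch of the constraint (keeping the name blah from above) might look like:

// {foo:String,bar:Integer} is usable wherever {foo:String} is expected:
// extra fields are simply allowed.
function blah<R extends { foo: string }>(...records: R[]): string[] {
    return records.map(r => r.foo);
}

blah({ foo: "a", bar: 1 }, { foo: "b", bar: 2 });  // fine: the wider records still have foo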

For rec.+field=val where rec has record type R and val has type T, the resulting type is R&{field:T}.
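
A sketch of that typing in TypeScript, where the intersection plays the role of the combined record type; the helper name addField is an assumption:

// rec.+field=val as a helper: the result carries all of R's fields plus the
// new one, and TypeScript infers the intersection R & { field: T } for it.
function addField<R extends object, T>(rec: R, val: T) {
    return { ...rec, field: val };
}

const extended = addField({ foo: "hi" }, 42);
extended.foo;    // string, from the original record
extended.field;  // number, the added field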

No removing fields (sorry); subtyping should take care of some of that though.

If r1 has record type R1 and r2 has record type R2, with R1|R2 satisfying Equatable<R1|R2>, then the result of intersect(r1, r2) has type (R1&R2)?. There is an issue if r1 and r2 have hidden fields of the same name, in which case join will work oddly, but I'm guessing this won't come up much for the applications we're considering.
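
A sketch of that join in TypeScript, assuming a failed join is modelled as null (standing in for the ? in (R1&R2)?); the name intersect is kept from above:

// Join two records: if they agree on every field they share, the merged record
// has the union of their fields, so its type is the intersection R1 & R2;
// otherwise the join fails and we return null.
function intersect<R1 extends object, R2 extends object>(r1: R1, r2: R2): (R1 & R2) | null {
    for (const key of Object.keys(r1)) {
        if (key in r2 && (r1 as any)[key] !== (r2 as any)[key]) {
            return null;  // a shared field with different values: no join
        }
    }
    return { ...r1, ...r2 } as R1 & R2;
}

const joined = intersect({ id: 1, name: "Joseph" }, { id: 1, city: "Ithaca" });
// joined: ({ id: number, name: string } & { id: number, city: string }) | null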

gavinking commented 11 years ago

Related: #745.