ceylon / ceylon-spec

DEPRECATED
Apache License 2.0
108 stars 34 forks source link

Typesafe circular references #1156

Open LasseBlaauwbroek opened 9 years ago

LasseBlaauwbroek commented 9 years ago

First, my apologies for any erroneous syntax of Ceylon I might use. I would like to make a proposal to enable the typesafe creation of circular references. I have "borrowed" the annotation late for this, but this is probably not the best name for this annotation.

The idea is to annotate a value with late if it has not yet been initialized, but we still want to pass the value that it will contain at a later point around to functions. We accomplish this by requiring that the late annotation is propagated to everywhere it is used. Note that this will only work with values and not variables. The following rules apply:

A late annotation to a parameter, value, function or object a inside closure c (a function, method or class/constructor) has the following consequences:

  1. a may not be evaluated in the class initializer (in case c is a class)
  2. a may not be evaluated in the body of the function (in case c is a function)
  3. a may not be assigned to a value that is not defined in the "current" closure c
  4. if a is assigned to a value b, it must also be annotated late
  5. if closure c contains a closure (function or class) d that evaluates a, it must be annotated late
  6. if c is a class, constructs in the declaration section of c do not have to be annotated late if they evaluate a
  7. if c is a function or class/constructor with one or more late parameters, invoking c with no late arguments results in a non-late return value; invoking c with one or more late arguments results in a late return value (only assignable to a late value)

We regard this, super, outer, etc., to be implicitely anotated late.

Examples:

The circular dependency example from the tour, rewritten using this proposal:

class Child(late Parent parent) {
    shared String message = "message";
}

class Parent() {
    late Child lateChild = Child(this);
    print(lateChild.message); // Error, violates (1)

    shared Child child => lateChild; // Allowed, according to (7)
}
value parent = Parent();

Note that this feature would probably work well together with #1149.

An example with functions (completely useless, but you get the idea hopefully):

Anything() fun(late Anything() x) {
    x(); // Error
    late Anything() funX() => x; // Allowed
    return funX; // Allowed
}

What we want to create using this function, are two functions that return eachother. This is a problem, because we do not have a declaration section we can use as in the previous example. I have two possible solutions to this:

The first is a new construct, remniscent of try ... catch

late (Anything() a, Anything() b) {
    a = fun(b);
    b = fun(a);
    a(); b(); // Error
} then {
    a(); b(); a()()();// All allowed
}

The idea here is that we create two values a and b, that are late in the late-scope, and non-late in the then-scope. The compiler checks that all values either depend on non-late values, or on each-other.

This can probably also be done implicitly without the new control structure, at the cost of some of the understandability:

late Anything() a;
late Anything() b;
a = fun(b); // Compiler records that a has a dependency on b
a(); // Error
b = fun(a); // Compiler records that b has a dependency on a;
// At this point the compiler can verify that all dependencies are circular, and magically removes the `late` annotations
a(); b(); a()()();// All allowed

Another example using classes with the initialization logic in a factory function:

class A(late B b) {
    shared String message = "class A";
    String otherMessage => b.message;
}

class B(late A a) {
    shared String message = "class B";
    String otherMessage => a.message;
}

[A, B] create() {
    late (A a, B b) {
        a = A(b);
        b = B(a);
    } then {
        print(a.otherMessage); print(b.otherMessage);
        return [a, b];
    }
}

The syntax proposed here is far from perfect, but I believe it should work (I'm sure i've overlooked a lot of cornercases, but I think they should be fixable). As far as I can see, this will even work with multitreaded programs. To implement this, the first thing that comes to mind is using a container Box<T> for a late value of type T. This container initially contains a null value, and can be passed around. When the actual value is computed, it will be set in the container, and can be used. This is, however, probably not the most efficient method.

gavinking commented 9 years ago

So this is a pretty interesting proposal. I don't yet have a good intuition as to what extent it is a replacement for the existing late annotation, but certainly it seems to solve some of the most common cases, and it solves them in a much more typesafe way. Certainly it can't do everything that late, can do, given that late is a rather blunt instrument, but that's perhaps a good thing. I must admit that this proposal does some things that I had concluded weren't reasonably doable.

So, @LasseBlaauwbroek, you've got me with you right up to the point where you start talking about factory functions and late/then. It seems to me that this goes too far and I don't quite see the strong motivation for it. Perhaps you had some practical usecases in mind that I'm not seeing?

P.S. This is a different feature to late, and it would need a different annotation.

gavinking commented 9 years ago

P.S. @LasseBlaauwbroek Thanks for this :-)

LasseBlaauwbroek commented 9 years ago

I don't yet have a good intuition as to what extent it is a replacement for the existing late annotation

Well, as far as I know, the only reason for late to exist is to create circular references. This proposal should be able to create any type of (indirect) circular reference you could think of. So, in principle, the current late annotation could be deprecated after this is implemented.

About late/then: The point is that the current system with an initializer section and declaration section in classes already is late (this, otherVariables) {...} then {...} in disguise (the initializer acts as late and the declaration section as then). This means that if we don't have some sort of late/then construct (or the implicit one), we can only use late variables inside classes. My second example is certainly not very compelling, but I think the third one is.

What happends there, is that class A and B are completely decoupled. I can call B(...) with a subtype of A and vice versa. This is not the case in the first example. I think this enables some flexibility (in the first example, to use a subclass of Child, a constructor function would need to be supplied to Parent).

Having said that, late/then would not be strictly neccesarry for this feature. In the first iteration, it could be omitted, and if required, added later. This would simplify the implementation significantly (only requiring a new annotation and some assignment and evaluation restrictions in the compiler).

P.S. This is a different feature to late, and it would need a different annotation.

I fully agree, but have not yet thought of a suitable name. Perhaps future?

LasseBlaauwbroek commented 9 years ago

Fixed some stupid mistakes in the above, I'm a bit tired...

gavinking commented 9 years ago

This means that if we don't have some sort of late/then construct (or the implicit one), we can only use late variables inside classes.

Ah actually I think I sorta know what you're getting at now. It's something a bit like the semantics of Scheme's letrec (or Ceylon's toplevel declarations), i.e. something like:

return let (future A a = A(b), future B b=B(a)) [a,b];

Currently we don't support annotations like variable and late on a let-bound variable, but in future we certainly could.

Very interesting.

LasseBlaauwbroek commented 9 years ago

Right, that might be a construct that is a lot clearer than mine and I think with the same semantic meaning (nice that let was just introduced :-) ).

LasseBlaauwbroek commented 9 years ago

I suppose the downside of let is that it can only be used in expressions...

And I would suggest leaving off future when using let, because when the variables are actually used, they are not future. So, something like this:

return let (A a = A(b), B b=B(a)) [a,b];
RossTate commented 9 years ago

So far I have reason to believe that this is sound if done carefully (e.g. preventing late references from leaking and addressing issues that might arise with nested late constructs).

gavinking commented 9 years ago

After the discussion in #1167, I think I should invest time in this proposal.

gavinking commented 9 years ago

@LasseBlaauwbroek I don't think your rules quite capture the restrictions upon assigning a to a parameter. Here's how I think they should be written.

A reference or value parameter of a class, constructor, or function may be annotated late. (We wouldn't initially support late functions, getters, or setters, though the rules could certainly be extended to accommodate them.)

Finally, this and outer are treated like a late reference within the initializer and constructors of a class.

WDYT?

NOTES:

gavinking commented 9 years ago

P.S. I still think that late is not the right annotation name for this.

gavinking commented 9 years ago

I think I like the name initial for the annotation. So:

class Child(shared initial Parent parent) {
    shared String message => parent.string;
}

class Parent() {
    initial Child c = Child(this);
    print(c.message); //error!
    shared Child child => c;
}

value parent = Parent();
gavinking commented 9 years ago

So, a thought that just occurred to me is this: as clever as @LasseBlaauwbroek's proposal is, it doesn't actually address one very important usecase we have for late as it exists today. That is, for interop with Java frameworks like Hibernate/JPA, CDI, Spring, etc, which do injection into fields after instantiation of the object. So, for example, if I need to be able to write:

entity class Person {
    shared late id String name;
    shared new Person(String name) {
        this.name = name;
    }
    new New() {} //leaves name uninitialized
}
LasseBlaauwbroek commented 9 years ago

In reaction to your set of rules: I must say that I'm having some trouble to follow them precisely. First, why do you say foo is a parameter. I believe you should be talking about any value that is annotated late. Parameters are just a subset of the posiblities. Especially because in later rules you talk about late values that are not necesseraly parameters. I'm also curious to know why you initially don't want to support late function references. I think this makes everythink less regular, and the rules more complex.

Rule 1) is correct I think, altough I would rephrase it to 'may not be evaluated' in order to accomodate (future) function values.

2 foo may not be returned by a function or getter.

From this rule I infer that a getter is defined in the initializer section? In that case, wouldn't it be an idea if it is also possible to define getters in the declaration section. This way you can use foo without annotating the getter late. I anticipate that getters would need to make frequent use of late parameters, so it should either be possible to annotate them late or declare them in the declaration section.

In rule 3), It is again not clear to me why you make a disctinction between parameter values and reference values? I also do not quite follow the 'or a reference declared within the body of the class, constructor, or function to which foo belongs'. I think we have the same interpretation, but to be clear, the reference cannot be declared in a closure that lies "deeper" than the current class/function, but can be declared in a closure that exists within the current class/function.

I think that rule 4) is the equivalent of my rule 7), and seems to be correct, except for 'i.e. as the subject of a member reference'. Unless I misunderstand, you just disallowed dereferencing a member of foo in rule one.

In the end I think your rules are more or less equivalent to mine. I'm not sure why you think my rules lacked in specification for parameters.

Note that apart from these rules, there must also be rules to make sure that late values are in the end actually assigned to. This includes detecting circular late values and declaring complete cycles as 'assigned to'.

I think I like the name initial for the annotation.

I still kind of like future as mooted before. It nicely captures that the value cannot be used yet, but is guaranteed to be usable in the future.

it doesn't actually address one very important usecase

Such extra-lingual use-cases can indeed not be captured in this proposal. Any fix for this kind of interop will always remain a "hack". I'm not intimately familiar with these kind of injections and how they manifest. I would, however, like to suggest a method that might be slightly more efficient/clean:

entity class Person {
    shared id String name;
    shared new Person(String name) {
        this.name = name;
    }
    new late New() {} //leaves name uninitialized
}

The late annotation on the constructor hints to the compiler that some magic initialization of name will be happening. This way the compiler only has to insert null-checks for name after the invocation of the constructor. After that, name can be always considered null-safe.

gavinking commented 9 years ago

@LasseBlaauwbroek

I believe you should be talking about any value that is annotated late. Parameters are just a subset of the posiblities.

I say foo is "a reference or value parameter". Yeah, sure, a value parameter is just a kind of reference. See §4.8.1 and §4.3.4.

I would rephrase it to 'may not be evaluated' in order to accomodate (future) function values.

Well, no, that would be wrong, we can evaluate foo itself—we do so every time we assign it to another late reference or function parameter. It's foo's members that can't be evaluated.

From this rule I infer that a getter is defined in the initializer section?

The rules are explicitly stated to only apply in initializers and constructors. They do not apply in declaration sections.

the reference cannot be declared in a closure that lies "deeper" than the current class/function, but can be declared in a closure that exists within the current class/function.

I don't think that restriction is necessary. I think it can be safely used in a a "deeper" closure, due to the other restrictions that apply in an initializer section.

gavinking commented 9 years ago

Note that apart from these rules, there must also be rules to make sure that late values are in the end actually assigned to.

Those are just the normal rules that apply to any value declaration in Ceylon. No need for any special language.

gavinking commented 9 years ago

I still kind of like future as mooted before.

"Future" is already a well-known concept. I don't think we can hijack that term.

gavinking commented 9 years ago

The late annotation on the constructor hints to the compiler that some magic initialization of name will be happening.

Similar to what I already proposed on #1167.

LasseBlaauwbroek commented 9 years ago

I don't think that restriction is necessary. I think it can be safely used in a a "deeper" closure, due to the other restrictions that apply in an initializer section.

As far as I can see quickly, if you 'leak' a late value to a "deeper" closure, you run the risk of getting problems with multithreaded programs. But given that a late value cannot also be variable, it seems to actually be impossible to assign to a "deeper" late value, because they will have already been definitely initialized (at least if the deeper closure has already returned). But I could be missing something here.

"Future" is already a well-known concept

Right. That's a bad idea. But I must say that I don't immediately get any intuition with your initial. What is your reasoning on this? I will keep thinking for better names...

LasseBlaauwbroek commented 9 years ago

Some possible alternative names for of late:

I'm not particulary entousiastic about any of them.

RossTate commented 9 years ago

Alright, I think I've managed to phrase this in terms of coinduction, which gives me more confidence it is sound.

First, late T is not a type, or more precisely it doesn't have kind Type. That means you can't use it as a type argument. This may mess up your metasystem.

Second, if a constructor annotates a field with late, then it must be annotated late to indicate that its return type is late. Note that if a class has multiple constructors, in theory they can differ on which fields (if any) are annotated with late as well as on being late.

Third, the only things that can be done with late values is be assigned to late variables, returned when the method's return type is late, checked for reference equality and identity hashed, and type cast (or whatever other operations only need the instance's address and its metainformation). This could be implemented in the compiler by changing those operations' input type.

Lastly, this is late in the initialization block. The late annotations on all fields are ignored by all post-initialization methods.

If don't mind limiting expressiveness a bit, you can say that a method's return type is always late if any of its inputs are late, and then not have return types annotated with late. If you want to increase expressiveness a bit, you can say that a method's returned value is not late if none of its inputs are late. These two combined turn into the idea that an expression is late iff any of the variables in the expression are late.

There are some subtleties that come up with outer, but I wanted to avoid cluttering the ideas right now.