Instantiation: Initializer Parameters, Factory Functions and Cloning

CeylonMigrationBot commented 12 years ago

[@ikasiuk] This issue is the result of some comments and suggestions in #3366, #3420, #3162 and others. There are some interesting ideas about how language elements like the new parameter syntax and the named argument syntax can be used or extended. I think it's worth exploring how they could be put to use together. In particular, I'd like to explore the possibility of factory functions, which could solve certain common problems with scopes and visibility (see recent comments in #3162) and are closely related to cloning (see #3366).

Initializer Parameters

We have a new syntax for initializer parameters where parameters correspond to attributes and class A(String s) {} is basically equivalent to class A(s) { String s; }. Recently I've been warming to the idea of introducing a dedicated annotation for such parameter attributes: class A(s) { parameter String s; }. There are several reasons for this:

Such attributes really are different from normal attributes because of their close connection to the associated parameter, see #3420. So it seems like a good idea to indicate that difference with an annotation.
Readability of the source code is increased if parameter attributes look different from normal attributes.
It would open the possibility to add a restriction for parameter attributes, like: members annotated "parameter" must be declared before all other members. This would be a very sensible rule IMO.
This annotation is an important step towards factory functions because it allows you to define parameters that do not appear in the parameter list and can thus only be used with named argument syntax. That's especially interesting if the parameter is not shared:

class A(arg1) {
    shared parameter String arg1;
    parameter String arg2 = "default";
    shared A test() {
        return A { arg1 = "1"; arg2 = "2"; };
    }
}

In this example, only code where A.arg2 is visible would be able to specify a custom value for that parameter.

Ad-hoc Overriding Using Named Argument Syntax

This was proposed in #3420. The idea is that you can override formal or default members while creating an object with named argument syntax:

abstract class A(String arg) {
    shared formal String str;
}
A a = A { arg = "arg"; str = "str"; };

So we can even instantiate abstract classes or interfaces if we provide all the missing members. This is a very useful feature which fits excellently into the language.

Factory Functions

A problem with Ceylon's class initializer syntax is that you can only specify one constructor signature and that it always has the same visibility as the class itself. That's a problem if the way the implementation needs to create instances of the class is different from the external interface. These kinds of problems are mentioned in #3366 and #3162.

A good and generic solution is to provide a possibility to specify alternative ways of creating instances. That's the purpose of factory functions. By declaring different factory functions with different visibility you can control who can create instances in what way.

Basically, all we need to allow the definition of factory functions is a special new annotation (probably plus the parameter annotation mentioned above):

class A(String arg1) {
    parameter String arg2 = "default";
    shared formal String str;
}
new A f(String x) {
    return A { arg1="1"; arg2=x; };
}

The rules for the new annotation are:

A function annotated new can be used just like a normal constructor, even in an extends clause.
Such a function may only return a new value of the exact type which is specified as the return type, i.e. either an object created by another factory function or by the original constructor.
The function may not access the created object (or at least no overridable members) before it is returned.

So the factory functions can then be used in the following ways:

A a = f { x="x"; str="str"; };
class B(String x) extends f(x) {}
new A f2() { return f("x"); }

In some cases, as mentioned by Gavin in #3162, you also want to completely prevent some user code (for instance code outside the same package) from directly creating instances, but without changing the visibility of the class itself. This can be achieved relatively easily with factory functions if we introduce a sealed annotation or add an optional parameter to the abstract annotation:

shared sealed class A() { ... }
// or e.g.: shared abstract(false) class A() { ... }
new A f() { return A { ... }; }

Interpretation: The class initializer can only be called by code that has "private" access to the class, i.e. that can see its non-shared members. Currently that only applies to code inside the class itself, but that's almost certainly going to change, see #56.

Cloning

The problems of cloning have been discussed in #260. In many cases, creating a copy of an object can be implemented by the means presented above. But this can get impractical if the class has a lot of members whose values just have to be copied directly, as that would mean a lot of boilerplate code. For these cases it would be nice to have a mechanism which automatically creates a copy of an object.

This problem is closely related to factory functions because a cloning function is basically a "magic" factory function which receives an original object as input and instantiates a copy:

shared new Clone cloneObject<Clone>(Clone objectToClone) { /*magic*/ }

If we provide such a function in the language module then we can write code like this:

class A(String arg1) satisfies Cloneable<A> {
    default String str = "";
    shared actual A clone() {
        return cloneObject<A> { objectToClone=this; str="str"; };
    }
}

Note that we can only redefine default members of the cloned object, not the values of parameters or other non-default members. I think that's a very good thing: the new object is created without executing the initialization code. If we allowed the redefinition of arbitrary members then it would be all too easy to inadvertently create an inconsistent state. Remember that we need this feature mainly for classes where cloning basically means simply copying all member values. For more complicated cases where the state of the new object has to be manipulated afterwards we should rather use the mechanisms shown in the sections above.

[Migrated from ceylon/ceylon-spec#319]

CeylonMigrationBot commented 12 years ago

[@gavinking] @ikasiuk Note that I still have not read your above proposal, but I have just come up with a different approach that covers some of the same ground.

Goals

My proposal addresses two related concerns:

There is no good way to "hide" the initializer of a class from clients outside the package/module.
There is no way to define multiple initializers for a class.

At present the "solution" to this problem is to write the class as an interface, with each "initializer" as a separate class that implements the interface. This is OK most of the time, but is not always completely ergonomic. At worst, it can look like a workaround.
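The workaround described above can be sketched in current Ceylon roughly as follows. This is only an illustration: the names Path, NewPath and CopiedPath are assumptions chosen to line up with the examples further down, not code taken from an existing module.

```ceylon
// The "class" is written as an interface that declares its state as a formal attribute.
shared interface Path {
    shared formal String[] pathElements;
}

// First "initializer": a shared class that builds a Path from its elements.
shared class NewPath(String[] elements) satisfies Path {
    shared actual String[] pathElements = elements;
}

// Second "initializer": a non-shared toplevel class, so only code in the
// same package can create a Path by copying another one.
class CopiedPath(Path path) satisfies Path {
    shared actual String[] pathElements = path.pathElements;
}
```

Clients outside the package see only Path and NewPath. The cost is that every attribute has to be declared formal in the interface and refined again in each implementing class.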

Proposal

The proposed solution follows the basic pattern of this "workaround", but removes the requirement to turn the class into an interface.

Defining a class with no initializer

First, we'll let you define a class that looks a lot more like an interface:

it has no parameters,
its body contains no initialization logic, and
its superclass is specified in the satisfies clause instead of the extends clause.

For example:

shared class Path satisfies Object {
    shared String[] pathElements;
}

Defining initializers for the class

Now, we'll let you write the "constructors" for the class. There are two syntactic options that might make sense, and I have not yet put the time into deciding which is better.

Option 1: using a method-like syntax

The first option is to make a "constructor" look like an object-builder method:

shared Path newPath(String[] elements) 
        extends Object() {
    pathElements=elements;
}

Path copyPath(Path path) 
        extends Object() {
    pathElements=path.pathElements;
}

Note that this option only really makes sense in conjunction with #3420.

Option 2: using a class-like syntax

The second option is to make a "constructor" look like a class definition:

shared class NewPath(String[] elements) 
        extends Object() 
        satisfies Path {
    pathElements=elements;
}

class CopyPath(Path path) 
        extends Object()
        satisfies Path {
    pathElements=path.pathElements;
}

Problems

There are a large number of open questions here:

Which of the two options is better/more natural?
If we go with the first option, what precisely is the relationship between this syntax and the object builder syntax?
If we go with the second option, would it be better to use a different keyword ... for example, new instead of class?
What is the relationship, if any, between a class with no initializer and the notion of an abstract class? Do we even really need abstract classes if we have this new thing?
What exactly does this map to at the Java (and JavaScript) level? If we try to map it to a single Java class with multiple constructors, we would have to place some special limitation, such as requiring that the initializers be defined in the same unit as the class declaration. But that might not be the best mapping to Java.

CeylonMigrationBot commented 12 years ago

[@gavinking] After reviewing the original post by @ikasiuk, I think the two approaches are really quite in parallel. Ivo's proposal to allow members annotated parameter seems to be pretty much addressing the same concern as my idea to simply eliminate the parameter list. They are different approaches, but with the same goal of providing an API that is to be used by external initializers. Ivo's "factory functions", and my "initializers" are essentially the same thing, except for syntax.

(I have not yet put any thought into how cloning relates to this stuff.)

CeylonMigrationBot commented 12 years ago

[@gavinking] Actually, I now realize that my "option 2" really boils down to:

allowing an interface to satisfy a concrete class,
requiring any class that satisfies such an interface to extend the concrete class, and
restricting such interfaces to a single inheritance model. (That is, you can't mix more than one such interface into a class or interface.)

This is an idea I've toyed with before. If it would really solve the problems we're interested in here, then that would be fantastic, since it is a pretty "minimal" change to our language.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] > This is an idea I've toyed with before. If it would really solve the problems we're interested in here, then that would be fantastic, since it is a pretty "minimal" change to our language.

Really nice idea, I like the simplicity. Just a small but significant change: for every interface, allow a default implementing class with the same name as the interface:

interface A { /* members */ }
class A() satisfies A { /* initialization */ } 
class OtherA() satisfies A { /* alternative initialization */ }

And for every class which does not have such an interface, implicitly assume one. So class A() {} implicitly means

interface A {} 
class A() satisfies A {}

Now, for every class X we can write

interface Y satisfies X {} 
class Y() extends X() satisfies Y {}

without having to allow a class name in the satisfies clause! Of course the short form of the above is class Y() extends X() {}.

This change avoids the problem that people would likely end up always writing

interface A { ... }
class NewA() satisfies A { ... }

instead of class A() {...} in case they have to add an alternative initializer class later.

CeylonMigrationBot commented 12 years ago

[@gavinking] @ikasiuk yeah, that's probably a very reasonable approach. You need an extra restriction though: something like "if a class A has the same name as an interface A, then the class A may not declare any shared members."

In light of that, it may still be worth introducing a different keyword here.

interface A {}
new A() satisfies A {}

Then a class declaration would be considered a shortcut way to write an interface+new pair.

CeylonMigrationBot commented 12 years ago

[@gavinking] > You need an extra restriction though: something like "if a class A has the same name as an interface A, then the class A may not declare any shared members."

Or perhaps that's just no big deal. We already have anonymous types like true and null. I don't see why the type of the class A can't be another kind of anonymous type. If you had:

interface A {}
class A() satisfies A { 
    shared String hello="hello"; 
}

Then this would compile:

print(A().hello);

But this would not:

A a = A();
print(a.hello); //error

Just like what happens with members declared by objects.
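The comparison with objects can be illustrated with code that is valid today. A small sketch, using the hypothetical names Greeter and greeter:

```ceylon
interface Greeter {}

// The object has an anonymous type that declares an extra shared member.
object greeter satisfies Greeter {
    shared String hello = "hello";
}

void run() {
    // Fine: the expression greeter has the anonymous type, which declares hello.
    print(greeter.hello);

    // Once the value is widened to Greeter, hello is no longer visible.
    Greeter g = greeter;
    // print(g.hello); // would not compile: Greeter has no member named hello
}
```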

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Yes, I guess the class is no problem. The interface is special though: for this separation of member implementation and initialization the interface and the implementing classes have to share implementation details. That's only possible if the interface can have non-shared members, and in particular formal non-shared members:

interface Handler {
    formal List<Event> eventList;
    // ...methods that use eventList...
}
class Handler() satisfies Handler {
    actual List<Event> eventList = ArrayList<Event>();
}

The direct consequence is that the interface can only be implemented by classes which can access its non-shared members. And the consequence of that is that these interfaces are effectively restricted to single inheritance in most cases. That's also the case in your original proposal but not strictly required with my suggested change because interfaces don't inherit from classes there.

I think this kind of special interface is reasonable, but we might want to introduce an annotation or so to indicate the difference to normal interfaces.

CeylonMigrationBot commented 12 years ago

[@gavinking] > The interface is special though: for this separation of member implementation and initialization the interface and the implementing classes have to share implementation details. That's only possible if the interface can have non-shared members, and in particular formal non-shared members:

To be honest, I don't really see this as a hard requirement. I've never felt the need to introduce this for the interfaces we have today, and I don't see why these interfaces would be any different.

I'm not saying that I don't think it would be very useful to have a way of limiting the visibility of a member of a shared interface to the current package. This is something we have already identified as desirable. I just don't see that it is really especially related to the thing we're discussing here.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Hm, I guess you are right that these interfaces are not fundamentally different. The question of visibility of the members also applies to other interfaces. It's interesting that this can lead to interfaces where some code can see the interface but can't implement it (same for classes of course).

> To be honest, I don't really see this as a hard requirement. I've never felt the need to introduce this for the interfaces we have today, and I don't see why these interfaces would be any different.

This is a point that we apparently perceive differently. With the proposed feature we separate two closely related parts of a type's implementation from each other, namely the implementation of the members and the initialization of the object. I would expect a close relation between the two elements (interface and class) as a result of that, and a requirement for sharing implementation-specific information which should remain hidden from clients. In other words: the difference to other interfaces and classes is that we want to shift the majority of the implementation from the class into the interface and leave only the initialization code in the class. After all, the goal is to specify multiple initializers for the same type.

Well, I guess I'll have to play with some examples to see how it works out.

CeylonMigrationBot commented 12 years ago

[@gavinking] > With the proposed feature we separate two closely related parts of a type's implementation from each other, namely the implementation of the members and the initialization of the object. I would expect a close relation between the two elements (interface and class) as a result of that, and a requirement for sharing implementation-specific information which should remain hidden from clients.

But I really think that this is just the same issue of wanting to have a shared toplevel type with members that are package-private that has already come up in other contexts.

CeylonMigrationBot commented 12 years ago

[@gavinking] > And for every class which does not have such an interface, implicitly assume one.

@ikasiuk I guess you actually meant this more literally than what I originally interpreted. i.e. not just as a metaphor but as the actual semantics.

FTR, I think there is a significant problem with that: it's pretty hard to naturally "extract" an inferred interface from a class definition. You would have to reason that every simple attribute and every other member that (transitively) captures any simple attribute is actually formal on the interface. I really would not like to try and figure out all the logic surrounding this.

So I think we want to stick to the idea that interfaces and classes are separate beasts and that what we're proposing here is to let an interface satisfy a class type. This is actually more like a kind of upper bound constraint (or even a kind of self type constraint) on the interface that says that all implementations of the interface must also be subclasses of the class. It's not saying that the interface is inheriting the state of the class. So it doesn't break our notion that interfaces are stateless. Note that a class can already appear in the satisfies clause today - in an upper bound constraint on a type parameter.
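For reference, this is what a class in an upper bound constraint on a type parameter looks like in existing Ceylon. The names Shape and describe are made up for this sketch:

```ceylon
class Shape(shared String name) {}

// T is constrained to subtypes of the class Shape, so shape.name is available.
String describe<T>(T shape) given T satisfies Shape
        => "shape: ``shape.name``";
```

The constraint adds no state to T. It only restricts which types may be used as arguments, which is the reading proposed above for an interface that satisfies a class.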

Nevertheless, I don't see any strong reason to reject your suggestion that a class could have the same name as an interface. It seems like a useful convenience, and I can't see how it would break anything.

P.S. ignore my statement about restricting interfaces that satisfy classes to a single inheritance model. That's probably wrong.

CeylonMigrationBot commented 12 years ago

[@gavinking] > Do we even really need abstract classes if we have this new thing?

The interface-upper-bounded-by-a-class construct really does compete with the abstract class construct. They're not the same, and they each have certain advantages for certain circumstances, but it's quite difficult to motivate having both in a single language.

Now, truthfully, I've long harbored a secret desire to eliminate abstract classes. Unfortunately, that would have a really pretty deep impact on our type system. Since enumerated types would always be interfaces, types like Void, Nothing and Boolean would now be interfaces. And so the root of the class hierarchy (the class called Object() that would not extend any other class) would not be the root of the type hierarchy (the interface Void). That feels pretty strange to me, though perhaps I would get used to it.

Just something to think about...

CeylonMigrationBot commented 12 years ago

[@gavinking] > And so the root of the class hierarchy (the class called Object() that would not extend any other class) would not be the root of the type hierarchy (the interface Void).

Actually it's worse than that. Since null is not an Object, there would be no single root to the class hierarchy.

So, I guess it's pretty hard to imagine us getting rid of abstract classes.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] > But I really think that this is just the same issue of wanting to have a shared toplevel type with members that are package-private that has already come up in other contexts.

Yes, of course. Although I'd perhaps prefer unit-private in this case. All I'm saying is that to me this is a prime example of a case where that requirement arises. But I don't think we have a serious disagreement here, just slightly different viewpoints.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] One thing is odd about this way of using the concept of interfaces and classes: the names don't quite fit anymore. It's strange to call something interface that contains practically the complete implementation of a type.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] If an interface can satisfy a class, that can also solve a different problem:

interface I {
    shared actual default Boolean equals(Object other) {...}
    shared actual default Integer hash {...}
}
class C() satisfies I {}

That doesn't work because C inherits two different implementations of equals and hash, namely from I and from IdentifiableObject. And I think I can't tell the compiler to use the one from I.

That problem could be solved by letting I satisfy IdentifiableObject so that the implementation in I overrides the one in IdentifiableObject.

gavinking commented 8 years ago

This problem was already solved in #3902 and released in Ceylon 1.2.
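Assuming the resolution refers to the constructor syntax that shipped in Ceylon 1.2, the Path example from earlier in this thread could be written roughly like this. The constructor name copyOf and the method copy are hypothetical names picked for the sketch:

```ceylon
shared class Path {
    shared String[] pathElements;

    // Default constructor, invoked as Path(elements).
    shared new (String[] elements) {
        pathElements = elements;
    }

    // Named constructor without the shared annotation:
    // it can only be called from inside the class.
    new copyOf(Path path) {
        pathElements = path.pathElements;
    }

    // Shared method that exposes copying without exposing the constructor.
    shared Path copy() => Path.copyOf(this);
}
```

This covers both goals listed earlier in the thread: a class can have several initializers, and each one can have its own visibility.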
