Optimizing reified types

gavinking commented 11 years ago

There are three big problems to overcome in our implementation of fully-reified types:

performance: dragging around the reified types with instances of any parameterized type is potentially expensive—for now, we don't know quite how expensive, in practice
interoperation with Java: parameterized classes written in Java don't have reified type arguments
arrays and other value types: arrays are value types, so there is no way to propagate their type argument

I think we're going to need some way to address this via either optimized or partial reification of type arguments. Let me sketch out the options here:

Partial reification:

We could annotate a type parameter as static like this:

class C<static T>() {}

Then there would be limitations on what you can do with T. For example, you would not be able to use it with is, like this:

class C<out static T>(Object o) {
    if (is T o) {} //error, T is not a reified type
}

Nor would you be allowed to pass it as an argument to a reified type parameter:

class R<T>(T t) {}
class C<static T>(T t) {}
R(C("")); //error: argument C<String> to reified type parameter T of R is not a fully reified type

Of course, all type parameters of Java classes would be considered static.

The problem with this approach is that I think it would just get in the way much too much. For example, if tuple types were unreified, you would not be able to put instances of parameterized Java classes in them. It's hard to be able to guess in advance just how much pain this approach would cause, because we simply don't have enough experience with this stuff.

"Optimized" reification

In this approach, we let you provide an optimized implementation of the operation which obtains the reified type of an instance.

What we would probably do is say that for every subclass of Object which does not define an attribute named type, an attribute with the following signature is generated for it:

for a class C(), the attribute shared actual Class<C> type
for a class C<T>(), the attribute shared actual Class<C<T>> type
for a class C<U,V>(), the attribute shared actual Class<C<U,V>> type
etc

The implementation of the generated attribute would include generated fields holding the reified type arguments.

OTOH, a class would be free to define its own type attribute. For example:

shared class Tuple<out Element, out First, out Rest>(first, rest)
        extends Object()
        satisfies Sequence<Element>
        given First satisfies Element
        given Rest satisfies Sequential<Element> {

    shared actual Class<Tuple<Element,First,Rest>> type =>
            Tuple.producedClass(union(first.type, rest.type.arguments.first),
                            first.type, rest.type);
    ...

}

Thus, the instance is responsible for computing its own reified type, and is free to make use of its internal state to do that.

Unfortunately this solution doesn't address the issue of Java classes, which don't have getType() methods. I guess we would be left with some nasty runtime exception or some shit.

quintesse commented 11 years ago

Didn't @FroMage already have a pretty well thought-out design? Or do you have some issues with it?

gavinking commented 11 years ago

Eh? A well-thought-out design of what?

RossTate commented 11 years ago

First, static should be the default. Reification is a constraint, just like given. So have a reify annotation instead.

Second, I'm not really sure what your optimization is doing.

Third, I still haven't been given the various use cases y'all have in mind for reification, so it's hard to offer suggestions if I don't know why you need it.

FroMage commented 11 years ago

A question I was wondering about is symmetry of reified types and equality. Suppose the following:

class Foo<out T>(T... values){
 shared actual Boolean equals(Object other){
  if(is Foo<T> other){
   return values == other.values;
  }
  return false;
 }
}
class Top(){}
class Bottom() extends Top(){}

value b = Bottom();
value foo1 = Foo<Top>(b);
value foo2 = Foo<Bottom>(b);
assert(foo1 == foo2); // true, because foo2 of type Foo<Bottom> is also a Foo<Top>
assert(foo2 == foo1); // false, because foo1 of type Foo<Top> is not a Foo<Bottom>

Am I missing something here or is that going to be problematic?

gavinking commented 11 years ago

@FroMage Right. Note that my implementations of equals() for List, Map, Set carefully ignore the type arguments.

FroMage commented 11 years ago

OK.

gavinking commented 11 years ago

Further on the idea of "optimized" reification.

A critical issue we need to think through is whether there are soundness issues associated with letting a runtime type argument be a subtype the type argument you would get from purely static analysis. The cases I'm thinking of involve contravariance. Imagine we tried to optimize away T in the following code:

class Consumer<in T>(T[] seq) {}
Object[] words = ["hello", "world"];
Consumer<Object> = Consumer(words); //oops, unsound!

Here, if T is String instead of Object at runtime (since at runtime words is a String[] not just an Object[]), an instance of Consumer<String> would be assigned to Consumer<Object>, which is unsound. The question is whether all cases like this can be detected via local static analysis, and the optimization disabled. I suspect that the answer to this is "yes", since we would only enable the optimization for type parameters annotated out. (Yes, due to a recent change I'm now allowed to annotate a type parameter of a method in or out!) In this example, since Consumer<T> is contravariant in T the optimization would not occur. OTOH, in the following code, the optimization would be allowed, and would be sound:

class Producer<out T>(T[] seq) {}
Object[] words = ["hello", "world"];
Producer<Object> = Producer(words); //ok

@RossTate WDYT?

gavinking commented 11 years ago

By the way, the more I think on this issue, the less cases I can think of where reification is really likely to have a very negative impact on performance.

For types, you're only worried about classes where you have "as much" type arguments as state to hold onto. In the SDK, the only classes like that are Tuple and Entry, AFAIK. (We know we definitely need to optimize these two.) Any generic collection type generally has much more internal state than just one or two type arguments.
For functions the situation is a little more subtle. Many generic functions (e.g. map() or sort()) do something expensive like iterating a collection, or are subsequently used when iterating a collection (e.g. byIncreasing() or plus()), so it's hard to see how passing a type argument is likely to be very costly. But there are exceptions like curry() and compose().

Indeed the main case I'm concerned about (except for the abovementioned Tuple and Entry) is when a concrete non-generic type inherits a member of a generic type. For example, String inherits map(). We need to be sure that in cases like this we don't wind up reifying Element to Character every time map() is called. @FroMage what is your feeling on this issue? Is it a major problem?

Finally, how do reified type impact comprehensions? Anything to worry about there?

RossTate commented 11 years ago

Gavin, your example is not unsound. You're confusing static types and exact types. Reification does not mean these all line up. Reification just means you have a way of determining the type arguments used to allocate an instance of a generic class (and the derived type arguments for inherited classes and interfaces) and the type arguments used to invoke a generic method.

From what I understand, there are two main expenses for reification: the space needed per instance of a generic class to store the type arguments (only a problem for certain backends such as Java and JS but not C#) and the allocation done at run time to build the data structures representing type arguments (problematic for all backends).

Regarding your concerns with map, it should be using the reified information inside this to determine that type argument at run time.

The only concern I can see for comprehensions is that they use inference and we need to make sure inference doesn't affect the semantics; an issue we have mentioned elsewhere.

gavinking commented 11 years ago

Gavin, your example is not unsound.

Ross, it's unsound if I do the proposed optimization on it.

You're confusing static types and exact types.

Nope, it's more the optimization that "confuses" them ;-)

RossTate commented 11 years ago

Sorry, I misunderstood what you were saying and consequently what you were misunderstanding, heheh. Here's my new take.

Note that the following still type checks:

class Consumer<in T>(T[] seq) {}
String[] words = ["hello", "world"];
Consumer<Object> = Consumer(words);

Since Consumer is contravariant, Consumer(words) will choose the least precise type argument possible given the constraints. In this case the only constraint is String <: T, so it will infer Anything for T. This is good, cuz Consumer<Anything> is the principal type of Consumer(words), and in particular can be assigned to Consumer<Object> due to contravariance and Object <: Anything.

gavinking commented 11 years ago

@RossTate In the following code:

class Consumer<in T>(T[] t) {}
String[] strings = ["hello", "world"];
value consumer = Consumer(strings);  //we get a Consumer<String>

The inferred type of consumer is Consumer<String>, not Consumer<Object>. It's kind of a silly example, since contravariant types don't usually accept covariant types in their constructors. So to see that this is the correct behavior, consider this example:

class Consumer<in T>(T[] t) {}
DelegateConsumer<String> delegate = DelegateConsumer<String>();
value consumer = Consumer(delegate);  //we get a Consumer<String>

I'm only bringing this up to demonstrate that the type arg inference algorithm does the right thing here, because I do not want this thread to go off on a tangent about type arg inference and contravariant types. In M5 the algorithm does the right thing and I'm completely satisfied with it.

All that has nothing to do with what we're discussing here, which is a different example:

class Consumer<in T>(T[] t) {}
Object[] strings = ["hello", "world"]; //NOTE: static type Object[], runtime type String[]
value consumer = Consumer(strings);

Here, static analysis infers the type Consumer<Object>, but if we were to optimize away the passing of Object to T, and try to infer the type argument T at runtime, based on the runtime type of t, we would actually get a Consumer<String>, which is not a Consumer<Object>!

So obviously we wouldn't do this optimization in this case., because it would result in unsoundness. What we need to understand is in which cases is it ok to optimize away the static argument to a type parameter, and infer it at runtime from the runtime type of an argument?

For example, I believe it is sound to do it for tuples, that is, if I have the following:

Object x = "hello";
Object y = 1;
value tup = [x,y];

Then the static type of tup is [Object,Object], but its runtime type would be [String,Integer], if we make this optimization.

But I want to go further than that and do the same sort of optimization here, for example:

function prepend<out Head,out Tail>(Head head, Tail tail) 
        given Tail satisfies Object[] 
        => Tuple(head, tail);

I want to optimize away Head and Tail, and "infer" them at runtime from the runtimes types of head and tail. So if I do:

Object head = "hello";
[Object,Object] tail = [1, 1.0];
value result = prepend(head, tail);

I will get a result tuple with static type [Object,Object,Object] but runtime type [String,Integer,Float].

It's only OK to do this because the function prepend() is covariant in its parameters Head and Tail. We can't do this optimization for contravariant types of functions because it results in unsoundness as above. But is that the only case which would be unsound?

gavinking commented 11 years ago

After chatting with Ross, instead of talking past each other, as above, it's now clear what the cause of the unsoundness is.

Ross points out that we should not:

use a type parameter occurrence that appears covariantly in the parameter list to impose an upper bound on the inferred argument to a contravariant type parameter, nor
use a type parameter occurrence that appears contravariantly in the parameter list to impose a lower bound on the inferred argument to a covariant type parameter,

at least not at runtime, since that is the cause of the unsoundness above.

Arguably - indeed, Ross argues this, though I'm not completely sold - we should not even do that at compile time.

Anyway, this completely explains what's going on here, and gives us what we need to be able to solve the problem.

FroMage commented 11 years ago

We need to be sure that in cases like this we don't wind up reifying Element to Character every time map() is called. @FroMage what is your feeling on this issue? Is it a major problem?

We don't, because we optimise the type descriptor for Ceylon non-parameterised types into static final constants, in this case Character.$TypeDescriptor, but map takes another type param Result which may or may not be instantiated by the caller, depending on whether it's constant or not.

gavinking commented 11 years ago

@FroMage Right, of course.

gavinking commented 11 years ago

Let's delay full consideration of this issue until 1.1, which is when we're going to focus on performance.

ceylon / ceylon-spec

Optimizing reified types #485

Partial reification:

"Optimized" reification