eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
399 stars 62 forks source link

stack overflow for crazy case involving invariant type and intersections #3702

Open CeylonMigrationBot opened 11 years ago

CeylonMigrationBot commented 11 years ago

[@gavinking] Spotted by @RossTate, this crashes the typechecker:

interface Co<out T> {}
interface A satisfies Co<Inv<A&B&Co<Inv<A&B>>>> {}
interface B satisfies Co<Co<B&A>> {}
class Inv<T>() {}
Inv<A&B&Co<Inv<A>>> foo(Inv<A&B> bar) => bar;

[Migrated from ceylon/ceylon-spec#596]

CeylonMigrationBot commented 11 years ago

[@gavinking] In #3686, @RossTate also mentions this simpler one:

interface Inv<P> {}
interface A satisfies Inv<A&Inv<A>> {}
CeylonMigrationBot commented 11 years ago

[@gavinking] So I have now got the typechecker rejecting both these cases by disallowing stuff like this:

T satisfies A<B<T>>

while still allowing this:

T satisfies A<T>

i.e. a type can appear as a type argument one-level-deep, but not two-levels-deep.

This might be heavy handed, and it might be insufficient, so the questions are:

  1. can @RossTate think up a way to easily circumvent this new rule,
  2. will we even need it once we have a conclusion for #3686, and
  3. can anyone think of an example where T satisfies A<B<T>> is actually useful?
CeylonMigrationBot commented 11 years ago

[@gavinking] Rm, I can currently circumvent it like this:

interface In<P> {}
alias Y => In<X>;
interface X satisfies In<X&Y> {}

So I need to handle aliases a bit better, apparently.

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, I have an ugly and even heavier-handed way to prevent circumvention via an alias.

CeylonMigrationBot commented 11 years ago

[@gavinking] Hrm, so I can still circumvent it as simply as this:

interface In<P> {}
interface Y satisfies In<X> {}
interface X satisfies In<X&Y> {}
CeylonMigrationBot commented 11 years ago

[@gavinking] So my new solution to this problem is to disallow:

I know that sounds heavy-handed, but it's something I can actually determine without needing to run the subtyping algorithm or introduce a special new compilation phase. And so far I have not noticed anything useful that these rules preclude.

CeylonMigrationBot commented 11 years ago

[@gavinking] In fact, I think I might be able to remove the first restriction if I enforce the second restriction in AliasVisitor instead of TypeVisitor. I forgot that we already have a separate phase for dealing with aliases and circularity.

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, first restriction is gone, leaving the following as disallowed:

  • an intersection as a type argument to an invariant type parameter in a satisfied/extended type
CeylonMigrationBot commented 11 years ago

[@RossTate] This still seems to crash things. Note it doesn't have anything invariant anywhere.

interface Co<out P> {}
interface A satisfies Co<B&Co<A>> {}
interface B satisfies Co<A&Co<B>> {}
Co<A&B&Co<A&B>> foo(Co<A&B> co) {return co;}

I'm guessing what it's crashing is your canonicalization algorithm. I still don't believe you need a canonicalization algorithm.

CeylonMigrationBot commented 11 years ago

[@gavinking]

This still seems to crash things. Note it doesn't have anything invariant anywhere.

Damn, that sucks :-(

I'm guessing what it's crashing is your canonicalization algorithm.

Yes. So there's two ways I guess I could eliminate this case:

  1. Completely disallow intersections as type arguments in extends/satisfies clauses. This seems like it would prevent some useful type definitions, but it doesn't appear to be anything we use in ceylon.language or the SDK.
  2. Try to detect and disallow circular type definitions when the circularity occurs via type arguments. I don't have a good understanding of the pros/cons of this approach or even of how viable it is.

I still don't believe you need a canonicalization algorithm.

I can't imagine how else I would deal with complex types involving union and intersection. Note that the compiler now leaves types uncanonicalized until it's time to determine if one is a subtype of another. Then at that point it goes ahead and resolves aliases and canonicalizes to a union of intersections.

CeylonMigrationBot commented 11 years ago

[@RossTate] Here's another example that's currently broken. Rather than crashing, you reject foo even though it's valid code. This suggests your canonicalization algorithm is incomplete even when there are no intersections in inheritance clauses at all.

interface A satisfies Co<Co<A>> {}
interface B satisfies Co<Co<B>> {}
A&Co<B&Co<A>> sub(A&Co<B> sub) {return sub;}
A&Co<B> sup(A&Co<B&Co<A>> sup) {return sup;}
Inv<A&Co<B&Co<A>>> foo(Inv<A&Co<B>> foo) {return foo;}
CeylonMigrationBot commented 11 years ago

[@RossTate] As for your circularity proposition, I'm currently doing some research on that. There are definitely practical uses of circularity, so you can't just disallow it, but I'm looking into options. Those options might address the concerns here, but it's a big change enough change that this problem alone doesn't call for it even if it works out.

CeylonMigrationBot commented 11 years ago

[@gavinking] What is the definition of Co? Same as above?

CeylonMigrationBot commented 11 years ago

[@RossTate] Yep. Co for covariant and Inv for invariant. Sorry, I got lazy.

CeylonMigrationBot commented 11 years ago

[@gavinking]

you reject foo even though it's valid code.

I don't follow: why is A&Co<B&Co<A>> equivalent to A&Co<B>? What am I missing?

CeylonMigrationBot commented 11 years ago

[@RossTate] sub and sup type check, illustrating that the two are equivalent. What you're missing is that A is equivalent to A&Co<Co<A>>, so A&Co<B> is equivalent to A&Co<Co<A>>&Co<B>, which is equivalent to A&Co<Co<A>&B> and consequently A&Co<B&Co<A>>.

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, got it. So ProducedType.isExactly() is broken. Indeed, it seems to be broken in a rather nasty way where I might need a whole new algorithm. Apparently, my canonicalizations are less unique than I thought.

CeylonMigrationBot commented 11 years ago

[@gavinking] I guess I should just replace x.isExactly(y) with x.isSubtypeOf(y)&&y.isSubtypeOf(x). That removes some code, at least!

CeylonMigrationBot commented 11 years ago

[@gavinking] Well, doing that got me a stack overflow, which I'll have to try and figure out...

CeylonMigrationBot commented 11 years ago

[@RossTate] Depending on the system, using double subtyping versus some specialized canonicalization algorithm can introduce non-termination. Remember that contravariance is the source of problems for non-termination in subtyping because it makes the types being compared flip sides, which double subtyping will do as well.

CeylonMigrationBot commented 11 years ago

[@gavinking] So even after eliminating the test that caused the stack overflow, the tests ran like 10x slower with this change. So definitely not the right path. So what's a correct implementation of isExactly()?

CeylonMigrationBot commented 11 years ago

[@gavinking] Is there some way we can recognize that A&Co<B&Co<A>> can be canonicalized to A&Co<B>?

CeylonMigrationBot commented 11 years ago

[@gavinking] I suppose we could simply recognize that A is an invariant subtype of Co<Co<A>, and then do B&Co<A>~Co<A> to arrive at A&Co<B>. Sounds conceptually possible.

CeylonMigrationBot commented 11 years ago

[@RossTate] I really don't know. And remember, this is an "easy" case; it doesn't use intersections or invariance in inheritance, nor union types anywhere. That's why I'd much rather just not allow the type at all. It'd help me if I better understood why you don't want to restrict types. Is it for implementation reasons? In which case, what goes wrong? Or is it for design principles? Or is it because there's some use-case you think would want to use these types?

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, so clarifying something. The issue here is that under the principal instantiation inheritance axiom, the following identities hold:

Co<X>&Co<Y> == Co<X&Y>
Contra<X>&Contra<Y> == Contra<X|Y>

These aren't otherwise valid, and canonicalization doesn't take them into account.

CeylonMigrationBot commented 11 years ago

[@RossTate] Right. So if A extends Co<A> and B extends Co<B>, then Co<A&B> is redundant in the intersection A&B&Co<A&B> even though it is not a supertype of either A or B.

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, that particular problem is now solved, and we're back to the question of nontermination.

CeylonMigrationBot commented 11 years ago

[@gavinking] P.S. FTR, the fix for (this)(#3702#issuecomment-21301325) was quite simple in the end. A little adjustment to isExactly(), and not related to the main topic of this issue.

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, great, so I now have a restriction that detects all the problem cases given so far. Basically, I noticed that they all have one thing in common:

So that's what I'm now disallowing. The nice thing is that it's not overly-heavy-handed.

CeylonMigrationBot commented 11 years ago

[@gavinking] In fact, the problem does not seem to be limited to intersections. I can also get nontermination for unions, for example:

interface Inv<P> {}
interface A satisfies Inv<A|Inv<A>> {}

So I've extended the restriction to also cover unions.

CeylonMigrationBot commented 11 years ago

[@gavinking] Unfortunately, extending the restriction to union types prevents the following slightly-nice pattern:

interface A satisfies Comparable<C|A> {}
interface C satisfies Comparable<C|A> {}

However, I'm certain we can live without this.

CeylonMigrationBot commented 11 years ago

[@RossTate] I don't understand what your fix is. Can you explain how it solves or prevents this example:

interface A satisfies Co<Co<A>> {}
interface B satisfies Co<Co<B>> {}
A&Co<B&Co<A>> sub(A&Co<B> sub) {return sub;}
A&Co<B> sup(A&Co<B&Co<A>> sup) {return sup;}
Inv<A&Co<B&Co<A>>> foo(Inv<A&Co<B>> foo) {return foo;}

P.S. Good find with A extends Inv<A|Inv<A>>!

CeylonMigrationBot commented 11 years ago

[@gavinking] @RossTate The issue is that you've mixed together two issues in this one issue. I fixed the bug in isExactly() that was rejecting that code, and I have added the following test, which is accepted by the typechecker:

    interface Co<out P> {}
    interface Contra<in P> {}    
    interface Inv<P> {}

    interface A satisfies Co<Co<A>> {}
    interface B satisfies Co<Co<B>> {}
    A&Co<B&Co<A>> sub1(A&Co<B> sub) => sub;
    A&Co<B> sup1(A&Co<B&Co<A>> sup) => sup;
    Inv<A&Co<B&Co<A>>> foo1(Inv<A&Co<B>> foo) => foo;

    interface X satisfies Contra<X> {}
    interface Y satisfies Contra<Y> {}
    X&Contra<Y|X> sub2(X&Contra<Y> sub) => sub;
    X&Contra<Y> sup2(X&Contra<Y|X> sup) => sup;
    Inv<X&Contra<Y|X>> foo2(Inv<X&Contra<Y>> foo) => foo;

But the main thrust of this discussion has been about decidability. And I now speculate that I also have that problem licked! Of course, without a formal proof, it's a bit silly for me to go making grand claims like that, but I feel good about the new set of restrictions.

CeylonMigrationBot commented 11 years ago

[@gavinking] As far as decidability goes, the basic idea is to not allow any kind of type that in any way depends upon the type being defined inside a union or intersection in any of the types it inherits.

CeylonMigrationBot commented 11 years ago

[@gavinking] i.e. interface A satisfies Inv<A&Inv<A>> {} is rejected because A occurs inside an intersection. Likewise, interface A satisfies Co<B&Co<A>> {} is rejected because A occurs as a type argument inside an intersection. Even this:

interface A satisfies Inv<B|Inv<B>> {}
interface B satisfies A {}

must be rejected, because the type B, which inherits A occurs inside a union in a supertype of A.

All the undecidable examples we've seen so far are of that form.

CeylonMigrationBot commented 11 years ago

[@RossTate] Okay, so your restriction is to disallow types defined in terms of the class being defined from being used in intersections and unions in the inheritance clause. That may work, though it's hard to say.

How did you use that to get foo1 and foo2 to work, though?

CeylonMigrationBot commented 11 years ago

[@gavinking]

Okay, so your restriction is to disallow types defined in terms of the class being defined from being used in intersections and unions in the inheritance clause. That may work, though it's hard to say.

Wellyes, we can't know w/o a formal proof or a counter example. But I feel good about this overall approach.

How did you use that to get foo1 and foo2 to work, though?

Well, I just directly use the fact that if I have types of form A&T<X> and A&T<Y> and if both types are invariant subtypes of T<Z> for some Z then the types are equivalent. It's slightly more involved than that but not much.

CeylonMigrationBot commented 11 years ago

[@gavinking] I'm closing this issue, since the family of bugs encompassed by the initial report seems to be fixed. Further discussion belongs in #3686.

CeylonMigrationBot commented 11 years ago

[@FroMage] This now disallows the following:

interface Bug866_Visitable<Visited> of Visited satisfies Iterable<Integer|Visited> 
    given Visited satisfies Bug866_Visitable<Visited> {
}

Which I wrote during some demo/hands-on. Seems a pity.

CeylonMigrationBot commented 11 years ago

[@FroMage] Also disallows:

abstract class Real() satisfies Numeric<Real>&Scalable<Int|Real,Real> {}
CeylonMigrationBot commented 11 years ago

[@RossTate] I don't understand your solution. What is an invariant subtype?

As for @FroMage's concerns, I do believe your restriction might be overkill.

CeylonMigrationBot commented 11 years ago

[@gavinking]

I don't understand your solution. What is an invariant subtype?

Spec section 3.3. I believe you use the language "the principal instantiation of T that is a supertype of ...".

As for @FroMage's concerns, I do believe your restriction might be overkill.

Quite likely, but nobody else has proposed a set of restrictions that eliminate the problematic cases and can actually be easily enforced by the compiler.

CeylonMigrationBot commented 11 years ago

[@gavinking]

Also disallows:

abstract class Real() satisfies Numeric<Real>&Scalable<Int|Real,Real> {}

I might be able to loosen the restriction to let this one slip through. It should be possible to recognize that intersections like Int|Real are safe since Int and Real are both classes and don't involve type parameters.

CeylonMigrationBot commented 11 years ago

[@gavinking] Indeed, all the examples we've seen involve something of form:

A&C<B>

or:

A|C<B>

In satisfies or extends.

It seems likely that unions and intersections of form A&B or A|B (no type arguments anywhere) can't result in nontermination, since the canonicalization algorithm can quickly determine that any such type is already in canonical form.

If that's right, there would be no reason to reject either of @FroMage's examples.

CeylonMigrationBot commented 11 years ago

[@gavinking] Ah, excellent, I just managed to construct a new nonterminating example, which suggests I understand problem a lot better now:

interface S<out P> {}
interface T<out P> satisfies S<P> {}
class C() satisfies S<C>&T<S<C>> {}

Since the compiler needs to construct a principal instantiation of S for C, it's as if we had written:

class C() satisfies S<C&S<C>> {}

which is a case we already know is problematic.

So the question now is, how can I recognize cases like the above as problematic, while still allowing stuff like this:

class C() satisfies Numeric<C>&Comparable<C> {}

The problem seems to arise when the "recursive" type (in this case C) occurs at different depths on either side of the intersection/union. But of course I can hide the depth mismatch like so:

interface S<out P> {}
interface T<out P> satisfies S<S<P>> {}
class C() satisfies S<C>&T<C> {}

So what's the right way to distinguish S<C>&T<C> from Numeric<C>&Comparable<C>? Well, the difference is simply that Numeric and Comparable have no common parameterized supertype, so the need to form a principal instantiation of the supertype never arises.

So I think maybe we wind up with the rule that:

Within the intersection of all direct supertypes of a type T, we cannot have any intersection or union of form A&B or A|B where both A and B inherit a common parameterized supertype X, and both A and B somehow involve T.

Where "X somehow involves Y" means X is a type that inherits either Y or a type that somehow involves Y or X is a type expression with a type argument that somehow involves Y.

CeylonMigrationBot commented 11 years ago

[@gavinking]

It seems likely that unions and intersections of form A&B or A|B (no type arguments anywhere) can't result in nontermination

This is nonsense on stilts, please ignore. Counterexample:

interface S<out P> {}
interface A satisfies S<S<C>> {}
interface B satisfies S<C> {}
class C() satisfies A&B {}

That is, C is sufficient to crash the typechecker, even given its totally harmless-looking satisfies clause, since A and B inherit a common supertype S, and are both defined in terms of C.

CeylonMigrationBot commented 11 years ago

[@gavinking] A cleaner and less heavy-handed statement of the rule might be something like this:

Within the intersection of all direct supertypes of a type T, we cannot have any intersection or union of form A&B or A|B where A is a subtype of T<AA> and B is a subtype of T<BB> for some common parameterized type X, and both AA and BB somehow involve T.

The problem with this formulation is that to determine every AA such that "A is a subtype of T<AA>" without being able to directly form the principal instantiation of T for A is potentially difficult.

CeylonMigrationBot commented 11 years ago

[@gavinking] No, these formulations are broken, since they reject stuff like this:

 satisfies Scalar<Integer> & 
                  Integral<Integer> &
                  Binary<Integer> & 
                  Exponentiable<Integer,Integer>

Dammit!

CeylonMigrationBot commented 11 years ago

[@gavinking] I have essentially given up on trying to develop formal rules for this one. I have not been able to discover a set of rules that disallows all the known problematic cases without also disallowing stuff that is needed in the language module. Instead, I'm just directly detecting the undecidability by attempting to canonicalize the intersection of all supertypes of the type, and detecting the case of nontermination. Nasty solution, but at this point it's all we have.

CeylonMigrationBot commented 11 years ago

[@gavinking] Last night @RossTate proposed a potential solution that involves distinguishing types with self types from other parameterized types. In an attempt to translate his idea into something that makes sense in Ceylon's type system, I wound up with the following:

  1. In a supertype expression, a direct/indirect circular reference to the type being defined can't occur, except in an argument to a type parameter of a type with a self type.
  2. In a supertype expression, a type with a self type cannot occur anywhere in a type argument.

Note that this is not precisely @RossTate's way of expressing the idea, so any holes in it are mine, not Ross'.

The good news about this is that it accepts all the types we currently define in the language module, after minor changes, and rejects all the previously known undecidable types. The bad news is that I was able to construct an undecidable example, by taking advantage of a little intentionally-engineered hole in it:

interface Inv<O, out P> of O {}
interface A satisfies Inv<A,A&B> {}
interface B satisfies Inv<B,B> {}

The hole is the additional parameter, which is not a self type. Now, I can certainly close this hole easily enough, by reformulating the first rule:

  1. In a supertype expression, a direct/indirect circular reference to the type being defined can't occur, except in an argument to a self type parameter.

(This was indeed my initial formulation.)

Unfortunately, this formulation rejects a couple of types in the language module:

So the question now is, instead of tightening 1, is there a way to tighten 2 in order to eliminate the undecidable cases. There are two interesting features of the example above. In order to produce nontermination, I needed to:

  1. make at least one type parameter of Inv invariant, and
  2. use an intersection as a type argument to Inv.

I have not yet found a way to produce nontermination for a type with a self type where all type parameters are invariant or covariant, or where no intersection occurs in an argument to a parameter of a type with a self type. So I hypothesize that the following ruleset results in termination:

  1. In a supertype expression, a direct/indirect circular reference to the type being defined can't occur, except in an argument to a type parameter of a type with a self type.
  2. In a supertype expression, a type with a self type cannot occur anywhere in a type argument.
  3. In a supertype expression, an intersection may not occur anywhere in an argument to a type parameter of a type with a self type.

Now, I'm pretty much pulling rule 3 out of my ass here. The question is: is there a reasonable conceptual foundation for it. Do the above rules really eliminate all the nonterminating cases?