eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0

stack overflow for crazy case involving invariant type and intersections #3702

Open CeylonMigrationBot opened 11 years ago

CeylonMigrationBot commented 11 years ago

[@gavinking] Spotted by @RossTate, this crashes the typechecker:

interface Co<out T> {}
interface A satisfies Co<Inv<A&B&Co<Inv<A&B>>>> {}
interface B satisfies Co<Co<B&A>> {}
class Inv<T>() {}
Inv<A&B&Co<Inv<A>>> foo(Inv<A&B> bar) => bar;

[Migrated from ceylon/ceylon-spec#596]

CeylonMigrationBot commented 11 years ago

[@gavinking] Alternatively, could we replace rule 3 above with:

  1. In a parameterized type with a self type parameter and at least one other type parameter, all type parameters must be either covariant or contravariant.
CeylonMigrationBot commented 11 years ago

[@RossTate] So when canonicalizing, you don't need to worry about self types with this change. My guess is your algorithm is still doing something for A&B regarding Inv and it doesn't need it. A&B is canonical, and the only supertype of both A and B is Object.

On another note, I misread your example and thought of another problematic case. In particular, we need to say that self types can only inherit from other self types or from types without not using their parameters.

CeylonMigrationBot commented 11 years ago

[@gavinking] @RossTate

So when canonicalizing, you don't need to worry about self types with this change. My guess is your algorithm is still doing something for A&B regarding Inv and it doesn't need it.

Not true, we still need to form a principal instantiation of Inv to check that the members of Inv are correctly implemented.

A&B is canonical, and the only supertype of both A and B is Object.

No, what you're missing is that A&B might be canonicalizable to Nothing. It's actually disjoint type analysis that is blowing up.
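
For example (a hypothetical sketch with fresh names, not the declarations above): two interfaces that inherit incompatible invariant instantiations of the same generic type can have no common instance, so their intersection may be canonicalized to Nothing.

interface Inv2<T> {}
interface X satisfies Inv2<String> {}
interface Y satisfies Inv2<Integer> {}
// nothing can satisfy both X and Y, since it would have to inherit both
// Inv2<String> and Inv2<Integer>, so disjoint type analysis treats X&Y as Nothing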

CeylonMigrationBot commented 11 years ago

[@gavinking]

On another note, I misread your example and thought of another problematic case. In particular, we need to say that self types can only inherit from other self types or from types without not using their parameters.

I don't understand. What does "without not using their parameters" mean?

CeylonMigrationBot commented 11 years ago

[@RossTate]

Not true, we still need to form a principal instantiation of Inv to check that the members of Inv are correctly implemented.

You need A to have a principal instantiation of Inv, which it does, and similarly for B. Are you saying if someone were to extend both A and B then there would be problems? Wouldn't you then just check to see if they have the same instantiation for Inv then? At least the algorithms I have in mind don't have any termination issue here. I'm wondering if you're looking for a principal instantiation of Inv when you don't need one, e.g. for the type A&B.

No, what you're missing is that A&B might be canonicalizable to Nothing. It's actually disjoint type analysis that is blowing up.

Admittedly I haven't considered disjoint type analysis. Isn't its only purpose for types that arise intermediately from our type checking algorithms? If so, I think a sensible restriction is to not allow programmers to write disjoint intersections.

I don't understand. What does "without not using their parameters" mean?

Foo<P> of P extends Iterable<P> is bad cuz Iterable<P> uses parameter P, but Bar<P> of P extends Iterable<String> would be fine cuz Iterable<String> makes no use of P. Note that it's also a problem even if the superclass/interface uses a non-self type parameter.
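
In Ceylon syntax the two cases would look roughly like this (Foo and Bar are hypothetical, written in the same loose style as the other sketches in this thread):

interface Foo<P> of P satisfies Iterable<P> {}      // bad: the superinterface uses P
interface Bar<P> of P satisfies Iterable<String> {} // fine: Iterable<String> makes no use of P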

CeylonMigrationBot commented 11 years ago

[@gavinking]

You need A to have a principal instantiation of Inv, which it does, and similarly for B. Are you saying if someone were to extend both A and B then there would be problems?

Sure. I'm saying that in general we need to canonicalize supertypes. Whether it's strictly necessary in this particular case is not the point, since in general I need to do it. Looks like you just constructed, by yourself, an example where it is necessary.

Wouldn't you then just check to see if they have the same instantiation for Inv then?

Define "same instantiation" in a way that doesn't require canonicalization. You can't. Especially not once you take into account multiple inheritance.

Admittedly I haven't considered disjoint type analysis. Isn't its only purpose for types that arise intermediately from our type checking algorithms? If so, I think a sensible restriction is to not allow programmers to write disjoint intersections.

How am I supposed to even tell that a class implements a disjoint type, when it inherits multiple instantiations of a generic supertype from different branches of its supertype hierarchy, without being allowed to canonicalize supertypes?

Seriously, you keep throwing around possible restrictions "oh, just don't let them do X", without taking into account that the typechecker, in order to be able to check the restriction X, needs to be able to canonicalize stuff.

CeylonMigrationBot commented 11 years ago

[@gavinking] Just a silly example. The typechecker needs to reject this code:

interface I<out T> {
    shared formal T t;
}
interface A satisfies I<String> {}
interface B satisfies I<Integer> {}
class C() satisfies A&B {
    shared actual String t = nothing;
}

The only way it can do that is by forming a principal instantiation of I for A&B, which requires canonicalization: I<String> & I<Integer> is I<String&Integer>, that is, I<Nothing>, so the refinement of t with type String must be rejected.

Similarly, it needs to reject this code:

interface I<out T> {
    shared formal T t;
}
interface A satisfies I<String> {}
interface B satisfies I<Integer> {}
class C() satisfies I<A&B> {
    shared actual I<String> t = nothing;
}

Again, canonicalization needed: the inherited t has type A&B, which must be canonicalized (here to Nothing) before the refinement can be checked.

CeylonMigrationBot commented 11 years ago

[@RossTate] Okay, so:

  1. First you canonicalize type arguments. Remember, when canonicalizing type arguments, you can ignore self types entirely, e.g. Inv. So, while canonicalizing, A's definition simply looks like A extends Object and similarly for B.
  2. After that, you go through written intersections and check to see whether they are disjoint, accepting that everything has been canonicalized. If one is disjoint, then you raise an error.
  3. After this point, you go through each class that extends Inv multiple ways and check to see if all those ways use the same type arguments, taking advantage of the fact that all type arguments have already been canonicalized.

You can optimize this process, but I'm just breaking it up into steps to show the intuition of staged guarantees.

Heh, just as I was about to post this, you added those examples, but both of them will be successfully rejected by the process I described.

CeylonMigrationBot commented 11 years ago

[@gavinking]

Remember, when canonicalizing type arguments, you can ignore self types entirely, e.g. Inv.

Sorry, I just don't buy that at all. You need a whole lot of extra argumentation to convince me of that.

CeylonMigrationBot commented 11 years ago

[@gavinking] Consider:

interface I<out T, out X> of T {
    shared formal X t;
}
interface A satisfies I<A,String> {}
interface B satisfies I<B,Integer> {}
class C() satisfies I<C,A&B> {
    shared actual I<String> t = nothing;
}

Sorry, but I can't just go ignoring I and its type arguments. No way.

CeylonMigrationBot commented 11 years ago

[@gavinking]

After this point, you go through each class that extends Inv multiple ways and check to see if all those ways use the same type arguments

This wave of the hand:

  1. buries a whole lot of difficult problems (define "the same"), and
  2. imposes a whole new restriction that our type system does not currently have.

We do not require all instantiations of a supertype to have the same type arguments. That is what Java requires, not Ceylon. It was your own suggestion.

CeylonMigrationBot commented 11 years ago

[@RossTate] Okay, so now I think I see the fundamental problem. Co<String>&Co<Integer> is equivalent to Co<Bottom> because it is equivalent to Co<String&Integer> and you have String&Integer equivalent to Bottom. I honestly don't know how to deal with this. I warned you that having disjoint intersections being equivalent to Bottom could cause problems, and now we've come across one. Now I see this basically screws up so many assumptions/properties that I expected to work, and without those properties I'm stuck.

CeylonMigrationBot commented 11 years ago

[@gavinking] Perhaps, but it's just such a practically useful thing. There are quite a few places where I check to see if the user has done something silly by seeing if an intersection is Nothing. It's a really expressive thing.
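
For instance (a hypothetical illustration of the kind of check meant here, exact diagnostics aside): a type test that can never succeed, because the tested type intersected with the known type of the value is Nothing, can be flagged for the user.

void demo(String|Integer x) {
    if (is Float x) {
        // Float & (String|Integer) canonicalizes to Nothing,
        // so this branch can never be entered
        print(x);
    }
}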

CeylonMigrationBot commented 11 years ago

[@gavinking]

On another note, I misread your example and thought of another problematic case. In particular, we need to say that self types can only inherit from other self types or from types without not using their parameters.

@RossTate Still trying to understand what you mean by this one, and what problem case you have in mind.

CeylonMigrationBot commented 11 years ago

[@RossTate] Wow, I just thought of a really interesting example.

interface Inv<P> {}
interface A extends Inv<Bottom> {}
interface B extends Inv<A&B> {}

The question is: are A and B disjoint?

If A is disjoint from B, then A&B is equivalent to Bottom, so you can extend both A and B, making them not disjoint.

If A and B are not disjoint, then A&B is different from Bottom, so no one can extend both A and B, making them disjoint.

In short, A and B are disjoint if and only if they are not disjoint. This does not bode well.

CeylonMigrationBot commented 11 years ago

[@gavinking] @RossTate Nice, I like that one :-)

Note that your B is already rejected by the rules 1-3 I proposed above, whether P in Inv<P> is a self type or not. But yeah, if I disable checking of rules 1-3 then it does blow up my disjoint type analysis algorithm.

Still, in light of this I wonder if we should disallow the use of Inv<Nothing> as a supertype. There's no use case for that, not that I know of. There are uses for Co<Nothing> and Contra<Nothing>, but I can't think of any reason you would want to extend Inv<Nothing>.
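
A sketch of the asymmetry (hypothetical declarations, not the ones above): a covariant Nothing instantiation gives you a useful "empty" case, whereas an invariant Nothing instantiation is not assignable to any other instantiation, so there is little to gain by inheriting it.

interface Co2<out T> {}
object emptyCo satisfies Co2<Nothing> {}
Co2<String> useful = emptyCo;       // fine: Co2<Nothing> is a subtype of every Co2<T>
interface Inv2<T> {}
object oddity satisfies Inv2<Nothing> {}
// Inv2<String> useless = oddity;   // error: Inv2 is invariant, so this can never typecheck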

CeylonMigrationBot commented 11 years ago

[@gavinking] P.S. Is it the case that my rule 3 is needed as a consequence of disjoint type analysis?

CeylonMigrationBot commented 11 years ago

[@RossTate] Well, the issue that seems to come up is that, rather than canonicalizing by reducing types to something smaller/simpler, having disjoint=bottom forces you to canonicalize by expanding types to something bigger. But, in the face of recursive inheritance, one step of expansion may introduce a new intersection type that still needs to be expanded, and you can get this to repeat forever. The only strategy I can think of for preventing this is to require that all intersections occurring in inheritance clauses are already fully expanded. This suggests to me restrictions along the lines of #3686. In particular, this adaptation might be approximately right:

An intersection T1&...&Tn is valid in inheritance clauses if, for any generic C, there is some Ti with principal instantiation C<Ai> such that for any other Tj with principal instantiation C<Aj> we have that C<Ai> is a subtype of C<Aj>.

Putting aside the algorithmic issue of checking validity for the moment, does this seem plausible?
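
For concreteness (hypothetical declarations), the rule would accept the first written intersection below and reject the second:

interface Co<out T> {}
interface X satisfies Co<String> {}
interface Y satisfies Co<Object> {}
interface Z satisfies Co<Integer> {}
interface Ok satisfies X&Y {}       // Co<String> is a subtype of Co<Object>: already "fully expanded"
// interface NotOk satisfies X&Z {} // neither Co<String> nor Co<Integer> is a subtype
//                                  // of the other, so X&Z would be rejected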

CeylonMigrationBot commented 11 years ago

[@gavinking] @RossTate What I don't especially like about this is that I don't see how it's particularly related to the one concrete, actually problematic example we have:

interface Inv<O, out P> of O {}
interface A satisfies Inv<A,A&B> {}
interface B satisfies Inv<B,B> {}

What distinguishes this example is that it exploits the loophole of having a "shape" with a type parameter that is not a self type. So surely any solution we come up with should be about limiting arguments to such type parameters. (Arguments to self type parameters are already highly limited by nature.)

If that doesn't work, well, fine, we just live with the restriction that a shape can only have one type parameter, its self type parameter. But I would prefer to come up with something less crippled if possible.

Unless you have some other example you're thinking of, that you have not yet shown us?

CeylonMigrationBot commented 11 years ago

[@quintesse] @RossTate please also consider this example, which in Gavin's opinion is just bad design, but which in our opinion is perfectly reasonable and is the kind of thing people will want to do:

shared class Array() 
    satisfies MutableList<String|Boolean|Integer|Float|Object|Array|NullInstance> { ... }

This is now disallowed while it seems a normal thing to (want to) do.

The problem, in my opinion, is also that (a) the error does not give any help in figuring out why this is disallowed, and (b) for people who are not experts in type systems it gives no hint as to how to redesign this into something that is allowed.

CeylonMigrationBot commented 11 years ago

[@gavinking] I strongly object to us getting distracted by this silly example on this thread. Please let's stay on the topic of how we can make our algorithms terminate.

No way is this example going to impact on the design of our type system!

CeylonMigrationBot commented 11 years ago

[@FroMage] What, a set of real-world examples that do terminate is not a valid argument against a heavy-handed algorithm aimed at non-real-world fantasy examples that nobody has ever come up with so far? Man, sometimes I wish I had a clue-bat.

CeylonMigrationBot commented 11 years ago

[@quintesse] Sure, imagine getting distracted by an actual practical example ;) And we're not talking about the design of anything, we're talking about finding an algorithm for detecting unwanted/illegal/impossible types that doesn't crash. Right now we have one that disallows perfectly good types.

CeylonMigrationBot commented 11 years ago

[@gavinking] OK, great so you guys already hijacked the thread. Thanks.

CeylonMigrationBot commented 11 years ago

[@gavinking] Any algorithm that rejects undecidable types is also going to reject some perfectly reasonable types. In this case you have a type that sorta looks recursive but is not recursive in any meaningful way, so it can trivially be refactored to look nonrecursive. I already showed you how.

CeylonMigrationBot commented 11 years ago

[@quintesse] It doesn't matter that you showed me; like I said before, you have to be able to explain it to our users (who are obviously never going to read any spec document) and, even more difficult, teach them how to redesign their code in a way that is allowed. They'll ask "why do I have to do this? Why do I have to make my code more complex than seemingly necessary?" And your answer will have to be something simple, easy to understand and agree with. Not "because Gödel says so".

CeylonMigrationBot commented 11 years ago

[@FroMage] To be clear on my position. I think this non-termination is a non-problem. Nobody has ever written code in Java that crashes the compiler. Ever. Same in Ceylon, we've been writing crazy test code for years without noticing this non-termination issue. Sure, it is possible to write code which crashes the type-checker, but that is code that does not exist in the real world.

On the other hand, unless we come up with a better non-termination check than we have ATM (which may be possible), we are going to reject perfectly working and valid code which, as opposed to the non-terminating code we're trying to detect, does exist. You had to rewrite some of your code yourself. We're talking about rejecting real, existing, useful user code in order to be able to reject code that no human has ever produced and likely never will?

IMO it's a dick contest. You and Ross are trying to solve a non-issue that has never bothered anyone. If the solution you guys come up with rejects valid useful code that does exist, then your solution is worse than the unreal problem you guys are trying to solve. I would much rather have a crashing compiler on non-existent unreal code than one which rejects code left and right for no good reason other than that we couldn't find a better non-termination detection algo. The world never had a problem with non-termination in Java, and it will likely never have one with Ceylon.

If you guys can come up with a better algo which would not produce so many false positives, I wouldn't mind, but I really think you guys are wasting a lot of time on something that is not useful.

CeylonMigrationBot commented 11 years ago

[@FroMage] Having said that, I have always had this intuition that there are pragmatic ways to detect infinite recursion, and my really crude attempt at doing so appears to be enough to detect the problematic cases from the test suite: 0e7870b9e5ae266e58a38ccadf6f7197adfdfc8e

I am by no means claiming to have solved this particular issue, as it's way out of my depth. But I do have the intuition that it should be possible to detect infinite loops by maintaining a stack of things we're trying to do, and not trying to do them a second time if a similar (equal would be better) check is already going on, because that's guaranteed to not terminate. In my POC I use the type's qualified string representations for comparison, but we should probably find something more sound, like a hash code or whatever.
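
Very roughly, the guard amounts to something like this (a sketch only, with hypothetical names; the actual crude attempt is in the commit above):

import ceylon.collection { HashSet }

class RecursionGuard() {
    // checks currently in progress, keyed by the type's string representation
    value inProgress = HashSet<String>();
    // returns false if an equivalent check is already running
    shared Boolean enter(String key) => inProgress.add(key);
    shared void exit(String key) {
        inProgress.remove(key);
    }
}

The idea being that before starting, say, a canonicalization of some type, the typechecker calls enter() with the type's string representation, and simply bails out if it gets false.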

Perhaps I'm reading the results wrong, but it does appear to not trigger any errors in the test suite, and to detect the non-termination cases. Do you think this is in any case a reasonable direction to explore for detecting non-termination in these very specific cases (obviously this is not a general solution, but we don't have a general problem)?

CeylonMigrationBot commented 11 years ago

[@gavinking] Let's revisit this for 1.1.

CeylonMigrationBot commented 11 years ago

[@RossTate] Ok, I finally managed to sneak some time to get back to this. @FroMage and @quintesse, you're right to raise the concern of sacrificing practical types for impractical theoretical concerns. That said, your example is horrible; you have a union with Object, so the rest of the things in the union are completely ignored; in other words, it's equivalent to Array() satisfies MutableList<Object>. I highly suspect that code using your example would be broken. Nonetheless, I'll fix it.

You also raise the point that no one accidentally writes programs that break the Java compiler. This is true, but what I want to know is why this is the case. After all, people do write programs that break the Scala compiler. This shapes idea is a guess at that reason; my suspicion is that people generally only use recursive inheritance for self types and type families, and then never use these as type arguments. To test this hypothesis, we analyzed 20 million lines of Java code and didn't find any violations of it. So, my response to you is that the reason why no one writes programs that break the Java compiler is the same reason that this shapes idea is practical.

That said, I have thought of one practical example this disallows. In particular, you might think to do Tree<T> extends List<Tree<T>>. It's a clever way to encode trees, which is essentially what your example is trying to do. This didn't seem to show up in practice, though. My suspicion is that the reason is that one can accomplish almost the same convenience by just having a method List<Tree<T>> nodes(). Furthermore, this enables Tree<T> to extend Iterable<T>, which is also rather useful.
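
Something along these lines, say (Tree here is a hypothetical sketch, just to make the suggestion concrete):

shared interface Tree<out T> satisfies Iterable<T> {
    // children exposed through a method, rather than by making
    // Tree<T> itself satisfy List<Tree<T>>
    shared formal List<Tree<T>> nodes();
}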

There are other issues that people raised, which I'll get to later. I wanted to address the above concerns first, since they subsume everything else. I should also mention that this restriction enables some cool features, but I'll get to those later.

CeylonMigrationBot commented 11 years ago

[@quintesse] Hi @RossTate, just one remark: the example was lacking information for the casual reader, so to speak. IIRC the example was taken from some code implementing a JSON API; the Object mentioned is not Ceylon's nor Java's Object but a JSON object (which is not a supertype of the other types).

CeylonMigrationBot commented 11 years ago

[@RossTate] Ah, that makes sense. What do you think about my observation about factoring out a nodes method to deal with this one case?

jvasileff commented 7 years ago

For this particular one:

interface Inv<P> {}
interface A satisfies Inv<A|Inv<A>> {}

attempting to simplify the union A|Inv<A> in Inv<A|Inv<A>> involves checking if Inv<A> is a supertype of A, to see if the latter can be omitted. To do so, we find the Inv<> supertype of A, which is Inv<A|Inv<A>>, and then immediately attempt to canonicalize this type (which looks just like the one we started with). Hence the overflow.

Adjusting Type.isSubtypeOfInternal to use a hacked resolveAliasesNoDedup() which uses a hacked addToUnionNoDedup() which simply omits the code that tries to remove redundant types makes the sample code compile. It does cause two typechecker tests involving dynamic to fail, not sure why.

jvasileff commented 7 years ago

To share some thoughts:

Most of the tricky intersection cases seem to come down to this: needing to find out if two types are disjoint in order to find out if the two types are disjoint, with at least one of the types being a class or interface.

There are two subcases: (a) when we don't actually need to know if the types are disjoint, and (b) when we do.

Considering (a) first, with the simplified example:

interface Inv<P> {}
interface A satisfies Inv<A> {}
interface B satisfies Inv<A&B> {}

In order to find out what B satisfies, we need to know if A and B are disjoint.

  1. A and B both satisfy Inv<>. They are disjoint if they satisfy Inv<> in incompatible ways.
  2. Therefore, we need to know if A == A&B. If not, then B actually satisfies Inv<Nothing>.
  3. We can determine that A != A&B regardless of whether or not A and B are disjoint: A&B is either Nothing or A&B, and neither of these is exactly A. We don't actually need to know if A and B are disjoint to discover the answer to (2). They are indeed disjoint.

Presumably, hasEmptyIntersectionOfInvariantInstantiations() could be coded to avoid the extra (unnecessary) work simplifying A&B, thereby eliminating the infinite recursion.

OTOH, it seems such declarations should be disallowed anyway. If the user really wants satisfies Inv<Nothing> (or even satisfies Cov<Nothing> in more complex cases), they should just write it that way.

Now, considering case (b), which is @RossTate's very clever:

interface Inv<P> {}
interface A satisfies Inv<Nothing> {}
interface B satisfies Inv<A&B> {}

we actually do need to find out if A and B are disjoint in order to find out if A and B are disjoint, because their disjointness is relevant to the definition of the type.

Repeating the steps:

  1. Same: A and B are disjoint if they satisfy Inv<> in incompatible ways.
  2. Same: We need to know if Nothing == A&B. If not, then B really satisfies Inv<Nothing>.
  3. This time, simplifying A&B matters. If A and B are disjoint, then Nothing == Nothing. If not, then Nothing != A&B.

So there's no short-circuit opportunity in (b.3), and we will overflow even with code changes to solve case (a). This is good.

To detect (b), I suppose we could provisionally consider A and B non-disjoint in (b.3), but then ultimately discover that they are disjoint since Nothing isn't exactly the same as a non-disjoint A&B. This would of course have to result in an error.
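
A rough sketch of how that might look (hypothetical names; strings stand in for the typechecker's type model, just to show the control flow):

import ceylon.collection { HashSet }

class DisjointnessAnalysis() {
    // pairs we are currently computing, provisionally assumed non-disjoint
    value assumed = HashSet<[String, String]>();

    shared Boolean disjoint(String a, String b) {
        if (!assumed.add([a, b])) {
            // re-entrant query about the same pair: provisional answer
            return false;
        }
        value result = structurallyDisjoint(a, b);
        assumed.remove([a, b]);
        // a full implementation would also have to notice when result is true
        // even though a nested query relied on the provisional "non-disjoint"
        // answer, and report that declaration as an error (not shown here)
        return result;
    }

    // placeholder for the real check against canonicalized supertypes
    Boolean structurallyDisjoint(String a, String b) => false;
}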