CarliJoy / intersection_examples

Python Typing Intersection examples
MIT License
33 stars 2 forks source link

Multiple inheritance and intersections. #33

Closed randolf-scholz closed 9 months ago

randolf-scholz commented 11 months ago

We need to discuss how intersections relates to, and integrates with multiple inheritance. Points to discuss:

  1. Should class C(A, B) always be considered a subtype of A & B?
    1. Naive multiple inheritance creates order dependent MRO, but intersections are order independent. Usually, extra work needs to be done to ensure LSP-compatibility of C with both A and B.
    2. issubclass always considers C a subclass of both A and B, since it is a nominal subtype of both. This leads to a conflict with (i) if we also want isinstance(x, A&B)isinstance(x, A) and isinstance(x, B), since C might fail to be a structural subtype of A&B.
    3. Correctly implementing C(A,B) as a subtype of A&B generally requires overloads involving intersections. Since these are currently missing, there is lots of formally incorrectly annotated code. This has lead to type-checkers allowing certain false negative LSP violations: https://github.com/python/mypy/issues/15790. How big is the problem? Eric tested against the mypy-primer and found over 2.3k errors.
  2. Should it be possible to subclass intersections? class C(A & B)
    1. Checks for existence of diamond problem at class definition time and raises TypeError in this case.
    2. This guarantees order-independence of the MRO.

Regarding ①, I still think considering C(A, B) always as a subtype of A&B is the reasonable choice, type-checkers can raise errors inside the class body of C at places where it violates structural subtyping. However, what I worry is that type-checkers will shy away from implementing https://github.com/python/mypy/issues/15790, since tons and tons of legacy code will be incorrectly typed. This is where C(A&B) could come into play, with 2 benefits: Type-checkers can, without raising tons of errors in existing code bases, perform stricter checking inside such classes, and secondly, it can provide a second layer of security by testing for the diamond problem at class creation time.

mark-todd commented 11 months ago

Copied from other issue:

For the mro method of a class to be backwards compatible there has to be a defined order:

>>> class A:
...     pass
... 
>>> A.mro()
[<class '__main__.A'>, <class 'object'>]
>>> class B:
...     pass
... 
>>> class C(A,B):
...     pass
... 
>>> C.mro()
[<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>]

The mro method must return a list of classes in their resolved inheritance order - if A & B are commutative this is no longer resolvable in this structure - it would have to be something like:

>>> class C(A & B):
...     pass
... 
>>> C.mro()
[<class '__main__.C'>, set(<class '__main__.A'>, <class '__main__.B'>), <class 'object'>]
randolf-scholz commented 11 months ago

Idea: Allow subclassing A & B.

At runtime, A & B should create an instance of IntersectionType, just like A|B creates an instance of UnionType. It should be possible to implement IntersectionType in a way that allows us to subclass it. The advantage is that extra checks can be added that raise a TypeError if the diamond problem is present:

class A:
    def foo(self): ...

class B:
    def bar(self): ...

class C:
    def foo(self): ...

class A_and_B(A & B): # ✅ no issue
    ...

class A_and_C(A & C):  # ❌ errors at class definition time, diamond problem!
    ...

class A_and_C_compat(A & C):  # ✅ Ok, since diamond problem resolved.
   def foo(self): ...

These checks should guarantee that the MRO becomes order independent.

CarliJoy commented 11 months ago

There will be no class C(A &B) so I don't know what you are getting at.

mark-todd commented 11 months ago

I think broadly the problem this is trying to solve is the same as that of the whole "LSP" business. If A and B are compatible from the LSP condition then:

class C(A,B):
   pass

is identical to:

class C(A&B):
   pass

Ah, except in the case of methods of the same name I believe

CarliJoy commented 11 months ago

x: A & B will be always satisfy isinstance(x, A) and isinstance(x, B) (or isinstance(x, A&B)).

You need to be very exact here, because we need to differentiate between A and B being concrete types or procotolls.

I would rather not open up the can of worm around allowing class C(A &B) within this PEP. It is complicated enough already.

I would suggest creating another PEP for this Issue once that Intersection PEP is passed.

mark-todd commented 11 months ago

I would rather not open up the can of worm around allowing class C(A &B) within this PEP. It is complicated enough already.

I would suggest creating another PEP for this Issue once that Intersection PEP is passed.

Yeah I fully agree with both of these points - the more we can reduce the scope of this PEP the better really.

That's not to say this syntax wouldn't be useful - I actually think it might be very useful if it worked, but I think it poses many more questions than it answers at the moment.

randolf-scholz commented 11 months ago

@CarliJoy We have the following issue:

  1. We want isinstance(x, A & B)isinstance(x, A) and isinstance(x, B) https://github.com/CarliJoy/intersection_examples/issues/10
  2. Generally, a naively implemented type C(A,B) is not a (structural) subtype of A & B, since multiple inheritance is order-dependent, but intersection is not. In fact, type-checkers currently do not show all LSP violations: https://github.com/python/mypy/issues/15790
  3. At runtime, isinstance(x, A) and isinstance(x, B) will always be true if isinstance(x, C), since it considers it a nominal subclass of both A and B (both doesn't check structural compatibility)

This creates an inconsistency that needs to be resolved. Without intersections, C(A,B) cannot be generally be type-hinted in a LSP compatible manner (again, see the mypy issue). This means that there is possibly tons of code out there that has formally incorrect type hints and possibly bugs. The issue class C(A&B) solves is that:

But I agree that possibly this is for another PEP.

randolf-scholz commented 11 months ago

@mark-todd

The mro method must return a list of classes in their resolved inheritance order - if A & B are commutative this is no longer resolvable in this structure - it would have to be something like: [<class '__main__.C'>, set(<class '__main__.A'>, <class '__main__.B'>), <class 'object'>]

I don't think that would be necessary at all. In fact, I wouldn't touch the MRO computation, it should stay exactly the same. The only thing that's needed is to validate that the MRO is order independent with respect to A and B, something along the lines of

for name in members(C):
    if name in namespace:  # C overwrites name
        continue
    if hasttr(A, name):
        assert not hasttr(B, name) or (getattr(A, name) is getattr(B, name))
    if hasattr(B, name):
        assert not hasttr(A, name) or (getattr(B, name) is getattr(A, name))
mark-todd commented 11 months ago

I suppose I wasn't so much saying that the mro's are impossible to compare - you're quite correct that I'm sure there exists a function that would let us show the two were equivalent. My point is more that the mro I kind of think of as way to look at the underbelly of inheritance. If A & B is to be commutative then:

class C(A&B):
   pass

Should have exactly the same mro as:

class C(B&A):
   pass

But this is impossible, without redefining what it means to be an mro, as either B or A must appear first. I have a feeling you're imagining that "under the hood":

class C(A&B):
   pass

becomes:

class C(A,B):
   pass

With some additional checks about the interchangeability of A and B - but my point was that this cannot be, because why could it not also be:

class C(B,A):
   pass

Hence, it must be something outside of the way inheritance currently works in python, and therefore an extremely large pep.

mikeshardmind commented 11 months ago

I don't think any of this is neccessary. Mypy having a bug with considering overloads that needs resolving is not an issue with the specification. pyright handles this case properly currently.

if class C(A, B) is a valid subclass without violating LSP, it will satisfy A & B, the same goes for C(B, A). If it violates LSP to create that class, the error arises earlier before the type ever is attempted to be used in an intersection.

We don't need to enforce any ordering in the intersections themselves, the class definitions themselves being invalid prior to reaching this resolves any issue.

randolf-scholz commented 11 months ago

@mark-todd Yes, in case C(A,B) you'd get a different MRO list than C(B,A), That's not what I mean with the MRO being order-independent. The point is that if you look up a method on C, you get the same result, independently of whether it is internally C(A, B) or C(B, A). This condition defines an equivalence relation on the set of MROs, and what C(A&B) could check at class creation time is whether the MRO satisfies this property, after which it doesn't matter if it internally is [C, A, B, object] or [C, B, A, object]. (as long as you don't monkey patch the parent types afterwards)

@mikeshardmind The issue https://github.com/python/mypy/issues/15790 is present in both mypy (playground) and pyright (playground).

from typing import overload

class X: ...
class Y: ...

class A:
    def foo(self, arg: X) -> X:
        return X()

class B:
    def foo(self, arg: Y) -> Y:
        return Y()

class C(A, B):
    @overload
    def foo(self, arg: X) -> X: ...
    @overload
    def foo(self, arg: Y) -> Y: ...
    def foo(self, arg: X | Y) -> X | Y:
        if isinstance(arg, X):
            return X()
        return Y()

This is an LSP violation, because if z is of type X & Y, then C.foo(z) should be X & Y as well, otherwise C cannot be safely substituted for B. It could for example be fixed by adding an overload def foo(self, arg: X & Y) -> Never and raising an error if an intersection type is given. Eric tested this and got over 2k errors in the mypy primer.

Given this produces so much noise, it may be that type-checkers are reluctant to perform this check. C(A&B) would allow being explicit and rigorous here.

mikeshardmind commented 11 months ago

This is an LSP violation, because if z is of type X & Y, then C.foo(z) should be X & Y as well, otherwise C cannot be safely substituted for B. It could for example be fixed by adding an overload def foo(self, arg: X & Y) -> Never and raising an error if an intersection type is given.

See the comments in the issue you linked, and that my original formulation of the statement included predicting that it's not present on valid subclasses that don't violate LSP. I'm extremely aware that people are violating LSP in a lot of places, and have been an advocate the type checkers should be doing more to detect these issues. However, that's frankly not our problem in this issue. Our problem is defining what intersection should do with valid type information, and what the subtyping rules are regarding intersection, as that is what needs a specification. If we start with a lie given to the type checker, the type checker has to assume it is true, it doesn't know otherwise. This also feeds into why false negatives are insidious. Not getting feedback about in error in one place can have a magnified scope by interacting with other things.

Maybe when users actually have the means to express the correct type here, it can be enabled for users correctly that this is checked.

gvanrossum commented 11 months ago

@randolf-scholz There is a markup failure in the initial comment here (missing backtick in the first 'ii'). Also it refers to "(1)" which is ambiguous (is it the first checkbox or 'i')?

Separately, discussing the behavior of isinstance() / issubclass() is somewhat fraught, since these require concrete types at runtime.

randolf-scholz commented 11 months ago

@gvanrossum I updated the OP.

Separately, discussing the behavior of isinstance() / issubclass() is somewhat fraught, since these require concrete types at runtime.

Hasn't that ship sailed with PEP 604?

What I wanted to point out in this thread is the following conflict:

  1. The only reasonable choice is to assume that C(A,B) is always a subtype of A and B, otherwise issubclass(C, A&B)issubclass(C, A) and isssubclass(C,B) does not work. (https://github.com/CarliJoy/intersection_examples/issues/10)
  2. Under this assumption, tons of existing code with multiple inheritance is formally incorrectly typed, since properly type hinting C(A,B) to ensure C <: A&B generally requires overloads involving intersections.
  3. Given the prevalence of the issue (again ~2.3k errors in mypy primer if #15790 was fixed), type-checkers may be reluctant to address the issue.
  4. Introducing a new way of subclassing C(A&B) would give a new, more explicit way to express the intent to create a subtype of A&B. Type-checkers can roll out stricter checks in such cases, and one can even add runtime checks that ensure the diamond problem is resolved in a way that ensures commutativity of A and B.
mikeshardmind commented 11 months ago
  1. No, C(A, B) is only a valid subtype of A and B when LSP is not violated. It's worth noting that we need to remove any isinstance/issubclass behavior from this draft. these are type guards, and their use creating an intersection or not is entirely up to type checkers. We only need to specify the subtyping beahvior and rules.
  2. Yes, a lot of code is typed incorrectly.
  3. I don't see how that's our problem. If there's that many cases of LSP violations that type checkers aren't warning them about, they may actually want to be aware they have brittle code or types. We should be aiming to create a useful specification, and let people fix any errors that are detected as a result.
  4. This would only work for concrete classes with disjoint interfaces, otherwise the order of the intersection becomes important again. For cases that fit that, swap & for , and it works today.
mikeshardmind commented 9 months ago

Tentatively closing this, as the consistency based rule appears to work appropriately for the concerns raised here both with nominal and structural subtyping. Please reopen if you have a reason it does not.