Closed randolf-scholz closed 9 months ago
Copied from other issue:
For the mro method of a class to be backwards compatible there has to be a defined order:
>>> class A:
... pass
...
>>> A.mro()
[<class '__main__.A'>, <class 'object'>]
>>> class B:
... pass
...
>>> class C(A,B):
... pass
...
>>> C.mro()
[<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>]
The mro method must return a list of classes in their resolved inheritance order - if A & B are commutative this is no longer resolvable in this structure - it would have to be something like:
>>> class C(A & B):
... pass
...
>>> C.mro()
[<class '__main__.C'>, set(<class '__main__.A'>, <class '__main__.B'>), <class 'object'>]
A & B
.At runtime, A & B
should create an instance of IntersectionType
, just like A|B
creates an instance of UnionType
. It should be possible to implement IntersectionType
in a way that allows us to subclass it. The advantage is that extra checks can be added that raise a TypeError
if the diamond problem is present:
class A:
def foo(self): ...
class B:
def bar(self): ...
class C:
def foo(self): ...
class A_and_B(A & B): # ✅ no issue
...
class A_and_C(A & C): # ❌ errors at class definition time, diamond problem!
...
class A_and_C_compat(A & C): # ✅ Ok, since diamond problem resolved.
def foo(self): ...
These checks should guarantee that the MRO becomes order independent.
There will be no class C(A &B)
so I don't know what you are getting at.
I think broadly the problem this is trying to solve is the same as that of the whole "LSP" business. If A and B are compatible from the LSP condition then:
class C(A,B):
pass
is identical to:
class C(A&B):
pass
Ah, except in the case of methods of the same name I believe
x: A & B
will be always satisfy isinstance(x, A) and isinstance(x, B)
(or isinstance(x, A&B)
).
You need to be very exact here, because we need to differentiate between A
and B
being concrete types or procotolls.
I would rather not open up the can of worm around allowing class C(A &B)
within this PEP. It is complicated enough already.
I would suggest creating another PEP for this Issue once that Intersection PEP is passed.
I would rather not open up the can of worm around allowing
class C(A &B)
within this PEP. It is complicated enough already.I would suggest creating another PEP for this Issue once that Intersection PEP is passed.
Yeah I fully agree with both of these points - the more we can reduce the scope of this PEP the better really.
That's not to say this syntax wouldn't be useful - I actually think it might be very useful if it worked, but I think it poses many more questions than it answers at the moment.
@CarliJoy We have the following issue:
isinstance(x, A & B)
⟺ isinstance(x, A) and isinstance(x, B)
https://github.com/CarliJoy/intersection_examples/issues/10C(A,B)
is not a (structural) subtype of A & B
, since multiple inheritance is order-dependent, but intersection is not. In fact, type-checkers currently do not show all LSP violations: https://github.com/python/mypy/issues/15790isinstance(x, A) and isinstance(x, B)
will always be true if isinstance(x, C)
, since it considers it a nominal subclass of both A
and B
(both doesn't check structural compatibility)This creates an inconsistency that needs to be resolved. Without intersections, C(A,B)
cannot be generally be type-hinted in a LSP compatible manner (again, see the mypy
issue). This means that there is possibly tons of code out there that has formally incorrect type hints and possibly bugs. The issue class C(A&B)
solves is that:
C
as a structural subtype of A&B
, and validates MRO independence at class creation timeA
or B
and suddenly creating a conflict.C
, avoiding false negatives.But I agree that possibly this is for another PEP.
@mark-todd
The mro method must return a list of classes in their resolved inheritance order - if A & B are commutative this is no longer resolvable in this structure - it would have to be something like:
[<class '__main__.C'>, set(<class '__main__.A'>, <class '__main__.B'>), <class 'object'>]
I don't think that would be necessary at all. In fact, I wouldn't touch the MRO computation, it should stay exactly the same. The only thing that's needed is to validate that the MRO is order independent with respect to A
and B
, something along the lines of
for name in members(C):
if name in namespace: # C overwrites name
continue
if hasttr(A, name):
assert not hasttr(B, name) or (getattr(A, name) is getattr(B, name))
if hasattr(B, name):
assert not hasttr(A, name) or (getattr(B, name) is getattr(A, name))
I suppose I wasn't so much saying that the mro's are impossible to compare - you're quite correct that I'm sure there exists a function that would let us show the two were equivalent. My point is more that the mro I kind of think of as way to look at the underbelly of inheritance. If A & B is to be commutative then:
class C(A&B):
pass
Should have exactly the same mro as:
class C(B&A):
pass
But this is impossible, without redefining what it means to be an mro, as either B or A must appear first. I have a feeling you're imagining that "under the hood":
class C(A&B):
pass
becomes:
class C(A,B):
pass
With some additional checks about the interchangeability of A and B - but my point was that this cannot be, because why could it not also be:
class C(B,A):
pass
Hence, it must be something outside of the way inheritance currently works in python, and therefore an extremely large pep.
I don't think any of this is neccessary. Mypy having a bug with considering overloads that needs resolving is not an issue with the specification. pyright handles this case properly currently.
if class C(A, B) is a valid subclass without violating LSP, it will satisfy A & B, the same goes for C(B, A). If it violates LSP to create that class, the error arises earlier before the type ever is attempted to be used in an intersection.
We don't need to enforce any ordering in the intersections themselves, the class definitions themselves being invalid prior to reaching this resolves any issue.
@mark-todd Yes, in case C(A,B)
you'd get a different MRO list than C(B,A)
, That's not what I mean with the MRO being order-independent. The point is that if you look up a method on C
, you get the same result, independently of whether it is internally C(A, B)
or C(B, A)
. This condition defines an equivalence relation on the set of MROs, and what C(A&B)
could check at class creation time is whether the MRO satisfies this property, after which it doesn't matter if it internally is [C, A, B, object]
or [C, B, A, object]
. (as long as you don't monkey patch the parent types afterwards)
@mikeshardmind The issue https://github.com/python/mypy/issues/15790 is present in both mypy
(playground) and pyright
(playground).
from typing import overload
class X: ...
class Y: ...
class A:
def foo(self, arg: X) -> X:
return X()
class B:
def foo(self, arg: Y) -> Y:
return Y()
class C(A, B):
@overload
def foo(self, arg: X) -> X: ...
@overload
def foo(self, arg: Y) -> Y: ...
def foo(self, arg: X | Y) -> X | Y:
if isinstance(arg, X):
return X()
return Y()
This is an LSP violation, because if z
is of type X & Y
, then C.foo(z)
should be X & Y
as well, otherwise C
cannot be safely substituted for B
. It could for example be fixed by adding an overload def foo(self, arg: X & Y) -> Never
and raising an error if an intersection type is given. Eric tested this and got over 2k errors in the mypy primer.
Given this produces so much noise, it may be that type-checkers are reluctant to perform this check. C(A&B)
would allow being explicit and rigorous here.
This is an LSP violation, because if z is of type X & Y, then C.foo(z) should be X & Y as well, otherwise C cannot be safely substituted for B. It could for example be fixed by adding an overload def foo(self, arg: X & Y) -> Never and raising an error if an intersection type is given.
See the comments in the issue you linked, and that my original formulation of the statement included predicting that it's not present on valid subclasses that don't violate LSP. I'm extremely aware that people are violating LSP in a lot of places, and have been an advocate the type checkers should be doing more to detect these issues. However, that's frankly not our problem in this issue. Our problem is defining what intersection should do with valid type information, and what the subtyping rules are regarding intersection, as that is what needs a specification. If we start with a lie given to the type checker, the type checker has to assume it is true, it doesn't know otherwise. This also feeds into why false negatives are insidious. Not getting feedback about in error in one place can have a magnified scope by interacting with other things.
Maybe when users actually have the means to express the correct type here, it can be enabled for users correctly that this is checked.
@randolf-scholz There is a markup failure in the initial comment here (missing backtick in the first 'ii'). Also it refers to "(1)" which is ambiguous (is it the first checkbox or 'i')?
Separately, discussing the behavior of isinstance()
/ issubclass()
is somewhat fraught, since these require concrete types at runtime.
@gvanrossum I updated the OP.
Separately, discussing the behavior of isinstance() / issubclass() is somewhat fraught, since these require concrete types at runtime.
Hasn't that ship sailed with PEP 604?
What I wanted to point out in this thread is the following conflict:
C(A,B)
is always a subtype of A
and B
, otherwise issubclass(C, A&B)
⟺ issubclass(C, A) and isssubclass(C,B)
does not work. (https://github.com/CarliJoy/intersection_examples/issues/10)C(A,B)
to ensure C <: A&B
generally requires overloads involving intersections.C(A&B)
would give a new, more explicit way to express the intent to create a subtype of A&B
. Type-checkers can roll out stricter checks in such cases, and one can even add runtime checks that ensure the diamond problem is resolved in a way that ensures commutativity of A
and B
.&
for ,
and it works today.Tentatively closing this, as the consistency based rule appears to work appropriately for the concerns raised here both with nominal and structural subtyping. Please reopen if you have a reason it does not.
We need to discuss how intersections relates to, and integrates with multiple inheritance. Points to discuss:
class C(A, B)
always be considered a subtype ofA & B
?C
with bothA
andB
.issubclass
always considersC
a subclass of bothA
andB
, since it is a nominal subtype of both. This leads to a conflict with (i) if we also wantisinstance(x, A&B)
⟺isinstance(x, A) and isinstance(x, B)
, sinceC
might fail to be a structural subtype ofA&B
.C(A,B)
as a subtype ofA&B
generally requires overloads involving intersections. Since these are currently missing, there is lots of formally incorrectly annotated code. This has lead to type-checkers allowing certain false negative LSP violations: https://github.com/python/mypy/issues/15790. How big is the problem? Eric tested against the mypy-primer and found over 2.3k errors.class C(A & B)
Regarding ①, I still think considering
C(A, B)
always as a subtype ofA&B
is the reasonable choice, type-checkers can raise errors inside the class body ofC
at places where it violates structural subtyping. However, what I worry is that type-checkers will shy away from implementing https://github.com/python/mypy/issues/15790, since tons and tons of legacy code will be incorrectly typed. This is whereC(A&B)
could come into play, with 2 benefits: Type-checkers can, without raising tons of errors in existing code bases, perform stricter checking inside such classes, and secondly, it can provide a second layer of security by testing for the diamond problem at class creation time.