Closed: mikeshardmind closed this 7 months ago
Firstly, thanks @mikeshardmind for formulating this - it's really cool to see a little idea I had being rigorously explored! Just have a syntax question really, which I mentioned before but I think it's worth discussing again - how do we differentiate instances vs classes?
# In current python
class A:
    pass
x: A = A()
y: type[A] = A
In the new syntax
class A:
    pass
class B:
    pass
class C(B,A):
    pass
x: type[B,A] = C
x = C() # How do we express type of this?
# Would it instead be:
x: type[B,A] = C()
x: type[type[B,A]] = C # This syntax seems kind of strange
The term type is used to say "this follows the metaclass of type", I believe, so I'm not sure we can use object for instances. Am I missing something?
x = C() # How do we express type of this?
This would be fine:
typ_x: type[B, A] = C
x: object[B, A] = C()
# also fine:
x: object[B, A] = typ_x()
But I believe that in that particular case, it should just be left alone (to inference) or typed as C.
it becomes more clearly needed in a function signature
def some_constructor(typ: type[*Ts]) -> object[*Ts]:
    return typ()
leaving out some details there of course
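For comparison, a runnable sketch of the closest thing expressible today, using a single type variable (which can tie the class argument to the return type, but can't carry several independent bases the way the proposed `type[*Ts]` / `object[*Ts]` pairing would):

```python
from typing import TypeVar

T = TypeVar("T")

# Today's spelling: one type variable links the class argument to the
# instance return type, but cannot require multiple independent bases.
def some_constructor(typ: type[T]) -> T:
    return typ()

d = some_constructor(dict)
print(type(d).__name__)  # dict
```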
Thanks for raising my attention to this on discord.
I don't have much to add to this. I would be happy with this as an outcome. +0 on using object this way, it's better that it's a builtin and it doesn't need to be an import, but I don't feel as strongly as you do on it being a natural choice.
Allowing reordering when trivially equivalent in type structure seems pragmatic for nominal typing use, but it feels like a clever formulation that hides the fact that what the type checker cares about differs from what the runtime cares about. I think that's okay here, since the behavior appears to be fully consistent with the runtime and describes everything I think matters at static analysis time as well, but this needs to be something people seeing this don't gloss over.
@mikeshardmind
This would be fine:
typ_x: type[B, A] = C
x: object[B, A] = C()
# also fine:
x: object[B, A] = typ_x()
I follow your idea here - I like the idea of saving an import, but there are, I think, some points to consider:
Why is type[X] interpreted to mean what it does in python? In general I think (correct me if I'm wrong here) that type[X] implies a class inheriting from X made from the metaclass type, so I guess this means we'll need to consider expanding the interface of other metaclasses too:
class B(type):
    pass
class C(metaclass=B):
    pass
class D(metaclass=B):
    pass
class E(C,D):
    pass
x: B[C,D] = E
My point is not so much that I think metaclasses are a super common use case, but just that we should make sure the new syntax aligns with the way python already works.
Under this syntax the following two are equivalent: type[object[A,B]] vs type[A,B]. Is this a problem? Not sure - just wanted to point it out.
Also, object[T] is now the same as T.
object
One thing I'm not sure of is what the new syntax might mean for object. At first, object being an instance feels quite logical, but I'm wondering if we're missing something here. object is the base of all classes in the mro, so if you wanted to say "this is a class", it's quite natural to say type[object] is a class that inherits from anything. But suddenly we could say type[A, object] (must inherit from A and object), or even object[A, object] (instance of something that inherits from A and object).
It might be that there are easy solutions to these, or things I'm missing - but it seems that a separate keyword would be a lot less complex. We could always go the route of Union and make it use builtins later down the line if something was worked out.
@mark-todd
Why is it type[X]?
type[T] already works this way today. Code sample in pyright playground
And also:
>>> class X:
... pass
...
>>> isinstance(X, type)
True
The only new things here are expanding this to allow multiple type parameters / typevartuples, defining what that means, and providing a way to specify when you want the instance behavior by expanding object.
@mikeshardmind
Sorry, I realise now my heading was slightly misleading, but what I meant was "why is type[X] interpreted to mean what it does in python?", and the answer is that I think this is about the metaclass of X.
Update: I've investigated further and found it does not depend on the metaclass at all, so we're alright on that front - the below works:
class B(type):
    pass
class C(metaclass=B):
    pass
x: type[C] = C
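A quick runtime check of the same relationship (nothing here is new syntax; this runs on current Python):

```python
class B(type):
    pass

class C(metaclass=B):
    pass

# C was created by the metaclass B, but since B subclasses type,
# C is still an instance of type - which is why type[C] works unchanged.
print(isinstance(C, B))     # True
print(isinstance(C, type))  # True
print(issubclass(B, type))  # True
```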
Ah okay, then I should clarify in a bit more detail on a couple things you said.
Yes, object[T] would be equivalent to T. There's really no way around this (even if we used a different name than object) that doesn't have special-casing implications; we need a way to spell SomeName[T, U] as one type, and it should ideally work with TypeVarTuples as well, to compose properly with generics.
My point is not so much that I think metaclasses are a super common use case, but just that we should make sure the new syntax aligns with the way python already works.
The relationship between type and object is actually a little different. I showed a little bit of it above, but classes (rather than instances of them) are type objects and are often valid as a metaclass. I'm not sure where this is specified for the type system in a pep at this point in time, but it is mentioned here: https://docs.python.org/3/library/typing.html#the-type-of-class-objects
Yes, object[T] would be equivalent to T, there's really no way around this (even if we used a different name than object) that doesn't have special casing implications, we need a way to spell SomeName[T, U] as one type, and it should work with TypeVarTuples ideally as well to compose properly with generics.
I agree this is unavoidable whichever name we choose - I guess I was wondering if giving specifically object this property causes issues or lack of clarity, as it means:
object[object] == object
while:
Inherit[object] == object
seems clearer (not wedded to this name, just using it as a placeholder).
Also Inherit[A, object] seems clearer than object[A, object].
Also, by using a different name, it has the nice property that, as with all other classes, type[X] can be the class while X is the instance.
Consider:
T = TypeVar("T")

def f(o: T) -> type[T]:
    ...
# Now suppose we have:
x: object[A,B] = C()
y = f(x)
The type of y here would be type[A,B] - but I'm not sure about this. For any other class, x's type just gets wrapped in type - why should this behave differently?
If instead:
x: Inherit[A,B] = C()
y = f(x)
The type of y can be type[Inherit[A,B]] and it still has this property.
You could also consider the inverse function of this which I think has a similar issue.
I agree this is unavoidable whichever name we choose - I guess I was wondering if giving specifically object this property causes issues or lack of clarity, as it means:
object[object] == object
Seems fine to me. When I had the idea to use object, it was from the relation between type and object.
Even the example you posed mirrors existing behavior of type. For instance, type (at runtime) has a type (at type-checking time) of type[type]:
x: type[type] = type # This type checks, and is correct today
y: object = object() # this type checks, and is correct today
z: object[object] = object() # in theory, this should be fine too.
It'll only look strange in cases where people have the single form, in which case we can say that people should prefer not wrapping in object[] when it's un-needed.
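One runtime detail worth noting for this spelling: on current CPython, type is already subscriptable (Python 3.9+), while object is not, so object[...] would need new runtime support. A quick check:

```python
# type supports subscription at runtime, producing a types.GenericAlias:
alias = type[int]
print(alias)  # type[int]

# object does not define __class_getitem__, so subscripting it raises today:
try:
    object[int]
except TypeError as exc:
    print("TypeError:", exc)
```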
I wouldn't want to special case and preclude that though since the equivalence is trivial.
This could very easily be a linting rule in something that is opinionated about style, including a code action in an IDE or other autofix (see stuff like ruff). But we do need a corresponding way to refer to an instance, and requiring an import for this (keep in mind that typing is considered expensive to import, and many people avoid doing so) when object is right here and a reasonable parallel seems like it makes this worse.
@mikeshardmind I'm pretty convinced by your points here:
x: type[type] = type # This type checks, and is correct today
y: object = object() # this type checks, and is correct today
z: object[object] = object() # in theory, this should be fine too.
The only case that still bothers me is this one:
The type of y here would be type[A,B] - but I'm not sure about this. For any other class x's type just gets wrapped in type - why should this behave differently?
and its inverse function as below:
def f(o: type[T]) -> T:
    ...
x: type[A,B] = C
y = f(x)
The type of y here is object[A, B], which seems troubling to me - maybe it's just troubling because it's unfamiliar.
This is an interesting idea. @mikeshardmind, thanks for working out the details and presenting the proposal.
Here are a few follow-up questions to help me clarify some points that I'm unclear on...
Are you proposing that all of the type arguments used in this construct must be concrete? Or are type variables allowed? For example, is object[int, S, list[T]] allowed if S and T are TypeVars? If type variables are allowed, how do you see that working? How would a constraint solver solve for them?
How does this construct differ from a type defined with the existing class statement? Am I correct in understanding that class AB(A, B): ... is effectively describing the same type as type AB = object[A, B] in your proposal? Is object[A, B] just a shorthand way to express the same thing? If so, what problem does it solve over the existing class definition syntax? The class definition admittedly has some limitations and runtime overhead, but it has the added benefit of already existing in the language, and type checkers already know a lot about class statements, providing checks for things like metaclass conflicts, ambiguous MRO ordering, incompatible overrides, attempts to subclass final classes, etc.
From a naming perspective, I think that object and type would probably be pretty confusing, but we can figure out a different name if we move forward with this construct.
Are you proposing that all of the type arguments used in this construct must be concrete? Or are type variables allowed? For example, is object[int, S, list[T]] allowed if S and T are TypeVars? If type variables are allowed, how do you see that working? How would a constraint solver solve for them?
Type variables and TypeVarTuples are allowed, but type checkers are allowed to say they don't know what they represent if there isn't an appropriate corresponding use. I have not written out all of the places I would expect this to be solvable yet, but this is definitely a question that needs at least a minimum set of detectable cases specified, even if we leave the upper limit wider in case of type checker advances. I can work on better specifying this with places I expect it to work and not, but I think it's fine to leave this with similar rules to existing type vars: there needs to be a clear place the types enter the system at some point to tie it all together. I'd really prefer to leave this as much in the ball court of type checkers as possible, with how much they deem they can infer. I would state, though, that because this is intended to flexibly work with other non-specified bases, it's probably not reasonably inferable from "there are no other bases". I'll see about firming this up.
At assignment, it is sufficient to check that each type is individually satisfied, because creating the type will handle checking that creating the type is valid. From the use side, it's "safe to use" what's provided by the minimum bound.
object[int, S, list[T]] ... How would a constraint solver solve for them?
quick example:
def foo(some_list: list, x: object[int, S, list[T]]):
    # type checker may choose to error or warn that it does not know what S or T are; nothing provides that information here
    # may alternatively choose to use the lowest known bound, object.
    some_list[x] # this is fine; looking for `.__index__` we find it on int
    for item in x: # `__iter__` is not found on int, not found on S, but found on list[T] without needing to specialize for T
        print(item) # And the type being Iterator[T], and all types being printable, allows this.
class Foo[S, T]:
    def __init__(self, stype: S, ttype: T): ...
    def foo(self, some_list: list, x: object[int, S, list[T]]):
        # S and T are now resolvable in relation to something else (the class)
        # but inside the body prior to specialization, may still only be known to the minimum bound; rest same as above
        ...
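A runtime analogue of that lookup behavior can be shown today with a hypothetical int subclass (assumed for illustration) that also supplies `__iter__`, so one object satisfies both the indexing and iteration uses above:

```python
# Hypothetical helper: an int (providing __index__) that is also iterable,
# mirroring a value of an object[int, ..., list[T]]-style type.
class IndexableIterable(int):
    def __iter__(self):
        return iter(range(int(self)))

some_list = ["a", "b", "c", "d"]
x = IndexableIterable(2)

print(some_list[x])  # 'c' - list indexing uses __index__ inherited from int
print(list(x))       # [0, 1] - iteration uses the added __iter__
```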
Other Examples:
class InplaceSortable(Protocol):
    def sort(self):
        ...

def foo[T](x: object[Sequence[int], InplaceSortable, T]) -> object[Sequence[int], InplaceSortable, T]:
    # type checker may error here; there's no appropriate way to infer what T is
    # may indicate that the next example is what should actually be used
    ...

def foo_fixed[T: object[Sequence[int], InplaceSortable]](x: T) -> T:
    ...
class FooFineHere[T: SupportsIndex]:
    def __init__(self, emitted_type: type[T]):
        self.typ: type[T] = emitted_type
        self.some_list: list[T]
    def foo(self, x: object[InplaceSortable, Sequence[T]]):
        # T fine here, inferable from class
        x.sort() # this is fine
        x.do_something() # not fine, nothing provides this
        self.some_list[x[1]] # This works fine, even if it's nonsense for use as an example, bound of T
How does this construct differ from a type defined with the existing class statement? Am I correct in understanding that class AB(A, B): ... is effectively describing the same type as type AB = object[A, B] in your proposal? Is object[A, B] just a shorthand way to express the same thing? If so, what problem does it solve over the existing class definition syntax? The class definition admittedly has some limitations and runtime overhead, but it has the added benefit of already existing in the language, and type checkers already know a lot about class statements, providing checks for things like metaclass conflicts, ambiguous MRO ordering, incompatible overrides, attempts to subclass final classes, etc.
It is not the same.
class AB(A, B): ... would be a valid choice for type[A, B], but this is the simplest case possible. class BA(B, A): ... also works. It works with types that aren't known by the function itself (like with type variables), and it also should work with protocols.
This is entirely a type-system-side feature; the other checks you mention exist when someone tries to make the type that will be used here, so users don't lose out on those checks, but they will be raised to them in other places.
The lookups are intentionally defined to be ordered the same way the mro would be, but allow unspecified levels of indirection, to keep both reasoning about this and the implementation simple.
Some examples showing a few of the ways this is different:
class A:
    ...
class B:
    ...
class ABRealized1(A, B):
    ...
class ABRealized2(B, A):
    ...
class C(B):
    ...
class ABRealized3(A, C):
    ...
type NeedsAB = object[A, B]
# all of these are fine.
# These are all instances of types which are all nominal subclasses of all of what is specified, LSP verified in respective definitions.
x: NeedsAB
x = ABRealized1()
x = ABRealized2()
x = ABRealized3()
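The runtime side of that claim can be checked today with plain classes: all three realized classes are instances of both A and B, whatever the base order:

```python
class A: ...
class B: ...
class C(B): ...

class ABRealized1(A, B): ...
class ABRealized2(B, A): ...
class ABRealized3(A, C): ...  # B arrives indirectly via C

# Every realization satisfies both bases, regardless of MRO order.
for cls in (ABRealized1, ABRealized2, ABRealized3):
    obj = cls()
    print(cls.__name__, isinstance(obj, A), isinstance(obj, B))
```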
# I believe this is in here, but a protocol indicating supporting rich comparison if not
from useful_types import RichComparable
def total_ordering(typ: type[T]) -> type[RichComparable, T]: # see functools.total_ordering
    ...
# Any type wrapped in this is now known to be both a nominal subtype of the original type
# while also being a structural subtype of the protocol RichComparable.
@total_ordering
class Point:
    def __eq__(self, other) -> bool: ...
    def __lt__(self, other) -> bool: ...
p1: Point
p2: Point
p1 >= p2 # fine
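The runtime counterpart of that decorator already works today via functools.total_ordering; what's missing is only a way to annotate what it returns:

```python
from functools import total_ordering

@total_ordering
class Point:
    def __init__(self, x: int):
        self.x = x
    def __eq__(self, other) -> bool:
        return self.x == other.x
    def __lt__(self, other) -> bool:
        return self.x < other.x

# __ge__ was synthesized by the decorator; the proposal would let the
# decorator's signature say so, e.g. type[RichComparable, T] rather than type[T].
print(Point(2) >= Point(1))  # True
```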
OK, I think I understand. To summarize: a class statement defines a specific class, but the type[A, B] construct describes any class that is a subtype of A and B, regardless of the class's name or base class ordering. Is that an accurate summary?
I think that implies... If either A or B are nominal classes (other than object), then any class compatible with type[A, B] must also be nominal. If both A and B are structural types (or one or both are object), then a class compatible with type[A, B] could be structural or nominal.
...users don't lose out on those checks, but they will be raised to them in other places.
That's a great point. I think that's really compelling because it eliminates the need to duplicate those complex checks and rules within the intersection type itself. Those errors will be easier to understand if they are surfaced at the point where such a type is defined (e.g. a class statement that uses a @final type as a base class).
Your answer about TypeVar usage makes sense, but I don't think you fully answered my question about constraint solving. I was asking about the case where a call expression targets a generic function or constructor whose signature includes a TypeVar within an object[A, T] type. Here's an attempt to answer my own question. Does this sample look right to you?
class A: ...
class B: ...
class C: ...
class D(A, B): ...
class E(A, C): ...

def call_target[T](val: object[A, T]) -> T:
    return val

def caller(d: D, de: D | E):
    reveal_type(call_target(d)) # B ?
    reveal_type(call_target(de)) # B | C ?
I can think of other more complex constraint solver examples that involve structural types, overlapping types, etc. Some of these don't have such obvious solutions, but similar complex (and unspecified) situations arise when solving TypeVars that are included in unions, so maybe it's OK to leave some of these details up to individual type checkers to figure out. For all the cases I can think of, I'm able to come up with some answer that is reasonable.
As for nomenclature, is there a reason we wouldn't want to use the term Intersection for this concept? I realize that it's not identical to the set-theoretic intersection concept that you've been exploring in other discussion threads, but it is effectively an intersection type, right? That would allow us to spell it as Intersection[A, B] and type[Intersection[A, B]] (or A & B and type[A & B]).
What other limitations are there on types that can be included in this construct? You've mentioned that any class (nominal or structural) is allowed, as are Any and TypeVars. What about unions, literal types, callables, Never, None, type[X]? Are any (or all) of these disallowed?
OK, I think I understand. To summarize: A class statement defines a specific class, but the type[A, B] construct describes any class that is a subtype of A and B regardless of the class's name or base class ordering. Is that an accurate summary?
Mostly, yes. There's an example with non-disjoint types to consider though, and it's part of the reason the ordering does matter:
class A:
    def foo(self) -> int:
        ...

class B:
    def foo(self, some_arg: bool = True) -> int:
        ...

def foo(x: object[A, B]):
    x.foo() # ok
    x.foo(True) # not okay
We've specified A as first in order when resolving lookup at static analysis time. In this trivial case we can figure out that anything that validly subclasses A and B must allow this, but we're specifically leaving ordering to the user so that the more complex cases, where a type checker wouldn't be able to make a decision, function properly for multiple use cases.
Type checkers may decide to note that swapping the order would fix it, to give a helpful message, but should error here (I mentioned this off-handedly in the part about obviousness of behavior and ergonomics).
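The runtime analogue of that ordering decision is ordinary MRO attribute lookup, which can be seen today with plain classes:

```python
class A:
    def foo(self) -> int:
        return 1

class B:
    def foo(self, some_arg: bool = True) -> int:
        return 2

class BA(B, A): ...
class AB(A, B): ...

# Whichever base comes first in the MRO wins the lookup:
print(BA().foo(False))  # 2 - B.foo is found first and accepts the argument
print(AB().foo())       # 1 - A.foo is found first; AB().foo(False) raises TypeError
```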
I can think of other more complex constraint solver examples that involve structural types, overlapping types, etc. Some of these don't have such obvious solutions, but similar complex (and unspecified) situations arise when solving TypeVars that are included in unions, so maybe it's OK to leave some of these details up to individual type checkers to figure out. For all the cases I can think of, I'm able to come up with some answer that is reasonable.
Actually, that's a compelling reason to not use type and object for this. I wasn't actually anticipating that type checkers would infer type variables in this case, and I don't think your sample there presents something that type checkers should be left to try and make sense of.
obviously in the case of
def foo[T](typ: type[T]) -> T:
    ...
we want the current behavior
def foo[*Ts](typ: type[*Ts]) -> object[*Ts]:
    ...
In this case, is Ts the full mro of the type? Reasonable to do so, but is it useful? idk
def foo[T, U](typ: type[T, U]) -> object[T, U]:
    ...
This case looks fine at first; it's resolvable by having T resolve to the type passed in and U to object (or any other base in the mro). But it invites the case in your sample where there's no reasonable choice, as any choice would inherently be lossy on type information, and there's no singular correct pick that doesn't require understanding the intent of the developer outside of the type system itself.
Here's an attempt to answer my own question. Does this sample look right to you? [Sample left out]
Not really, no. I'd say in both cases there's not a reasonable choice for the type checker to go with; the type checker isn't provided enough information to make the right decision unambiguously, and I don't think this should be destructuring the mro partially to make sense of what the user is trying to do - the user should need to find a less ambiguous way to write this. If the specific call site shown is the intent, there's no clear reason why this wouldn't be a typevar bound on a singular type var.
As for what the type checker should do: there are multiple answers which won't error, and at least one of these involves understanding all of the call sites, which may not be possible, especially if we consider libraries.
This could present some undesirable cases where the lack of inference doesn't feel consistent, even if there's a consistent underlying rule for the type checker, which may make this harder to teach, learn, and/or otherwise adopt. I would say from this that it's safe to infer only from the call site in the singular case, and in the plural case, there should probably be some other external information here.
As for nomenclature, is there a reason we wouldn't want to use the term Intersection for this concept? I realize that it's not identical to the set-theoretic intersection concept that you've been exploring in other discussion threads, but it is effectively an intersection type, right? That would allow us to spell it as Intersection[A, B] and type[Intersection[A, B]] (or A & B and type[A & B]).
A preference for avoiding the same split in the future that is currently happening with TypeGuards, and a preference to avoid inviting use of &, based on the background here.
I would be fine with OrderedIntersection, to leave Intersection open for any potential future. People can import it as whatever name they want, but this feels like a foreseeable case down the line if something prompts adding the full version we had previously been discussing. And especially with the prior section of this, I now think it's better than using type and object here, and will add a footnote to the original post.
What other limitations are there on types that can be included in this construct? You've mentioned that any class (nominal or structural) is allowed, as are Any and TypeVars. What about unions, literal types, callables, Never, None, type[X]? Are any (or all) of these disallowed?
I have no reason to disallow any prior existing type construct here. The fun part about "it just needs to be all of these, and we have an order that the type checker should check for things in" is that any error where it's impossible to construct such a type will arise elsewhere, when someone tries to make it and can't satisfy LSP or some other constraint of the type system. Literal types are the flimsiest of these, but since I can't see a reason to outright forbid them, I have to wonder if there is someone who will find a use. I also think that since users are responsible for sensible ordering in the case of any overlaps here, there's no additional complexity for type checkers. This should scale linearly with the number of elements in the ordered intersection.
There's an example with non-disjoint types to consider though
This is an interesting example because there would be no legal way to define a type with the order A then B.
class Good(B, A): # OK
    ...
class Bad(A, B): # Type error: A.foo overrides B.foo in an incompatible way
    ...
I think that any non-disjoint types are either illegal to combine (if the disjoint member is invariant) or demand a specific ordering (if the disjoint member is covariant).
Just brainstorming... Here are some other terms that might connote "intersection-like" without using the term "Intersection": Overlap, Shared, Joint, Conjunction, Common, Meet, Mutual.
I'm pretty bullish on this proposal. It strikes a good balance between functionality and complexity, and it appears to address most (all?) of the major use cases that were identified for intersections.
Thoughts on next steps? Maybe start writing a more complete spec that could serve as a first draft of a PEP?
I think that any non-disjoint types are either illegal to combine (if the disjoint member is invariant) or demand a specific ordering (if the disjoint member is covariant).
Hmm. I'm not sure I want this to be specified as mandatory behavior, as it may increase complexity for type checkers when it would already be detected when you try to create something of that type, but I do think that's a reasonable way type checkers could go above and beyond for users and detect it earlier. Not sure here. If you think this is reasonable to detect at the site of the ordered intersection as well, I can include detection of that in writing up a more formalized draft, but this was something there wasn't good consensus on in the unordered form, and while much of the disagreement should go away with the ordering being part of the construct, I'm unsure if it's necessary.
Thoughts on next steps? Maybe start writing a more complete spec that could serve as a first draft of a PEP?
I should be able to spend time on that soon if there are no other major things to consider here. If we're mostly in agreement on the construct not having major discrepancies across use-cases, then we're down to defining it, specifying it well enough, potentially bikeshedding on a name, and presenting the argument for it to the wider python community as part of the PEP process.
If you think this is reasonable to detect at the site of the ordered intersection as well
No, that's not what I meant. Sorry for any confusion. I definitely don't think that it's reasonable to detect that at the site of the type annotation, and I wouldn't want to see that appear in the spec.
My point is that only some base class orders are legal in a real class definition. Both classes Good and Bad are type compatible with OrderedIntersection[A, B], but Bad is not a legal class because it violates the LSP. Do I have that right?
My point is that only some base class orders are legal in a real class definition. Both classes Good and Bad are type compatible with OrderedIntersection[A, B], but Bad is not a legal class because it violates the LSP. Do I have that right?
Short answer, yes.
Slightly longer: it's still possible to resolve it even with that ordering (one of the cases against intersection detecting this in the unordered form from before - the complexity of determining what the type would minimally result in):
class NoLongerBad(A, B):
    def foo(self, some_arg: bool = True) -> int:
        if some_arg:
            return super().foo()
        else:
            return 1

def foo(x: OrderedIntersection[A, B]):
    # but we still use the order we told static analysis to look for it here,
    # so while x supports an arg, we can't use it here unless we reorder here.
    x.foo(True) # still an error
    x.foo()

foo(NoLongerBad()) # fine
Deferring to user-provided ordering can in some cases result in wider or narrower types. The clear reason why deferring to that ordering isn't just to make type checking simpler shows up with Any.
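Filling in concrete bodies for A and B (assumed here purely for illustration), the super() resolution in that example can be run today:

```python
class A:
    def foo(self) -> int:
        return 0

class B:
    def foo(self, some_arg: bool = True) -> int:
        return 1

class NoLongerBad(A, B):
    def foo(self, some_arg: bool = True) -> int:
        if some_arg:
            return super().foo()  # next in the MRO after NoLongerBad is A
        else:
            return 1

print(NoLongerBad().foo())       # 0 - delegates to A.foo via the MRO
print(NoLongerBad().foo(False))  # 1
```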
I think implementation-wise I'm convinced - I agree with Eric that a separate name would be better than type and object; what that name is I have less strong opinions about.
Re names, I'm more in favour of single-word names for brevity's sake - also I think OrderedIntersection suggests the existence of Intersection (like OrderedDict), which may or may not actually ever exist. That said, maybe avoiding Intersection is a good point.
Overlap, Shared, Joint, Conjunction, Common, Meet, Mutual.
I think Overlap seems decent.
Other thoughts - I know I originally suggested Inherit, which seems unpopular, but I would also suggest other names that follow the idea that these are things that appear in this order in the mro. So you could also have:
Mixin[A,B], Mixed[A,B], Mixing[A,B], MRO[A,B]
Inherit[A,B], Inherits[A,B], Inheritance[A,B], InheritsFrom[A,B]
Other related naming ideas based on the same concept:
Extends[A,B] (this is my current favourite)
Derives[A,B], Derived[A,B], DerivesFrom[A,B]
Disclosure: I haven't read/understood everything in detail yet.
But from a teaching/learning perspective, using type[] and object[] seems to be a lot easier, because it is much more similar to how type[T] or list[str] is used already.
I got a bit lost in why problems with resolving something like def call_target[T](val: object[A, T]) -> T: ... are related to naming this type or object.
Wouldn't the problem also exist for set-based intersections, ala def call_target[T](val: A&T) -> type[T]: ...?
I got a bit lost in why problems with resolving something like def call_target[T](val: object[A, T]) -> T: ... are related to naming this type or object.
@CarliJoy Just to clarify, there are really two distinct options presented in terms of names:
Option 1 - for the instance: Extends[A, B]; for the class: type[Extends[A,B]]. What that name is matters less - hence the various options discussed - as it doesn't affect behaviour in any way, and you can import it as whatever you want. I've used Extends as a placeholder, but many other names are available :)
Option 2 - for the instance: object[A, B]; for the class: type[A, B]. This presents a quite different interface - we've mainly been discussing whether this is a good interface or not.
But from a teaching/learning perspective using type[] and object[] seems to be a lot easier because it is much more similar in how type[T] or list[str] is used already.
The type part of this I completely agree on - it follows naturally from the existing use of type. I would argue, though, that object[A,B] is less natural, as it has no current equivalent in python, which is why I prefer the separate-name option.
Also, we've discussed equivalences that come out of this, and whether these matter, such as:
type[object[A,B]] == type[A,B]
object[A] == A
One thing I like about option 1 is that only one of these exists: Extends[A] == A
@mikeshardmind I've been pondering a little thought experiment following our discussion on object[A] or Extends[A] being equivalent to A.
It'll only look strange in cases where people have the single form, in which we can say that people should prefer not wrapping in object[] when un-needed. I wouldn't want to special case and preclude that though since the equivalence is trivial.
This set me thinking - is there any precedent in python for banning single-item type wrappers when multiple args are allowed? I discovered there is:
from typing import Union

class A:
    pass

Union[A]
The above returns in pyright: Union requires two or more type arguments - even though a single-item union could be trivially resolved to A. So I think saying Extends must take two or more arguments would, if anything, follow more closely with what python already does, and we remove the equivalence Extends[A] == A.
Pyright might not like it, but Union itself allows it, and in fact `Union[A] == A`, e.g. `Union[int] == int`.
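This collapse is observable at runtime; a quick check (standard `typing` behavior, not part of the proposal):

```python
from typing import Union

# Union with a single argument collapses to that argument at runtime.
single = Union[int]
assert single is int

# With two distinct arguments it stays a Union form.
pair = Union[int, str]
assert pair is not int
```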
@tomasr8
Pyright might not like it, but the Union itself allows it and in fact Union[A] == A e.g. Union[int] == int
Oh really? That's interesting - so the type checker is more restrictive than the PEP. If the PEP for `Union` says this is fine, then I think similarly we can leave it up to the type checkers to ban it if they want to.
There are a couple of issues being discussed on Discourse that may affect the specific language required, specifically around Annotated and Literal. I'm advocating what I believe are sensible interpretations of those, but there are other interpretations which could require special behavior for intersections with Annotated or Literal. If those come to pass before this is accepted, I'll need to do some extra work on the specification here.
I'll try and wrap up the draft specifying this by the end of the week for review, since there have been no major detractions from it in a week. I'm a little behind compared to my intent to have the draft ready today; my apologies to the couple of people who asked about my timeline on this out of band.
I'm going to use the name `OrderedIntersection` in the draft. We can bikeshed a better name once it's the last thing left, if people have concerns about the name, but I'd rather focus on getting the behavior specified clearly and making sure there are no issues with that.
As someone who hasn't followed the nuanced / in-the-weeds discussion too well:
Would anyone be willing to summarize what might be wanted from a `typing.Intersection` that is not addressed by this proposal? From my perspective, MRO seems like an essential part of typing in Python, so an Intersection that doesn't take ordering into account seems like it would be incomplete. 🙂 I may just be missing something though.
... requiring an import (and keep in mind that typing is considered expensive to import, and many people avoid doing so) for this when object is right here and a reasonable parallel seems like it makes this worse.
Also, my 2¢ on the import issue: IMO if folks are sensitive to typing import costs, then they should wrap type imports in `if TYPE_CHECKING:` and annotate using strings.
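For readers unfamiliar with it, a minimal sketch of the `if TYPE_CHECKING:` pattern being referred to (the function and names are illustrative):

```python
# The typing-only import below never executes at runtime; TYPE_CHECKING is
# False when the program runs and True only for static type checkers.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from collections.abc import Sequence  # imported only by type checkers

def first_item(items: "Sequence[int]") -> int:  # string annotation, no runtime cost
    return items[0]
```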
Given the relative rarity of expected use cases for this (as evidenced by the fact that they don't exist yet 🙂), I'm 👎 on overloading the term `object` for this. It just feels very counter-intuitive to me when so many other features of the type system so frequently rely on imports.
I vote `typing.Intersection`, perhaps also with the `&` operator.
@vergenzt
Would anyone be willing to summarize what might be wanted from a typing.Intersection that is not addressed by this proposal? From my perspective, MRO seems like an essential part of typing in Python, so an Intersection that doesn't take into account ordering seems like it would be incomplete. 🙂 I may just be missing something though.
I was unable to come up with any cases that did not require more than just Intersection being added. A mathematically pure intersection could find several uses with structural subtyping if paired with type negation, but it's a rather slim case that people have workarounds for involving overloads. That is not to say there won't be uses we can't foresee here with our current perspective, and I've knowingly made this "less powerful" than a true intersection to fit in the boxes of what we can define well and consistently with existing definitions. So using `typing.Intersection` or `&` when we know there is a potential future and a known blind spot based only on current needs seems to invite future problems of the same kind currently going on with `TypeGuard`.
Speaking of which, actually writing a specification for this has been challenging. There are a lot of terms that I'm having to first define while remaining consistent with Python's existing definitions. Some examples of this come up with how much the word "type" is overloaded in the current specification to mean multiple things with differing impact.
There's also the fact that this would cause `X & Y` to be a different type from `Y & X`. While we desire the ability to express these as separate concepts so that users can determine the correct behavior in cases the type checker cannot know by definition (such as with Any as an operand), the ordering here and why it is needed is subtle. Calling attention to it in the name of the type form itself helps prevent this from becoming a gotcha.
There's also the fact that this would cause X & Y to be a different type from Y & X. @mikeshardmind
Yeah I agree this would be strange I think.
Had a slightly radical idea re syntax, what about:
Class: `x: type[A, B]`
Instance: `x: [A, B]`
This has the advantage that it follows the same pattern as everything else, with the `type` keyword.
For a moment, I want to put my own opinions on what the best syntax might be aside and consider this from another angle.
If we consider the problem only from the point of view of "what's most likely to be accepted?", I think the separate word (`typing.Intersection`/`typing.OrderedIntersection` etc.) stands a much better chance than changing `type` and `object` (or indeed the syntax I suggested above). The people who want intersection will be split across the two opinions (as we are finding), but we also have to consider the people that don't care about intersection. Those people will want the least impact on the existing system, and adding another item seems much less controversial.
I also think it's worth noting that, where possible, most features like this use a separate word as well, so that they can be included in typing_extensions and allow for backwards compatibility.
It's a well-trodden path for things to go from being a separate import to having built-in syntax: take `Dict` -> `dict`, `List` -> `list`, `Union`, and most recently `ParamSpec`. I think there's a good reason for this, which is that having a separate keyword is an easier sell. Once the keyword is established, then discussions begin on how to make the built-ins behave the same way. Trying to do both at the same time feels a bit like jumping the gun, and I think it will make it more difficult to get this PEP accepted.
There's also the fact that this would cause `X & Y` to be a different type from `Y & X`. While we desire the ability to express these as separate concepts so that users can determine the correct behavior in case the type checker cannot know by definition (such as with Any as an operand), the ordering here and why it is needed is subtle. Calling attention to it in the name of the type form itself helps prevent this from becoming a gotcha.
This was super helpful for understanding the issue. Thanks. I see now why you don't want to call it an "Intersection": because the word "Intersection" might give a mistaken expectation of commutativity of the arguments when they don't in fact commute.
This contrasting commutativity between Unions vs the proposed spec for Intersections is making me think about a parallel to sets and dicts.
E.g. `{*a, *b, *c}` is commutative (`a`, `b`, and `c` can be in any order and the result is the same), while `dict(**a, **b, **c)` is not commutative if they share any keys (`dict(**a, **b, **c)` may give you a different result than `dict(**c, **b, **a)`).
The similarities may end there (I'm not sure), but seems like interesting food for thought.
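The dict half of the analogy is easy to verify at runtime (illustrative values of my own):

```python
a = {"bar": "str"}
b = {"bar": "float"}

# Merge order decides which value wins for shared keys, so dict merging
# is not commutative - unlike a set union, which is.
assert {**a, **b} == {"bar": "float"}
assert {**b, **a} == {"bar": "str"}
assert {*a, *b} == {*b, *a}  # set union: order never matters
```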
What about `typing.All[*Ts]` as a name for this concept? I.e. indicating a type that must satisfy all of the named types? Seems to me to convey less of an expectation of commutativity than "Intersection" might, given that the boolean equivalent isn't necessarily commutative due to short-circuiting. (Note: I know the result of a boolean `all` call is the same regardless of order, but side-effects could be different since the predicate may be invoked on a different set of instances, so it's not the same.)
dict(**a, **b, **c) is not commutative if they share any keys. (dict(**a, **b, **c) may give you a different result than dict(**c, **b, **a).)
It's an interesting comparison - I think there's a good reason for the second point if we consider the annotations accessible in each:
class A:
foo: int
bar: str
class B:
bar: float
class C(B,A):
pass
class D(A,B):
pass
print(A.__annotations__)
print(B.__annotations__)
print({**A.__annotations__, **B.__annotations__}) # C
print({**B.__annotations__, **A.__annotations__}) # D
Here the results are representative of the attributes available in C and D
What about typing.All[*Ts] as a name for this concept? I.e. indicating a type that must satisfy all of the named types? Seems to me to convey less of an expectation of commutativity than "Intersection" might, given that the boolean equivalent isn't necessarily commutative due to short-circuiting. (Note: I know the result of a boolean all call is the same regardless of order, but side-effects could be different since the predicate may be invoked on a different set of instances, so it's not the same.)
I see where you're going with this idea - I think maybe it's better to avoid overlap with the existing `all`. Personally I think whatever name we go for should probably be clearly distinct from existing language points. That said, I did wonder about extending the use of the existing `Concatenate`, but similarly I think it might make it too confusing.
Also `any` and `all` are kind of related - this would imply that `typing.Any` and `typing.All` are somehow related, which may also be unwise...
`Extends` is still my favourite.
I'm now starting to wonder if I dismissed the `Concatenate` idea too quickly - at the moment it's used to combine ParamSpec variables with arguments - is there any reason why it couldn't be extended to become "Intersection"?
https://docs.python.org/3/library/typing.html#typing.Concatenate
What about typing.All[*Ts] as a name for this concept? I.e. indicating a type that must satisfy all of the named types? Seems to me to convey less of an expectation of commutativity than "Intersection" might, given that the boolean equivalent isn't necessarily commutative due to short-circuiting. (Note: I know the result of a boolean all call is the same regardless of order, but side-effects could be different since the predicate may be invoked on a different set of instances, so it's not the same.)
Ok, another idea bouncing off @vergenzt's thought on `typing.All`. If we're going to conceptually relate `all` with intersection, we could go the whole way and say:
class: `type[all[A, B]]`
instance: `all[A, B]`
This does suffer from the issue I mentioned earlier of being a "harder sell". Also, I just noticed something:
`a: list[A] = list([A()])`
`a: all[A, B] != all([A(), B()])`
I think this probably means we don't want to go down this road - just wanted to follow the thought through to the end.
Had a slightly radical idea re syntax, what about:
Class: x: type[A,B] Instance: x: [A,B]
This has the advantage that it follows the same pattern as everything else, with the type keyword.
I don't like this for stylistic reasons, but even beyond that, it probably wouldn't work, as it would get very muddy with runtime use of the type system. That expression is a list literal with types as its items.
Other quick name notes:
`Extends`: When I think about extends, I am reminded that, unlike Python's MRO, in the languages I know that use it the order isn't allowed to matter, or even to extend from multiple classes at all. For instance, Java's use of extends only allows direct subclassing; Kotlin then has syntactic sugar to allow multiple, but multiple class inheritance is prohibited while multiple interfaces may be implemented by a single class. They don't have the diamond pattern we're avoiding here because they just don't allow that kind of subclassing. I'm not sure using it would be a problem, but I wonder if it is going to lead to initial confusion for people.
`All`: While I see the parallel, I don't think it's a good name for this, and I would prefer not to use it when there could later be other things in the type system that would use this name more effectively. If refinement types are ever added to the type system, `All` and `Any` might be desired to chain together statically analyzable predicates. We can pick a name conceptually closer to what we are doing.
`OrderedIntersection`: It's more verbose (good) but takes up more horizontal space. I think this is "okay" with type alias syntax. I'm not stuck on this as the only option, but it's effective and descriptive of what we want the behavior to be.
I'm now starting to wonder if I dismissed the Concatenate idea too quickly - at the moment it's used to combine ParamSpec variables with Arguments - is there any reason why it couldn't be extended to become "Intersection"?
There's very little friction to adding a new type form, and not enough reason to overload the behavior of Concatenate IMO. This could lead to hard-to-read type signatures if an intersection involves multiple callables that use Concatenate themselves.
Extends: When I think about extends, I actually am reminded that unlike python's MRO, the languages I know that use this, the order isn't allowed to matter or even to extend from multiple. For instance Java's use of extends only allows direct subclassing, Kotlin then has syntax sugar to allow multiple, but multiple class inheritance is prohibited while multiple interfaces may be implemented by a single class. They don't have the diamond pattern we're avoiding here because they just don't allow that kind of subclassing. I'm not sure using it would be a problem, but I wonder if using this is going to lead to initial confusion for people.
I think I got the idea for this from a document on diagram structures, where it was used for multiple inheritance. It's also used by SCSS for multiple inheritance, I believe.
OrderedIntersection: It's more verbose (good) but takes up more horizontal space. I think this is "okay" with type alias syntax. I'm not stuck with this as the only option, but it's effective and descriptive of what we want the behavior to be.
I'm fine with this as a name but I do think it's a bit long - I wondered about `OIntersection`.
I think it'd be worth having a call with anyone who wants in to brainstorm and pick a name, if that's the only outstanding thing - it's not worth discussing for aeons, but I think a quick discussion wouldn't hurt. If `OrderedIntersection` is the most descriptive name we can come up with after that, I think let's just go with it.
Also I just went through the thread and gathered up all the possible names that have been discussed - I'm excluding any that overlap with any existing names here: https://docs.google.com/spreadsheets/d/1rJlt5UfSOkVey2g5p634mKa0NdsCdpuZDwDfs4CnsiY/edit?usp=sharing
Feel free to add more if people have more thoughts
Thank you for the list, Mark. I would propose the following:
We use `OrderedIntersection` in our PEP to get it done finally, and we don't introduce the `&` operator for this, as we might need it for the "proper" Intersection. This way we reduce the bikeshedding.
Once the PEP is handed in, we can create a survey in Typing on discuss.python.org. During the PEP discussion process we can still change the name of the operator. But the title "Ordered Intersection" for the PEP itself seems just to be exact and catchy.
As Mark pointed out, shortcuts like `type[*Ts]` can still be introduced later on. I would suggest not putting energy into that yet.
Please react with 👍🏻 or 👎🏻 if you like/dislike this proposal on the progress.
Please react with 👍🏻 or 👎🏻 if you like/dislike this proposal on the progress.
Yeah, I think it's worth moving ahead with this in the way you describe - couple of points on the name I just wanted to mention, but happy to park the discussion.
As a little thought experiment, I took a look at the existing names in the typing module, to see what the longest ones were. The longest I found were `LiteralString` and `TypeAliasType`, both at 13 characters. `OrderedIntersection` is 19 characters, which is quite a bit longer, but also not crazy long.
I did also think it may be worth noting how likely it is that a regular unordered intersection could ever exist. It could just use `UnorderedIntersection` if it was ever figured out, and then we can use `Intersection` for this one. It's a little bit first-come-first-served in my mind, and a hypothetical where someone magically solves all the problems we've discovered seems extremely unlikely to me.
I also think the majority of people would just start their script with `from typing import OrderedIntersection as Intersection`, which, although not a huge deal, seems a bit of a waste of time.
As I say though, if `OrderedIntersection` is the best we can get for now, I think it's fine as a placeholder.
This approach will not be possible to implement in a consistent manner without larger changes to the type system.
Those same changes would allow just not having an ordered intersection. I'll be following up with both a summary of my findings on this, as well as details and references to other related discussions at a later point in time.
@mikeshardmind Is there no way we can make a specification that allows for the special cases as they are currently? I'm judging from the other discussion that making the proposed changes to the way Any subclassing works won't be possible, but I don't think that necessarily means we have to scrap intersections altogether. So long as there is a defined way that it behaves can't we just make a special case in intersections for now? Perhaps I'm missing something more fundamental, but it seems to me that the above approach works for almost all the cases we've considered which seems like something still worth pursuing.
Another thing I've been considering is that it seems we're trying to resolve a lot of cases, some of which may not come up very much in practice. If we can determine exactly what these cases are, is there a way we can deliberately exclude them from the specification? While it may not be the most satisfying theoretically, I think perhaps an approach where we create an intersection that satisfies 99% of use cases is definitely the most practical. Once we have this "minimum viable intersection", there's no reason why this couldn't be expanded by some future pep to be more complete when other issues in the typing system are resolved.
There is no way to reconcile Any in an intersection with the current behavior of type checkers that is useful to users, and I consider the behavior of having Any as a base class to be fundamentally broken right now. I'm unwilling to put my name behind a proposal that makes this situation worse and harder to untangle in the future by layering on top of something in this state, and I'm unwilling to expend more effort on this proposal until it can be done without these problems.
The "best case" for accepting intersections in the current state - given what we explored, and given that this needs to be done in a way that the underlying problems can be fixed without people saying "but we're relying on the current behavior" (see TypeGuard vs TypeIs for a simpler example of how fixing behavior isn't trivial) - would be to have intersections where `T & Any` is just `T & Any`: unordered, not special-cased, with type checkers knowing that T places a theoretical lower bound on `T & Any`, but for now being free to treat it as if it were Any.
That's fair enough, I respect your position on this. You've certainly put in a lot of effort already into trying to figure out a path forward with it - I'm quite keen for all our efforts not to go to waste, so I might at some point try and continue this forward in some form. Would you be happy if I use your draft for ideas on some parts in drafting a new PEP? I think there's a lot of good ideas here, and I like to think that there might be a way forward with it.
Note: This is an argument for the intentional non-implementation of pure intersections, instead using all of what we worked out to implement an alternative that satisfies all currently supportable use cases while leaving the door open to full intersections. This is not arguing to throw away everything we've worked on, but to use what we worked out to help our collective use cases productively. I am actively inviting any constructive detraction to this; if there is something this cannot do that is a "must have", going this route would require determining whether intersections are must-have to support those uses.
(edited) Note 2: Where you see `type[*Ts]` and `object[*Ts]`, it may make more sense to have `OrderedIntersection`. There are more details at the end on this now.
With thanks to @DiscordLiz, @erictraut, @gvanrossum, @mark-todd, and @NeilGirdhar in alphabetical order (and no other order of particularity) for presenting extremely strong and opinionated arguments, as well as some feedback that led to some self-reflection on why I felt so strongly about particular outcomes of this: I believe there is a strong argument that without other constructs (such as negation, pick, and omit) there is not a good case for including the theoretical intersection, and that even with those, it might only make sense for it to work with structural subtyping.
However, that leaves an awkward question, how can we best handle the use cases that are certainly valid without them such that I can say we don’t need intersections? Have we really just wasted months of work? No.
The pitch: the order matters to us and to the behavior we want, so let's go with an ordered construct that's simpler and still does what we need.
`type[*Ts]` and `object[*Ts]`
First of all, my apologies to @mark-todd as when this was originally floated as an idea in a slightly different form by him in Discord, I stated the non-equivalence to intersections and that that should be separate, and we went back to working on intersections, but there’s a real question here I think we need to answer: Do these not solve every case we care about sufficiently? As originally stated, I would say that they didn't, but I believe I made a mistake in viewing these as something we should consider separately instead of as something which might do enough while being simpler. Exploring the idea further led to a slight tweak in which they appeared to me to solve all presented use cases that intersections could in the context of the current type system.
If it does, we can use the existing explorations to justify this construct instead and leave notes on rationale for anyone who wants to explore the difference between this and “true intersections” at a later date.
Background
The primary difference between all interpretations of Intersection which anyone involved has stated support for (for this I'm mostly looking at 4, 5, and the not fully fleshed out, hypothetical 6 from #31, #37) comes into play in things that are defined on multiple operands. Put another way, the unordered nature of Intersection as a parallel construct to Unions has some very unpleasant consequences. This became especially contentious when looking at what happens in non-disjoint cases and (implicitly non-disjoint) cases involving `Any`.
I iterated on this several times trying to find a solution that satisfied all the complaints from multiple parties, but every time there was still some corner case that had to favor one use over another, and given that we wanted intersections to also take up an operator in the type system (`&`), the consequences of needing two different constructs to express one version or the other weren't sitting right with me. With some time spent reflecting on it, I realized my primary issue with favoring any solution over another was that by promoting one to use `&` and relegating another to requiring an import, we were necessarily putting some needs above others when there was no consensus on which behavior would be more common, or how this would reconcile with `Any`.
Enter, an alternative: `type[*Ts]`
This is ordered. People want option 4 for type[Any & T]? Write type[Any, T]. Want option 5? Write type[T, Any]. This reflects existing behavior, including the ordered nature with subclassing of Any.
Semantics
`type[*Ts]` does not create a new type at runtime. It creates a virtual type that has the bases `*Ts`. The order is not semantically considered for nominal subtyping, but should be considered when handling diamond patterns. As a return type or annotation, the type returned must have the bases `Ts`. The bases do not need to be in the same order, but the types must resolve as if they were. `object[*Ts]` is the equivalent formulation for an instance of something of that type, reflecting the duality between type and object at both runtime and in the type system.
This allows the following class definitions (and many non-trivial others) to each work for `type[A, B]`, or instances of these classes for `object[A, B]`. This also allows all of the current assumptions of nominal subtyping to apply to this construct.
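The class definitions referenced here weren't preserved in this copy of the thread; a minimal sketch (my own, illustrative only) of definitions that would each plausibly satisfy `type[A, B]` under these semantics:

```python
class A: ...
class B: ...

class C(A, B): ...   # direct bases, in the given order
class D(B, A): ...   # reordered bases - still has both A and B as bases
class E(C): ...      # indirect: gains both bases via C

# Each class's MRO contains both A and B, which is the nominal property
# the proposed type[A, B] would check.
for klass in (C, D, E):
    assert A in klass.__mro__ and B in klass.__mro__
```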
In the case of protocols, each must be satisfied.
Because of this, it is enough at assignment to check that each type in `Ts` for `type[*Ts]` is satisfied, as any real object being passed would have been checked for LSP when defined. LSP does not need to be checked at time of assignment. At time of use, lookup should be done in order. By deferring to the fact that LSP violations would be detected elsewhere, the ordered search should be fine.
Examples:
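The examples themselves didn't survive in this copy of the thread; a hedged sketch of the in-order lookup being described, using plain runtime MRO behavior:

```python
# Two unrelated bases that define the same attribute with different types.
class A:
    def describe(self) -> str:
        return "A"

class B:
    def describe(self) -> int:
        return 0

class C(B, A):  # MRO is C, B, A - B.describe wins at runtime
    pass

# Under the proposal, C would satisfy type[B, A]; attribute lookup on an
# object[B, A] value resolves describe() to B's version (returning int),
# matching what the interpreter actually does.
assert C().describe() == 0
```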
Does this really cover all of our use cases?
It covers all of mine, including potential future cases with type negation as a separate construct. I've given it a lot of thought, and it seems to cover every case I've seen floated for intersections, both in this repo and elsewhere when discussing intersections for Python, but the trick with `type[A, B]` allowing the MRO order to be B, A is mandatory for that to be the case.
There are potential cases that I would be blind to here, but we've limited the scope of this to non-disjoint overlap of types with differing types involved in resolution, and this allows handling even that scope to a very large degree.
There are potentially other cases that can come up with higher kinded types. I have not explored in depth what this would mean with the inclusion of higher kinded types in python, because there is no strong prevailing push for these to use as a framework to explore. The temptation to view HKTs as in one of the languages that has them exists, but there’s no guarantee that any of these would reflect what is added to python’s type system. I believe that this composes properly with any reasonable future interpretation of higher kinded types, but will refrain from saying that HKTs could not re-introduce the need for a fully featured intersection.
What about type safety vs pragmatism and warnings?
Tooling could warn when `Any` did not precede non-Any types in the order, or even preclude that altogether in a check, but this would be an optional additional constraint provided by a specific tool, not something that needs to be forbidden by the type system (and forbidding it outright would have negative effects on gradual typing).
What about ergonomics and obviousness of behavior?
In the most commonly anticipated case, of disjoint types, the ordering won't matter. The definition above only cares about the order when MRO would change a resolved type. The complex cases where the ordering will matter should be obvious, as it will be the same as the MRO order required when defining the type.
What about backwards compatibility with `type[T]` (singular type)?
Behavior is unchanged; the definition here is compatible with and does not change the behavior of the current use of `type[T]`.
What about not using type and object here?
It's probably better to use a new `typing.OrderedIntersection` instead of `object` for this, and allow it to compose with `type` with the existing semantics. That would make one of the examples using decorators look like this:
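The decorator example itself wasn't preserved in this copy of the thread; a hypothetical sketch of what such a signature might look like (`OrderedIntersection` is not a real `typing` member today, and `HasName`/`with_name` are names I've invented for illustration):

```python
from typing import Protocol, TypeVar

class HasName(Protocol):
    name: str

T = TypeVar("T")

# Hypothetical spelling under the proposal, shown as a comment since the
# form does not exist yet:
#
#   def with_name(cls: type[T]) -> type[OrderedIntersection[HasName, T]]: ...
#
# i.e. the decorated class is still T but also provides HasName, with
# HasName taking precedence in attribute resolution.
def with_name(cls):
    cls.name = cls.__name__
    return cls

@with_name
class Widget:
    pass
```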
Having our cake and eating it too
It may be less satisfying for some here to have done all this work and walk away not implementing intersections, but if we use this work to implement something that handles as many cases as this would, and which leaves the edge cases in the user's hands to pick the behavior they want, I think we walk away accomplishing everything we set out to do by adding intersections - by not adding intersections. If this works for everyone here, the work wasn't a waste, but informative of what the issues would be with an unordered intersection, and an exploration of the scope of difference between solutions. This remains available as a reference to those revisiting it, and (if we go with this) we should absolutely document for history "why not a true intersection".