CarliJoy / intersection_examples

Python Typing Intersection examples
MIT License
31 stars 2 forks source link

Handling of `Any` within an Intersection #1

Closed CarliJoy closed 9 months ago

CarliJoy commented 1 year ago

There is a great deal of confusion about handling the Any type within an Intersection.

In Python, Any is both a top type (a supertype of all types), and a bottom type (a subtype of all types). Python has a gradual typing system, meaning that it will never be required that everything is typed. Everything that is not typed is considered Any.

We examine five ways to handle intersections with Any:

Remove Any from Intersections

Arguments in favour

Arguments against

An Intersection containing Any becomes Any

Arguments in favour

Arguments against

Disallow Any in Intersection

Arguments in favour

Arguments against

Treat T & Any as irreducible in general

Arguments in favour

Arguments against

Any is only considered in an intersection in deference to non-gradual types.

Arguments for

Arguments against


⚠️ Rules for contribution to this Issue

The general idea is that I will update the description, allowing the discussion to be included in the PEP and prevent a discussion going in circles.

I will react with 🚀 once I included something in the description.

mikeshardmind commented 1 year ago

Arguments in favor of Removing Any from Intersections

Any is essentially the universal set of types. The identity element of intersection on sets is the universal set for domains in which a universal set does exist. The Identity element for unions is the empty set (conceptually, typing.Never) The divergence in behavior is easily explainable as a consequence of logical ordering, and can be taught easily utilizing python builtins any and all and their corresponding behavior, as well as the corresponding set operations.

Edit: Argument retracted, Any's definitional mismatch between ? and Ω does matter here.

mikeshardmind commented 1 year ago

Arguments against An Intersection containing Any becomes Any

This reduces the effectiveness of gradual typing, as it causes a known constraint of the intersection to be thrown away, causing Any existence of Any to become contagion to code using intersections. This could further fragment the ecosystem between fully typed code and not.

mikeshardmind commented 1 year ago

Arguments in favour to Disallow Any in Intersection

It is possible that Any & T is inherently unsafe.

Consider the case Any & AsyncCallback where AsyncCallback is a protocol or abstract base class describing the expected interface a coroutine will be scheduled with. It is possible that a type that satisfies Any conflicts with the required interface of AsyncCallback. (This can also be seen as a reason to remove Any and only treat Any as the universal set, rather than disallow)

antonagestam commented 1 year ago

I think it's worth mentioning that TypeScript considers T & any = any, though I find this surprising. Perhaps there have been previous discussions there that can be learned from?

I think it's also worth mentioning that the description of Any in the original post here is incomplete. It is both a subtype and a supertype of all other types.

NeilGirdhar commented 1 year ago

Dear everyone.

I'd like to propose a fourth option, which is that T&Any collapses to Any in some situations, collapses to T in others, and is irreducible in general. I organized this post in a few sections:

I briefly discussed this with Jelle who suggested I post this to this thread. Please let me know if I've made any errors.

Background: Unions with Any

Before I get into the meat of it, here's an interesting discrepancy between PyRight and MyPy:

from typing import reveal_type, Any, TypeVar
def f(x: bool, t: str, a: Any):
    y = t if x else a
    reveal_type(y)  # Pyright: str | Any; MyPy: Any
    y.f()  # Pyright: Error; MyPy: Okay

As we can see, Pyright is not collapsing str | Any to Any. It treats it as irreducible.

I think MyPy is doing the wrong thing here. (I discussed this with Jelle who said it was due to "mypy's general behavior to use the join operator for ternary expressions. We'd like to fix that at some point but it's difficult for compatibility reasons.")

The reason this is incorrect is for the same reason it's an error to do this:

from typing import reveal_type, Any, TypeVar
def f(x: bool, t: str, a: int):
    y = t if x else a
    reveal_type(y)  # str | int
    y.strip()  # Cannot access member "strip" for type "int".

Thus, we can see that in general T | Any is not Any. Although, there are cases where it can be collapsed to Any.

Background: Definition of Any

It's also worth exploring what Any really means since that seems to be a source of confusion. Any is not:

That's because its supertypes Sup(Any) = Any whereas Sup(Omega) = ∩_i Sup(T_i) = Sup(object) = object. This is why we cannot reason from set theory.

Analysis

For any type T, consider its subtypes Sub(T) and supertypes Sup(T). For an intersection, A & B, we have:

We can now deduce the rules:

For ordinary types A and C<A, then we get:

However, this is not the case when C is Any! Then we have

Thus, in general, A&Any is an irreducible form. But as a parameter of a function (or similar subtype question), we have

class A: pass
class B: pass
def f(x: A&Any): ...
f(A())  # Okay, of course.  A is both a subtype of A and Any.
f(B())  # Error, of course.  

As a parameter to a function (or similar supertype question), we have

class A: pass
class B: pass
def f(b: B): ...
x: A & Any
f(x)  # No problem!  This is because intersection broadens the interface to everything.

Why does intersection broaden the way a variable can be used?

This seems paradoxical, but intersection broadens the way a variable can be used in exactly the same way that its dual, union, narrows the way a variable can be used.

Consider:

class A: ...
class B(A): pass

def g(a: A): pass
def h(b: B): pass

def f(x: bool):
    y = A() if x else B()
    reveal_type(y)  # A | B
    g(y)
    h(y)  # Argument of type A | B cannot be assigned to parameter of type B

Given a union, only those operations that can be done to every element of the union can be done to the union.

This is why any operation that can be done to any element of the intersection can be done to the whole intersection. We can test this in Pyright:

class A:
    def a(self): pass
class B:
    def b(self): pass

def f(x: A):
    assert isinstance(x, B)
    reveal_type(x)  # subclass of A and B; i.e., A & B
    x.a()  # Okay!
    x.b()  # Okay!

Thus, the intersection A&B can do anything that any of its components can do.

Consequences

In a function, the desired behavior of collapsing Any is correct:

def f(x: T & Any) -> None: ...
reveal_type(f)  # (T) -> None

(This is what everyone wants.)

If a variable ends up with type T & Any, for example as the output of a generic function, it should be usable everywhere:

def f[T](x: U) -> T & U: ...
x: Any
y = f(x)
reveal_type(y)  # T & Any

def g(s: str): ....
g(y)  # Always passes!

We need some kind of special case for isinstance(x, T) when x has type Any.

def f(x: T, y: Any):
  assert isinstance(x, U)
  reveal_type(x)  # Ideally, T & U; currently, PyRight can sometimes do this; MyPy gives U.
  assert isinstance(y, T)
  reveal_type(y)  # Must remain T;  Cannot be Any & T or type checking will be severely impacted.

I think the easiest thing to do would be for isinstance to intersect types unless one type is Any, in which case it replaces it. Some thought should go to this to ensure that no weird cases pop up. In the worst case, a new parameter to isinstance may be necessary to select intersection or replacement. This addresses the "contagion" effect mentioned above. It also means that T & Any will be extremely rare, which I think is what everyone wants as well :smile:.

Along the same lines, there are other ways to narrow types, including singledispatch, match statements, and function overloads. I think the logical thing to do is to allow Any & T to leak into them and out of them. If this becomes problematic, a user may be able to use isinstance to eliminate Any.

Danger of unconditionally collapsing T & Any to T

Consider:

class A:
    def f(self):
        pass

class B:
    pass

T = TypeVar('T')
def g(x: T) -> T & B:
    assert isinstance(x, B)
    return x

def j() -> A:  # Case 1: return type annotated A; Case 2: Any
    return A()
def k() -> Any:
    return A()

y = g(j())
reveal_type(y)  # A & B
z = g(k())
reveal_type(z)  # Any & B

y.f()  # Allowed.
z.f()  # Only works if Any & B is not collapsed to B.

If intersection collapses types automatically, then z.f() fails even though all that's changed between y and z is that the argument to g has been changed from a type to Any. It would be extremely weird for broadening a type to Any to create type errors! These errors can be very hard to track down since they can be totally unlocalized (one library can change a type to Any, and the error can pop up miles away in user code that doesn't even use that function directly). Thus, these errors would be extremely pernicious.

Conclusion

T & Any is irreducible in general. Its subtypes are the subtypes of T, and its supertypes are the universe of all types.

mikeshardmind commented 1 year ago

I entirely disagree with the above. Any is the universal set of python objects and object is not. Not only is this the actual definition of Any per pep 484, but object is clearly not the universal set of objects, as the type of types is type While having a problematic defined interface.

The current definition of Any itself creates contradictions in any attempt at a logically consistent type system

A special kind of type is Any. Every type is consistent with Any. It can be considered a type that has all values and all methods. Note that Any and builtin type object are completely different.

The attempt to treat Any as if it is a subtype or supertype is misguided at best and leads to inconsistent results.

While Any is implemented as a type in python, it isn't conceptually a type, but the set of all types when it comes to how it is treated in type checking and it's purpose in the type system.

Additionally, the claim that intersection should work like Union in the below is a complete misunderstanding of logical operators

class A: pass
class B: pass
def f(b: B): ...
x: A & Any
f(x)  # No problem!  This is because intersection broadens the interface to everything.

Intersection is "all" requirements, Union is "Any" requirements. Intersection should not be seen as broadening the allowable types, that is what Union does. Intersection is more restrictive on the types allowed the more types which are added to it.

Conflating the potential interface with the allowable types is problematic. In the Case of A & Any Things which are not A strictly are not part of the intersection.

The other problem comes in from the other part of the definition of Any per pep 484

A special kind of type is Any. Every type is consistent with Any. It can be considered a type that has all values and all methods. Note that Any and builtin type object are completely different.

This speaks to the potential interface of Any, not to the type of Any. Without changing the definition of Any, this part should make Any unsafe as an operand in an intersection, as by having all potential methods, it also has all potential conflicting method definitions with other operands of the intersection.

consider:


class A:

    def do_something(self):
        ...

class B:

    def do_something(self, x: int, /):
        ...

These are incompatible interfaces for an intersection and Any can be either.

NeilGirdhar commented 1 year ago

Any is the universal set of python objects

Where do you see in PEP 484 that Any is the universal set? Perhaps it would be best to clarify using mathematical language exactly what you mean by "universal set". It's not the union of all types Omega for the reason I described above. Do you mean something else?

and object is not.

I didn't say object was the universal set. I said that object is the universal base class. This is a technical term. I'll add a link to the wikipedia article.

Intersection should not be seen as broadening the allowable types,

Intersection broadens the allowable parameter types for which x can be passed for the exact same reason that union narrows it. I will add a section on a comparison to unions then to make all of the rules clearer and more obvious. Since unions are implemented today, you will be able to verify these rules yourself.

mikeshardmind commented 1 year ago

Intersection broadens the allowable parameter types for which x can be passed for the exact same reason that union narrows it. I will add a section on a comparison to unions then to make all of the rules clearer and more obvious. Since unions are implemented today, you will be able to verify these rules yourself.

This is entirely wrong, you are conflating types and their interfaces here, as was already explained to you in the main thread here: https://github.com/python/typing/issues/213#issuecomment-1646681413

I didn't say object was the universal set. I said that object is the universal base class. This is a technical term. I'll add a link to the wikipedia article.

Except that it isn't, you can see this with the type of object being type, and the type of object() being object, I'm quite aware of how the type system works and don't need you to link me a Wikipedia article on the matter.

Where do you see in PEP 484 that Any is the universal set? Perhaps it would be best to clarify using mathematical language exactly what you mean by "universal set". It's not the union of all types Omega for the reason I described above. Do you mean something else?

I've edited in more detail. The term universal set of objects is not used, but definitions consistent with it are. However there's another part of the definition which is problematic by both saying Any is consistent with all objects, and also saying Any has all interfaces. This is quite literally self-contradictory the moment we consider that interfaces can have multiple conflicting definitions.

NeilGirdhar commented 1 year ago

This is entirely wrong, you are conflating types and their interfaces here,

I think the fundamental disagreement here is that you are identifying a type by the set of subtypes only Sub(X)? The inferface (Sup(X)) is also important, and it's affected by union and intersection. This is why X | Any is irreducible in Pyright today.

Except that it isn't, you can see this with the type of object being type, and the type of object() being object,

I think it's fairly well accepted that object is the universal base class since isinstance(x, object) is true for all x, which is the definition.

The term universal set of objects is not used, but definitions consistent with it are.

Normally, in mathematics, the universal set is the union of all elements. Any is not union of all types Omega, so I assume you mean something different.

there's another part of the definition which is problematic by both saying Any is consistent with all objects, and also saying Any has all interfaces.

I don't think this is problematic. This is why I prefer to consider to type expressions X and Y to be equivalent only when Sup(X) == Sup(Y) and Sub(X) == Sub(Y).

I think at heart, the main point of contention is that are thinking about intersection only in parameter lists or similar subtype questions. In those situations, you are right that you can reduce T & Any to T. However, those are not the only situations in which intersections pop up. For example, see the consequences example I gave. Similarly, unions T | Any do reduce to Any in parameter lists and other subtyping questions, and similarly, they are irreducible in other situations (see the example in the background section).

So, I'm not contradicting you in those parameter list situations. However, there are other situations to consider, and it is possible for a revealed to type to meaningfully have type T & Any, which would be irreducible.

mikeshardmind commented 1 year ago

This is entirely wrong, you are conflating types and their interfaces here,

I added a section with an example so that you can see for yourself that intersection does indeed broaden what you can do with an object. Is it possible you misunderstood what I'm saying?

your edit adds "Why does intersection broaden the way a variable can be used?" this is the definition of an interface, not a type. You are still conflating the two, and the difference between types and their interfaces is actually why this is such a problem with the current definition of Any with intersections. Please review what other people have told you about this.

Unions are broadening of type, but only the shared interfaces may be safely used because we only know that it has at least one of these interfaces.

Intersections are narrowing of type, but because of the requirement of compatibility with all of an intersection's types, allow using the combined interface of the types.

In any interpretation of Any based on its current definition in accepted peps, Any presents a potential problem here.

Any having all potential interfaces makes Any means that all of the potential interfaces of Any are in conflict with the other operands as they can be defined as incompatible, it has all interfaces, compatible versions of them and not. For this reason, either the definition of Any needs changing or Any is unsafe as an intersection operand.

But even with this, the other type requirement of the intersection does not go away in the process.

The definition of Any is logically inconsistent and intersection and its logical consequences just shine a light on this pre-existing problem.

For what it is worth, mypy provides configuration options forbidding the use of Any specifically because of the problems of Any, and Ruff provides lints disallowing Any for the same reasons.

If it comes between implementing a typing feature in a way that is logically broken, and not allowing some code that already isn't typed to make use of a typing feature, the latter is preferable.

NeilGirdhar commented 1 year ago

this is the definition of an interface, not a type. Y

When we talk about types, they can occur in a variety of places. One of those places is that they can be associated with a variable as a "static type". These static types must encode all of the information that we have about the static type, including the interface (the way in which the variable can be used).

This discussion is about how type checkers should manipulate intersections with static types. Because the interface matters, for the static type of variables, you cannot collapse T&Any to T.

Ultimately, type checkers have to report the static types with reveal_type. And so the types returned by reveal-type will sometimes report T&Any, which is distinct from T.

Maybe for you a type checker's "static type" is not a "type". But this is one sense in which I am discussing types.

Another place types come up is in parameter lists. In that location, T&Any does collapse to T. But type checkers deal with types in a variety of places—not just parameter lists.

in conflict with the other operands as they can be defined as incompatible, and it has all interfaces, compatible versions of them and not.

That's not a conflict for the same reason that T | Any is not a "conflict". You can pass whatever you want to a function that accepts this type.

If it comes between implementing a typing feature in a way that is logically broken, and not allowing some code that already isn't typed to make use of a typing feature, the latter is preferable.

To be honest, I don't think anything is logically broken.

mikeshardmind commented 1 year ago

in conflict with the other operands as they can be defined as incompatible, and it has all interfaces, compatible versions of them and not.

That's not a conflict for the same reason that T | Any is not a "conflict". You can simply do whatever you want with such an object of this type.

It is a conflict in the intersection case though. T & Any requires the interfaces of both T and Any. Any has all interfaces, those compatible with T and not.

You can't do whatever you want with T | Any

from typing import Any
from datetime import datetime

def function(x: datetime | Any) -> None:
    x.f()
main.py:6: error: Item "datetime" of "datetime | Any" has no attribute "f"  [union-attr]
Found 1 error in 1 file (checked 1 source file)

This also isn't unique to parameters, as the type also matters when it comes to potential reassignment, it happens even with a direct typing of a variable:

from typing import Any
from datetime import datetime

def function(x: datetime | Any) -> None:
    x.f()

def other_function() -> None:
    x: datetime | Any = datetime.utcnow()
    reveal_type(x)
    x.f()
main.py:6: error: Item "datetime" of "datetime | Any" has no attribute "f"  [union-attr]
main.py:11: note: Revealed type is "Union[datetime.datetime, Any]"
main.py:12: error: Item "datetime" of "datetime | Any" has no attribute "f"  [union-attr]
Found 2 errors in 1 file (checked 1 source file)

You can only use the interface provided by T safely, as you could receive T here, rather than Any. While Any has a compatible interface, you still need to care about the interface of T.

The difference between types and their interfaces matters. you cannot handwave them away with less rigorous definitions, and this is highly important here because the behavior of Union and Intersection are diametrically opposed in their effects of broadening vs narrowing of both interfaces and types

To be honest, I don't think anything is logically broken.

The logical contradiction has already been presented, and you have not given an explanation which satisfactorily shows it isn't in contradiction.

NeilGirdhar commented 1 year ago

You can't do whatever you want with T | Any

Sorry, I meant the dual of "pass whatever you want". You can do whatever you want with T & Any the same way that you can pass whatever you want to a function accepting T | Any.

The difference between types and their interfaces matters.

I think we're going in circles again. The static type includes all of the information known about the type. You seem to be dividing this information into something you're calling the "type" and the "interface". I think your definitions roughly correspond to Sub(X) and Sup(X), but it's hard to say. Nevertheless, when I use the word "type", I mean the static type (as returned by reveal type). The static type must encode what you call the "interface"—it is not distinct.

The logical contradiction has already been presented, and you have not given an explanation which satisfactorily shows it isn't in contradiction.

Just what I said above: "You can do whatever you want with T & Any the same way that you can pass whatever you want for T | Any." There's no contradiction. Why don't you show some code that is ambiguous?

DiscordLiz commented 1 year ago

There's no contradiction. Why don't you show some code that is ambiguous?

I don't know why you are still arguing this @NeilGirdhar but type theory is pretty clear here and you keep asking for things you've already been provided by people being far too patient with you.

NeilGirdhar commented 1 year ago

I don't know why you are still arguing this @NeilGirdhar but type theory is pretty clear here and you keep asking for things you've already been provided by people being far too patient with you.

I only posted because Jelle looked my post over and said it looked right. Let's try to keep the discussion civil.

mikeshardmind commented 1 year ago

I don't know why you are still arguing this @NeilGirdhar but type theory is pretty clear here and you keep asking for things you've already been provided by people being far too patient with you.

My patience on this is my own, please don't presume that I should be more or less patient with people, thanks.

Why don't you show some code that is ambiguous?

Already have presented it, but I'm fine to explain one more time.

Any is defined as having all interfaces. Some interfaces are incompatible with others, as shown above, but here it is again.

class A:

    def do_something(self):
        ...

class B:

    def do_something(self, x: int, /):
        ...

A & B  # unsatisfyable condition

class AP(Protocol):

    def do_something(self):
        ...

class BP(Protocol):

    def do_something(self, x: int, /):
        ...

AP & BP  # unsatisfyable condition

An intersection must satisfy all of its requirements, conflicting interfaces create a situation where there are no types that could possibly satisfy the requested interface. Any is defined as having all interfaces. So while it may be safe to interact as if any has any version of that interface, it would be impossible to correctly say that Any has a non-conflicting interface with non-infinitely defined interfaces.

mikeshardmind commented 1 year ago

the "type" and the "interface". I think your definitions roughly correspond to Sub(X) and Sup(X), but it's hard to say.

This would be a far easier discussion to have from a shared mathematical understanding of type or category theory, rather than the non-rigorous definitions that python currently has adopted, but if python had a mathematically rigorous type system, we wouldn't need to have this discussion currently. The moment you attempt to formalize all of the existing rules into concrete logic, there are some types which can be created at runtime, but are inexpressible by the typing, as well as logical contradictions that go away if you ignore the existence of Any

Sub(X) and Sup(X)

These don't have definitions agreed upon anywhere outside of type theory, and your use of them here does not match the use in type theory. I've seen the doc floating around that proposes using this, but the use isn't consistent with existing type theory, so I've avoided potentially muddying this conversation with the use of one definition where the other might be assumed.

Interface means the obvious thing here, namely the known methods, attributes, etc of an instance of a type, and their signatures and/or types.

Type also means the obvious thing here. (The type which an instance of an object is)

Interfaces and types are largely identical outside of LSP violations when talking about only one type. When discussing unions and intersections of types, this changes.

Treating Any as a singular type with an infinite interface is the mathematical problem here. Python does not have the means to have multiple methods of the same identifier with differing arity/signature like some other languages, so the infinite interface contradicts other interfaces in the case of an intersection where any shared interfaces must be compatible

NeilGirdhar commented 1 year ago

An intersection must satisfy all of its requirements, conflicting interfaces create a situation where there are no types that could possibly satisfy the requested interface. Any is defined as having all interfaces. So while it may be safe to interact as if any has any version of that interface, it would be impossible to correctly say that Any has a non-conflicting interface with non-infinitely defined interfaces.

Oh I see. That's a great point.

Let's see what PyRight does for this:

class A:
    def f(self, x: int): pass
class B:
    def f(self, x: str): pass

def f(x: A):
    assert isinstance(x, B)
    reveal_type(x)  # subclass of A and B
    x.f(2)  # Okay!
    x.f('s')  # Fails; expects int.

So it seems that PyRight is not really intersecting, but rather creating a virtual subclass:

class X(A, B): ...

What are the possible solutions?

My preference is probably for case 3 as the most permissive. It would allow you to do something like this:

class A:
  def f(self): ...
  def g(self) -> str: pass

class B:
  def f(self): ...
  def g(self) -> int: pass  # g is different

def g(x):
  assert isinstance(x, A) and isinstance(x, B)  # No subclass like this exists yet, but we may yet create one.
  x.f()  # No problem
  x.g()  # Error!

class C(A, B): ...  # C.g is A.g
class D(B, A): ...  # D.g is B.g
g(C())
g(D())

Type also means the obvious thing here.

The way you've been using type doesn't match what is returned by reveal type in Python, which I think is the source of the confusion. I imagine, you would say that T | Any is not a type either? And yet it's returned by reveal type. So this is the way I've been using it.

Treating Any as a singular type with an infinite interface is the mathematical problem here. Python does not have the means to have multiple methods with differing arity like some other languages, so the infinite interface contradicts other interfaces in the case of an intersection where any shared interfaces must be compatible

Right, that's a fair point.

In cases 1 and 2 above, Any should be forbidden from intersection. In case 3, Sup(T & Any) = object. (This resolves your conflict by basically saying that you can't use the variable except as an object.)

My preference with Any though is to let it leak through so that the interface fo T & Any is Any for the reason described in my long post at the top under "Danger".

mikeshardmind commented 1 year ago

The way you've been using type doesn't match what is returned by reveal type in Python, which I think is the source of the confusion. I imagine, you would say that T | Any is not a type either? And yet it's returned by reveal type. So this is the way I've been using it.

Correct, I consider T | Any a set of types defined by the union of types which satisfy either T or Any, having the knowable interface shared between them.

I think the fact that python's terminology is so loose compared to more well-defined type theory creates more situations like this than I'd like.

Allow intersecting types, but only allow the intersection to have an interface of compatible methods.

I believe this should already be considered a requirement, but I disagree about it solving Any with Any's current definition.

I think it's possible to redefine Any and Never in terms that resolve this without needing to remove the usefulness of Any and without disrupting any current uses (because the problem only arises currently with the not yet implemented intersection, and other things that have not been accepted yet, such as Not[T])

A definition for Any which would work:

The set of all possible types. Type checkers should allow any operation on things which are only known to be a member Any, as this is the default state of objects. Type checkers may restrict the allowed interface or the known type as more information becomes known about the object through narrowing via instance checks, type checks, or intersections with other types.

And a corresponding definition of Never:

The empty set of types. Functions which always raise never return anything. Objects which have no valid types should be marked with this, and code which is statically typed and attempts to use objects with this type should be considered as an error as this code is either unreachable or there has been a logical error in the types given to the analysis tool

NeilGirdhar commented 1 year ago

I think the fact that python's terminology is so loose compared to more well-defined type theory creates more situations like this than I'd like.

Right, now that I understand how you're using terms, I can just use your vocabulary.

So my preference is for T & Any to have the interface of Any, and what you're calling the "type" of T. It should be reported as a static type as irreducible T & Any since we need to keep track of all of information.

A definition for Any which would work:

What are the consequences of this defintion for the examples above?

mikeshardmind commented 1 year ago

What are the consequences of this defintion for the examples above?

For the case of Any, I believe it leaves all existing code exactly as is due to the restriction on type-checker behavior included. Type checkers already coerce Any to specific types via isinstance, so saying that this is desirable behavior doesn't change anything either. Leaving it as may leaves room for type checkers to not be considered in error if they have an opinion about not narrowing something for either performance, edge case, or any other reasons not considered at this time. This language could be restricted more in the future if python's type system is ever considered to be near-complete, but currently I think may leaves a good amount of freedom, while still having a strong definition of what behavior is allowed here.

The behavior on intersection would result in the type Any & T being reducible to T as rather than being all compatible types, it's merely the set of all types. I don't have a good answer for the impact on the edge case you presented near the start with Any, but I believe that this case is likely to be rare, and only happen in cases where the required interface of T is what is cared about, as users of Any have no benefit for using intersection to say it conforms with other interfaces as Any is already treated that way.

It may be something that type checkers should warn is happening by default or include a configuration setting to inform users is happening

The behavior of Never with these definitions would be entirely as it currently is. The current use of Never is already consistent with the set-based definition.

NeilGirdhar commented 1 year ago

Incidentally, we should at least then change the language in the top post of this thread to concord with the language you've been using especially because these differ from the existing PEPs. Specifically:

mikeshardmind commented 1 year ago

Agreed. I didn't realize how far apart we were on definitions initially, despite that I was consciously avoiding certain terms I felt had some ambiguity, there were others which also had mixed meanings in play.

NeilGirdhar commented 1 year ago

Okay, productive and interesting discussion, I'm going to take a break, but looking forward to seeing the next draft of this proposal. Intersections solve so many type errors; it's a feature I'm really looking foward to.

mikeshardmind commented 1 year ago

most importantly, instead of asking "what type is produced by T & Any", we should ask what "type" and "interface" is produced by T & Any where T is a set of types and an interface, and Any has a set of types and an interface.

It may also help to provide some examples, so starting that.

Intersection A & B

Type: Intersection of A and B. If A and B are both concrete types and not protocols, this must be a subclass of both. If either is a protocol, the protocol must be satisfied. If A or B is already an intersection, the intersections are combined. Interface: The interface of A and B are both available and must not conflict.

(A & B) & C is equivalent to A & B & C

Union A | B

Type: Union of A and B. If A or B are already Unions, The unions are combined

Interface: The attributes, methods, and properties shared between A and B which have compatible types. If A and B use an identifier in conflicting ways, the accessor of the object cannot know which way it is used without checking.

(A | B) | C is equivalent to A | B C

(The below may seem contrived, but I think including it can assure the definitions we end up on remain consistent)

The order of intersections and unions matters

Unions of Intersections (A & B) | (B & C)

Type: a union of the intersection A & B and the intersection B & C Interface: The shared compatible interface of 1) the combined interfaces of A and of B which must not conflict with each other and 2) the combined interfaces of B and of C which must not conflict with each other

(A & B) | (B & C) is equivalent to (A | C) & B

Intersection of Unions (A | B) & (B | C)

Type: Intersection of the unions A | B and B | C Interface: The interface of both the unions A | B and B | C, where each of those unions has the shared minimum nonconflicting interface, and no permutation of union elements results in a conflicting interface for the intersection*

(A | B) & (B | C) is not equivalent to B & (A | C), but to (A & C) | (B & (A | C))

* A contrived case, to be sure, but...

class A(Protocol):
    def this(self: Self) -> Self:
        ...

class B(Protocol):
    def that(self: Self, other: T) -> T:
       ...

class C(Protocol):
    def this(self: Self) -> None:
        ...

Problematic = (A | B) & (B | C)

There are two possibilities here

  1. This reduces to: (A & B) | B | (B & C) due to the incompatibility between A & C
  2. This is considered unsatisfiable as a set of types

I believe 2 to be the correct interpretation. 1 results in a different interface than may be expected.

Why this matters when considering Any

This is a set of ordering and satisfiability constraints that works with the proposed update to the definitions of Any and Never. Should these orderings and constraints be considered undesirable, we may be back to the drawing board on handling Any

tomasr8 commented 1 year ago

Thanks for the writeup @mikeshardmind! FWIW, I'm also working on trying to figure out how to combine intersections with unions and I think we mostly agree except for the last case - on the surface it seems like having the usual laws like distributivity satisfied is desirable, but let's maybe discuss it over at discord, so we don't pollute this issue :D

NeilGirdhar commented 1 year ago

I was thinking about chaing the language of "type" and "interface". Let's call this the "literature type".

At the end of the day though, you're going to have to translate the "literature type/interface" into things like:

These three types need to have a canonical representation that represents the literature type and interface—we can call this representation the canonical type. I'm not sure whether it will ultimately be easier to stick with the Python language for this reason.

Either way, there will need to be a canonical way to specify T & Any. If T & Any has a different interface than whatever "literature type" is chosen for T, then it will need a different canonical type than the canonical type of T. I guess we'll have to see what meaning T & Any takes first, but we may end up the irreducible T & Any that I suggested as the canonical type.

mikeshardmind commented 1 year ago

I was thinking about chaing the language of "type" and "interface". Let's call this the "literature type".

I'll give some examples of why I don't know that more definitions are strictly helpful and that it's more about understanding that there are some things we should not consider a type, but a specification of what we know about possible/allowed types.

The moment many of these are thought of this way, the distinction between a concrete runtime type and the interface we can semantically and statically show to exist becomes automatic.

In many cases, we don't have a concrete exact type. This is true even without considering Unions or Intersections when it comes to protocols. An annotation of a protocol is not a type, it's an expectation of a required interface and is sometimes referred to as structural typing (which is distinct from typing based on composition or inheritance)

In many places, we are not specifying a type, but what we know about the possible types or interfaces an object might have.

A protocol as an annotation is a specification about a known interface but says nothing about the concrete type other than that instances of it which satisfy it must implement that interface.

This is a protocol taken from real library code. The intent is that a user could provide most choices for a numeric container, including those that I might not have knowledge of as the library author, such as a custom vector class. The library does abstract math and simplification of terms on objects for which these are well-defined operations. This includes symbolic simplifications and in the future higher-order operations on functions of functions.

class SupportsBasicArithmetic(Protocol):

    def __add__(self: Self, other: Self) -> Self:
        ...

    def __sub__(self: Self, other: Self) -> Self:
        ...

    def __mul__(self: Self, other: Self) -> Self:
        ...

    def __truediv__(self: Self, other: Self) -> Self:
        ...

    def __eq__(self: Self, other: Self) -> bool:
        ...

Code that accepts parameters annotated with this, has no knowledge of the type of the object they are accepting, only of a required portion of the interface.

Additionally, writing

x: int | Fraction = 1

The type of x at that point in time is int (or Literal[1], but the annotation states that this isn't the intent), but it is annotated to specify that it could also be assigned as a Fraction.

Saying that x has a type of int | Fraction there feels wrong to me in terms of terminology, that's only what types (plural) x is allowed to have.

As to specific things you wanted defined:

a type returned by reveal_type,

With this model of thinking about it, reveal_type is a request to the type checker for what it knows about the potential types of an object (and type(thing) is still the canonical way to get the actual type at runtime)

the type of a parameter annotation, and the type of a variable annotation.

The annotation is a specification of what must be true about the type of the object for code relying on this annotation to be sound from the perspective of static analysis.... a set of specifications about the allowed type(s) and interface(s) required.


Edit: Actually, we probably still need more definitions accepted in some regard, and many definitions could stand to be made more clear, more rigorous, and/or more consistent, I just don't think we need multiple definitions for type. We just need to consider where we don't have a type, but a description of potential types.

We should still have interface as a defined term. I believe it was defined well in the discussions about protocol and some of the discussions about variance, but I don't think there's a good canonically accepted definition of interface for python currently

NeilGirdhar commented 1 year ago

I just don't think we need multiple definitions for type. We just need to consider where we don't have a type, but a description of potential types.

I understand, but the problem is that I'm not the person that needs to be swayed by the PEP. You're going to communicate with all of the Python people who are used to one set of definition.

For these people, Any is the "canonical type"; so, above, when you claimed that Any was the "universal set", this is incorrect since the universal set for them is Omega.

Now, I've figured what you mean by type ("literature type"); so, I know that for you the literature type LiteratureAny is the universal set. We can speak the same langauge.

But ultimately, you'll need to write a PEP and that PEP is going to have to relate to the other typing PEPs. When it uses symbols to represent types, those symbols have to make sense in the world of symbols that we already have. They have to be intelligible to people who are using those symbols.

If, as I think you're suggesting, the interface to X & Any is going to be different than the interface to X, the irreducible proposal that I had is probably the best way to communicate that—even though it may not be the type theory that you're used to. After all, this is the same choice that Pyright made with irreducible X | Any

Edit: Actually, we probably still need more definitions accepted in some regard, and many definitions could stand to be made more clear, more rigorous, and/or more consistent, I just don't think we need multiple definitions for type. We just need to consider where we don't have a type, but a description of potential types.

Yes, that's fair. Nevertheless, you're going to have to turn that "description" into some canonical representation for display and annotation.

JelleZijlstra commented 1 year ago

Going back to the original post, my thinking is that T & Any should not be reduced to either T or Any, but is meaningful and should be supported without a need for a special case. This should be a fourth option in addition to the three already listed.

Consider this class:

class X:
    attr: Any & int

When setting the attribute, you need an object that satisfies both Any and int, so in this case it is equivalent to int. x.attr = 1 is valid, x.attr = "x" is not. But when getting the attribute, you get an object that satisfies both Any and int, which means you can do anything with it (e.g., len(x.attr) is valid).

This isn't by itself a strong use case, but as far as I can see it flows directly out of general rules about how Any and Intersection work, without a need for special cases. Note that instead I had written attrs: Any | int, I could have written a very similar paragraph above, just with the getter and setter reversed.

mikeshardmind commented 1 year ago

When setting the attribute, you need an object that satisfies both Any and int

I believe this is already precluded by the current definition of Any, as Any is defined to have all possible interfaces, including those incompatible with int. It would be legal to do x.bit_length("hex") (legal, not necessarily sensical) on something typed as Any, which is clearly incompatible with the interface of int

I think the strongest options here remain to modify the existing definitions to be more rigorous and compatible with both current and future goals (not create new terminology that exists parallel to it, and therefore doesn't fix existing things like Any) or exclude Any entirely for that contradiction.

I also think this is an incredibly vacuous use of an intersection, if you want len on something that's an int, you can require that via an intersection of int, and a protocol requiring __len__. (or whatever else you actually need) It's easy to write off Any here as "YAGNI" because people using Any already aren't having interfaces checked, so what would motivate this use? Why would someone write Any & int ? What situations in real code would cause this to happen that would not be accurately described by either Any, int, or int & SomeProtocol

The lack of an easy parallel here with Any compared to Union is the definition of Any including all possible interfaces. Union expands the potential types and only cares about the overlap in interfaces, whereas Intersection narrows what types are valid and imposes that any shared interfaces must be compatible.

Fixing that definition allows there to not need to be special cases.

DiscordLiz commented 1 year ago

First of all, I'd like to apologize for earlier. I interjected with frustration at the arguments that seemed to not even know basic theory which I would consider fundamental to even work on this sort of problem, when the reality appears to be a mixing of formal terms and commonly accepted terms which do not have formal definitions.

The problem here is that Any exists at all, but that won't go away due to gradual typing. I agree with @mikeshardmind that you could modify the definition could be changed to remove any logical contradiction, though I disagree that doing so would be easy in any way. His proposed redefinition to actually make it the universal set, which many people think of it as currently skips a small detail that needs to be included, and introduces a new ambiguity.

It would need to be defined as "A construct indicating the potential for any possible type, yielding to other more specific types whenever necessary" rather than "The set of all possible types", as otherwise, you still have Any as incompatible with concrete types in an intersection.

This definition leaves a lot as ambiguous, in particular "whenever necessary" is not definable without first knowing all current and future type constructs, but it is correct and allows eliminating it from places where it would be redundant or problematic.

You can't have a universal set that is actually a universal set that fully satisfies every requirement of python's type system and add an intersection type (or any other operation on sets of types which is composed using negative logical operations, there are positive set theories for which a universal set holds)

His new definition for Never works flawlessly from a formal standpoint, but I don't know enough about how Never is used in real python code to speak to accuracy of it continuing to work as-is for existing code.

The best case scenario here in terms of correctness would appear to be to error or warn on an intersection containing Any, but to simultaneously to this make a best effort attempt to determine possible interfaces and continue to type check uses of the types relying on the intersection using a charitably modified definition of Any.

It isn't a clean solution, but it is one that works well with the idea of gradual typing, which is the only reason Any should exist at all. Fully typed code bases accepting any object should use object | type[object] and become more specific as they can.

NeilGirdhar commented 1 year ago

whereas Intersection narrows what types are valid and imposes that any shared interfaces must be compatible.

This is an opinion, yes? Another option would be to allow the intersection, but define it to expose the total compatible interface. You have to consider that people have slightly different definitions of common methods. Someone may for example have a method like

    def __exit__(self ,type, value, traceback):

and someone else may have

    def __exit__(self ,type, value, traceback, /):

and then you're going to block intersection because they're not compatible?

mikeshardmind commented 1 year ago

This is an opinion, yes? Another option would be to allow the intersection, but define it to expose the shared compatible interface.

Even if we went that route, Any is incompatible with every portion of every interface in an intersection under the current definition of Any being compatible with all uses and having all interfaces. The nature of it being attempted to be defined in an infinitely compatible way is paradoxically making it infinitely incompatible in this case. It is defined in such a way that it has an interface for any possible identifier and multiple definitions of that interface that are valid.

It presents a logical contradiction here with Intersection (and not prior) because Intersection is the first typing feature operating on groups of types to be composed using a logical negation. (This actually isn't strictly accurate, but the other existing cases were already disallowed 1. directly subclassing Any, 2. subclassing incompatibly with the parent class)

All that said, I don't think we need to disallow Any (even if this would be my personal preference), but I think to not disallow Any, we should include updating the definition of Any to one which remains compatible with current use and which fixes this contradiction. Definitions being logically consistent can reduce confusion, and make sure people are all on the same page as more advanced features are added to the type system.

It would need to be defined as "A construct indicating the potential for any possible type, yielding to other more specific types whenever necessary" rather than "The set of all possible types", as otherwise, you still have Any as incompatible with concrete types in an intersection.

This is correct from a mathematically pure perspective, but python has natural limitations on what can compose a type, so we could have a universal set for the domain of possible python types.... and I already prefer your modified definition compared to attempting to prove that universal on that domain is logically consistent, and not even being sure that it would be a successful attempt.

NeilGirdhar commented 1 year ago

Any is incompatible with every portion of every interface in an intersection under the current definition of Any being compatible with all uses and having all interfaces.

If you go your route of balking at the logical contradiction then Any & Any is not even going to be Any. You don't think that's even more paradoxical?

And by your logic that a type (in the Python/canonical sense) with multiple compatible interfaces should not have any of those interfaces be usable. And yet that's exactly what Any does. So, if we accept your logic, then there's no Any in the first place.

Any exists because it is impractical for everything to be typed for a variety of reasons including limitations of the type system, developer time, use as generic parameters.

I think the best thing to do is to come up with examples that produce intersections with Any and to come up with practical solutions for those cases. If you look at the example in my big post, it does not make sense to me that converting a type to Any would create type errors, but that's exactly what it will do.

I think a big part of the problem is that you're identifying Python's Any with the universal set of types. Python's Any isn't the universal set. The Any that you want to define is. Python's Any has a definition in typing PEPs and is extremely commonly used. I'm not sure I totally understand the consequences (to actual code) of the change of definition that you're proposing. Do you want to elaborate? When you say "yielding to more specific types" as in types that have already been defined?

My feeling is that instead we should consider consistency with Any as a mechanism to selectively disable type checking where type checking is impractical.

mikeshardmind commented 1 year ago

(answering parts of this out of order)

I think we should consider consistency with Any as a mechanism to selectively disable type checking where type checking is impractical.

https://github.com/python/typing/issues/213#issuecomment-1646928419

This is more likely to break people through Any being contagion than intentional use. If you want to disable type checking, you can write Any instead of Any & T, but if instead Any always marked for disabling type checking, other code which interacts with types you export would find Any leaking out in this manner as destructive on their types.

Python's Any has a definition in typing PEPs and is extremely commonly used. I'm not sure I totally understand the consequences (to actual code) of the change of definition that you're proposing. Do you want to elaborate?

If (and I understand that it's a big if to change this definition given how established it is) we were to change the definition, I would only be willing to argue it if the definition change is done in such a way that the actual impact is only on logical consistency and done in a way to not change any existing established behavior of Any.

If you go your route of balking at the logical contradiction then Any & Any is not even going to be Any. You don't think that's even more paradoxical?

I think that's the correct logical conclusion currently and that that is paradoxical. This is why I'm in favor of updating the definition.

Any's current definition leaves a lot to be desired, and I want to see python able to benefit from established theory. Any's current definition makes that challenging without special casing it.

Any exists because it is impractical for everything to be typed for a variety of reasons including limitations of the type system, developer time, use as generic parameters.

If I could go back in time and argue only one change to the original specification, it would be not to have Any and to instead have a # type: untyped directive, to just allow things that are not well-typed to be marked as not for inference, and to not interact with typing directly. With that said, I understand the need for Any, and am not going to attempt to argue for it's removal, that's a non-goal of mine.

NeilGirdhar commented 1 year ago

This is more likely to break people through Any being contagion than intentional use.

Right, we need to be practical. That's why it's important to look at examples, I think. It's hard enough to create an intersection with Any, so I wonder how often this will happen in the first place. But when it does happen, it can create weird type checking errors that appear when you introduce Any. So there's a balance.

it would be not to have Any and to instead have a # type: untyped directive

The problem with that i that there are generic parameters to functions or classes, for example, that can be Any or a concrete type, so the typed directive would essentially be conditional. And you can in general have a stretch of code that is written agnostically to surrounding code that becomes typed or untyped based on the use of Any in the surrounding code.

DiscordLiz commented 1 year ago

When you say "yielding to more specific types" as in types that have already been defined?

When I included this in the updated definition I provided, my intent there is if Any is in conflict with a specific type, the specific type wins.

This is in line with how isinstance already works

foo: Any
if isinstance(foo, T):
    reveal_type(foo)  # T

It would work okay with Any | T because there is no conflict here. It would make Any & T become T because Any provides no specific information about a type, and the intersection is imposing compatibility between types. I don't think there's any other current case of conflict.

NeilGirdhar commented 1 year ago

@CarliJoy Can you add the proposal that Jelle made to treat T & Any as irreducible in general? For arguments in favor:

CarliJoy commented 1 year ago

Did this but as written there already - have no idea how that should work in practice. Could you paste some examples or even better an algorithm how a type checker should handle intersection with Any?


a: int & Any
a = 1   # valid
a = "Help" # invalid
a.foo() # valid

def foo() -> Any&int:
    return "a"  # invalid

In short: So for assignments the Any is ignored but for subtypes the it is considered?

NeilGirdhar commented 1 year ago

In short: So for assignments the Any is ignored but for subtypes the it is considered?

All of your examples look right to me. Are there any other examples that we need to reason about?

Did you notice that PyRight currently (and rightly) treats X | Any as irreducible? See above for examples of that.

NeilGirdhar commented 1 year ago

Could we refine some of the arguments for removing Any?

In particular the statement that "Any is essentially the universal set of types" is not true for Python's definition of Any. As we discussed above, the types we're talking about are not the "type theory" types. They define interfaces too, and Any's interface is not the same as the "universal set of types". This could be clarified: "With respect to subclasses, Any is the universal set of types."

Second, there's a claim that Union[T, Any] is Any, but as I showed in my long comment, this is not true in all contexts. It is true that they have the same subclasses. So, perhaps amend this statement to "T | Any has the same subclasses as Any"

It's no surprise that since these statement apply to subtypes only that T & Any reduces to T for subtypes—no one disagrees with that. So, perhaps it would be best to find some new arguments?

ippeiukai commented 1 year ago

I think it's worth mentioning that TypeScript considers T & any = any, though I find this surprising. Perhaps there have been previous discussions there that can be learned from?

Might be interesting to analyse TypeScript types in a set theory way:

https://ivov.dev/notes/typescript-and-set-theory

Oddly, any is a mixture of never and unknown. any is assignable to every type, just like never, and every type is assignable to any, just like unknown. As a blend of two opposites, any has no equivalent in set theory, and is best seen as an escape hatch to disable TypeScript's assignability rules.

https://blog.thoughtspile.tech/2023/01/23/typescript-sets/

any is a paradox — every set is a subset of any, but any might be empty. The only good news I have is that any extends unknown, so unknown is still the universe, and any does not allow "alien" values.

I wonder what's Python's equivalent for TS's unknown, and how Python's Any differs from TS's any.

DiscordLiz commented 1 year ago

I wonder what's Python's equivalent for TS's unknown, and how Python's Any differs from TS's any.

Python's Any is much closer to being equivalent to TS's unknown than TS's Any.

DiscordLiz commented 1 year ago

Could we refine some of the arguments for removing Any

I think the best argument is that the definitions in accepted PEPs already say we should, and that those maintaining type checking implementations have expressed that we stick to simple composable rules. Deviating from PEP483 requires a special case and a strong reason why the obvious rules from existing theory are insufficient, rather than simple composable rules.

erictraut commented 1 year ago

Several thoughts...

This thread starts with the assertion that Any is a "top type". That's kind of misleading because it also acts as a "bottom type". That's what makes its behavior difficult to characterize using standard subtyping rules. When Any is involved, we sometimes need to define special-case rules to disambiguate the intended behavior. In Python, the true "top type" is object and the true "bottom type" is Never (which also goes by the name NoReturn).

I think there are two questions that we need to consider when Any is used within an intersection: 1) the meaning of T & Any and 2) whether or not T & Any is reducible.

The meaning of T & Any IMO, the correct meaning should fall out of the generalized rules for intersections. In this comment I propose rules for subtype and attribute type evaluation. If you agree with these rules, then the meaning of T & Any falls out of those definitions:

Note that both of these rules assume no reducibility. If T & Any always reduces to Any, then:

Reducibility of T & Any TypeScript reduces both T | any and T & any to any.

PEP 483 & 484 never discussed the reducibility for unions (T | Any). Type checkers were able to decide whether or not to reduce this type to Any. Both mypy and pyright have chosen not to reduce in the case of unions.

class A:
    a: int

class B:
    a: Any

def func(x: A | B):
    reveal_type(x.a) # Both mypy and pyright reveal `int | Any`

As I've explained in the past, there is value in not reducing T | Any. If we follow that precedent, I don't think T & Any should be reduced either. (PEP 483 alluded to the fact that T & Any should be reduced, but I don't feel beholden to PEP 483 in this respect because Intersection was not formalized in that PEP or introduced into the type system at that time. PEP 483 simply indicates that Intersection might be added in the future and explains that its rules should mirror those of unions.)

If there is not consensus on this point and the Intersection PEP formalizes that T & Any should be reduced to Any, then I think the PEP should also formalize the same rule for T | Any reducing to Any (for consistency), and mypy and pyright should change their current behavior for unions accordingly.

@ippeiukai, thanks for pointing to the TypeScript documentation. It's useful to understand how other type systems have solved problems. That doesn't mean we need to adopt the same solution for Python, but it's useful information to consider.

@DiscordLiz, I think Python's Any is closer to TS's any than TS's unknown. The TS unknown type is like a union of all possible types. It's similar to the object type in Python. (There is a type called object in Javascript, but it's not a base type for all other types. That's why unknown needed to be added to the TS type system.) TS's any is outside of the subtype hierarchy — it's both a bottom and top type like Python's Any type. The distinction between object and Any is important when it comes to invariant contexts. A list[object] is different from list[Any]. The latter is exempt from the normal invariance requirements.

ippeiukai commented 1 year ago

The TS unknown type is like a union of all possible types. It's similar to the object type in Python. (There is an object type in Javascript, but it's not a base type for all other types. That's why unknown was required.)

@erictraut Thank you for pointing that out. Didn't know Python's object is like TS's unknown. Good to know.

mikeshardmind commented 1 year ago

TypeScript reduces both T | any and T & any to any

While I understand there is a rationale that allows this as any is both a top and bottom type, I believe that it is unnecessarily paradoxical to reduce T & Any to Any, and that it throws away a known constraint to do so.

Generally speaking, Unions (And all real-world motivations which have been provided for Intersections in python) are used for their use in defining compatibility. If someone defines compatibility by saying "we accept any type", it's not controversial to for the reduction to be more type compatible. If we define compatibility for an interface as "must satisfy a set of constraints", it would be surprising for one or more of the constraints to be discarded simply because Any leaked in from somewhere as one of the constraints.

I think it is fine to view Any as equivalent to an object in this specific context (and in the case of intersections as TypeVars, base it on variance), but it is not and cannot be broadly equivalent for reasons you've already covered in detail. To me, it's all about the direction in which the constraint flows by treating it as top or bottom. This is closely tied to Any's behavior and variance in my mental model as to why this reduction does not make sense.

Note that both of these rules assume no reducibility. If T & Any always reduces to Any, then:

X is always a subtype (and supertype) of T & Any. Accessing attribute foo from T & Any always evaluates to Any.

The lack of reducibility while treating the entirety of possible attributes as Any carries a similar issue in that the interface is broader than we know it must be. An object which satisfies Any & T must be an instance of T for a concrete type T, or implement the provided protocol in the case of a Protocol, and may have other unknown interfaces which we have to assume exist because of Any. But for this to be compatible, if T provides a definition for foo, (T & Any).foo should not evaluate to Any, unless that's the provided definition from T or we are still throwing away information.

For these specific concerns, I'm pretty negative on taking a direct lead from TS on this one.

I don't think this requires modifying Union to be consistent

The actual parallel to The automatic removal of Any by treating it as only a top type in this context is the removal of Never from Unions, not the expansion of Any In them.

This is because if we view Any as a top type and only a top type in this context, then it is the identity element for intersections of sets of types for the domain of valid Python types. We can remove a top type from an intersection of types because of it being an effective identity element on the set of possible types. Neither adding nor removing such a type from an intersection of types should ever change the set of types the set contains. (So this logic only holds if this is the rationale for this reduction which we go with, but I do contend that in this context, it is a reasonable interpretation, even if any is defined as both)

Similarly, Never is the identity element for unions of types.

NeilGirdhar commented 1 year ago

the obvious rules from existing theory are insufficient, rather than simple composable rules.

At the start of this thread there was a misapprehension that Any was the universal set, but this was based on a mistaken understanding of Python's definition of Any. And based on that mistake, there were some unjustified conclusions drawn based on theory.

those maintaining type checking implementations have expressed that we stick to simple composable rules.

Both Jelle and Eric have argued for irreducibility.

mikeshardmind commented 1 year ago

the obvious rules from existing theory are insufficient, rather than simple composable rules.

At the start of this thread there was a misapprehension that Any was the universal set, but this was based on a mistaken understanding of Python's definition of Any. And based on that mistake, there were some unjustified conclusions drawn based on theory.

For the record, as got discussed briefly in a discord call while I was working on an adjacent paper to come up with more formal definitions which are accurate to current use, we were able to show that it holds for a top type as well as the universal set. by reaching for category theory for answers rather than set theory applied to types.

If We can show that Any is functioning as a top type for python's type system in the context of an intersection, it is still logically removable from that intersection by being a functional identity element on the operation of intersection over categories of types, for the domain of valid python types.

those maintaining type checking implementations have expressed that we stick to simple composable rules.

Both Jelle and Eric have argued for irreducibility.

I'd like to discuss this a little further on what irreducibility would actually result in, as there are clearly some sharp edges to it.

As stated above, Any & T -> T comes from the same simplification as T | Never -> T, not to Any | T -> Any (removal of identity element)

Any | T -> Any (not being reduced) has been made as a comparison, but it is different logic to argue for that being reduced. This may also explain why this wasn't in the PEP when the intersection example was, without the existence of intersections.

There may be other reasons to still have it irreducible, but I'd like to ensure that the people making that point have a moment to consider the difference here.