Interpretation of type `Any & T`

CarliJoy / intersection_examples

Python Typing Intersection examples

MIT License

33 stars 2 forks source link

Interpretation of type `Any & T` #31

Closed mark-todd closed 10 months ago

mark-todd commented 11 months ago

Following on from the lengthy discussion here https://github.com/CarliJoy/intersection_examples/issues/1, it was concluded that Any & T is the best resultant type for intersecting Any. This is symmetric with the way union works Any | T -> Any | T. Conversation on this new thread will investigate the types of the attributes and methods of this intersection type. Just a note, inspired by @CarliJoy we can number and summarise different interpretations. We should also seek to again create a real world example that explores the new types, and compares different interpretations.

Also if a summary of each new interpretation could be succinctly provided alongside a usage example that would be grand :)

CarliJoy commented 11 months ago

Assume this simple but verbose example. I will handle only accessing properties at the moment.

Module a_untyped

class A:
    common: str
    a: int
    a_str: str

    def fn_a(self, a: int) -> int:
        return a

    def fn_mix(self, mix: int) -> int:

    ab_mix: "A"

Module b_typed

class B:
    common: str
    b: str

    def fn_b(self, b: str) -> str:
        return b

    def fn_mix(self, mix: str) -> str:
        return mix

    ab_mix: "B"

`main.py`

import typing as t
from a_untyped import A
from b_typed import B

def needs_int(x: int) -> None:
    """Example function that needs to be called with integer to work"""

def needs_str(x: str) -> None:
    """Example function that needs to be called with string to work"""

def can_handle_any(x: t.Any) -> None:
    """Example function that is able to handle any type"""

def use_ab_intersected(x: A & B) -> None:
    # All this examples should work, as they work in runtime
    needs_str(x.common)

    needs_int(x.a)
    needs_int(x.fn_a(1))
    needs_int(x.fn_mix(1))  # should be detected as the correct call
    needs_int(len(x.a_str.split()[0]))
    can_handle_any(x.a)

    needs_str(x.b)
    needs_str(x.fn_b("b"))
    needs_str(x.fn_mix("b"))
    can_handle_any(x.b)

    use_ab_intersected(x.ab_mix)  # Intersect the property is valid
    needs_int(x.ab_mix.b)
    can_handle_any(x.ab_mix.a)

    # This should raise an error detected by the type checker (if both types known)
    invalid1 = x.not_existing
    invalid2 = x.fn_mix(object)

In general even so a variable is Any we can assume that runtime implementation of the object hold by this variable has a concrete implementation.

But if package is not marked as typed, everything is imported as Any, so let's assume A is typed as Any.

As an user I would want the following behaviour. This is probably not possible, but we should check rules against this example.

def use_ab_intersected(x: A & B) -> None:
    # All this examples should work, as they work in runtime
    needs_str(x.common)

    needs_int(x.a) # type: ignore - required
    needs_int(x.fn_a(1))  # type: ignore - required
    needs_int(x.fn_mix(1)) # type: ignore - required
    needs_int(len(t.cast(str, x.a_str).split()[0]))  # this needs a cast
    can_handle_any(x.a) # should be fine, because it is `Any` already

    needs_str(x.b)  # Hard requirement, this *must* work
    needs_str(x.fn_b("b"))  # Hard requirement, this *must* work
    needs_str(x.fn_mix("b"))  # Hard requirement, this *must* work
    can_handle_any(x.b) # Hard requirement, this *must* work

    use_ab_intersected(x.ab_mix)  # Hard requirement, this *must* work
    needs_int(x.ab_mix.b) # Hard requirement, this *must* work
    can_handle_any(x.ab_mix.a) # Hard requirement, this *must* work

    # This should raise an error detected by the type checker (if both types known)
    invalid1 = x.not_existing  # type checker should complain
    invalid2 = x.fn_mix(object) # type checker should complain

CarliJoy commented 11 months ago

Hmm looking at the example it seems that we have to decided between allowing can_handle_any(x.a) and raising type errors on invalid accesses.

As for the most A related call we have to # type: ignore or cast things anyway, for accessing properties I would tend for the T & Any -> T.

This is in contract to the example given by @ippeiukai here.

Copied and modified below for easier reference:

By the way, in terms of gradual typing, I feel Intersection[T, Any] = T is awful.

Module 1

some_fn(some_obj)

Module 2

def wrap_obj(val: T) -> T & SomeAddedTrait:
  …

a = wrap_obj(some_obj)
some_fn(a)

Here, let’s assume some_fn and some_obj are provided by two different libraries both currently untyped. What if some_fn got typed before some_obj gets typed? I expect I can do anything to a that I can do to some_obj. I don’t expect a to be reduced to mere SomeAddedTrait.

TODO: formulate example more concrete, I don't get it atm,@ippeiukai could you formulate it with proper files and imports like I did above

mikeshardmind commented 11 months ago

By the way, in terms of gradual typing, I feel Intersection[T, Any] = T is awful.

I'll type up some examples when I have more time, hopefully later today, but both of the following are true here:

In terms of gradual typing, T & Any = T is awful
In terms of gradual typing T & Any = Any or otherwise unchecked (one interpretation of Any & T) is awful

I think @DiscordLiz had a good point in the prior thread about Any & T never being an intentional construct. When is this going to come up? Do people intending on having Any have a use case for intersections, or is this sufficiently advanced that people should be expected to have typed their bases better and we should err on the side of safety for users on this? I think we need to start from where we expect this to occur before we can evaluate the impact of any choice. It's clear it can only happen at the boundary between typed and either untyped or gradually typed things, but can we get more specific than that and determine if there's ever an intentional use for T & Any which isn't better served by other existing things?

mark-todd commented 11 months ago

@CarliJoy Thanks for this example! It's very thorough - I've reduced it down a bit to consider a particular subcase, not because the others aren't relevant, but just because I think it might be easier to discuss in pieces:

import typing
from intersection_examples import Intersection
class A:
    common: str

# A effectively gets cast to Any, so nobody knows the type common even exists

class B:
    common: int

def needs_int(x: int) -> None:
    """Example function that needs to be called with integer to work"""

def needs_str(x: str) -> None:
    """Example function that needs to be called with string to work"""

def use_ab_intersected(x: Intersection[A, B]) -> None:
    # Note: A&B is an LSP violation, but we don't know that it is, because
    # we don't know the type of A
    needs_int(x.common) # should be fine - this is a property of B
    needs_str(x.common) # This should also be fine - if A is used at runtime
    # It could be string? Or could it?

    # Follow up idea - do we have to assume the intersection created with Any
    # has not created any LSP violations?

My conclusion from this is that similarly to other types we intersect, perhaps we have to assume LSP has not been violated by the intersection with the Any type - although we have no way to be sure

gvanrossum commented 11 months ago

I am still groping my way around this subject. I am unsure of the technical meaning of "gradual typing" that seems to be used by some participants.

In the meantime, I have an example, which has actually convinced me that this proposal is fine:

def f(a: Any & int):
    x = a + 1
    reveal_type(x)

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

DiscordLiz commented 11 months ago

My conclusion from this is that similarly to other types we intersect, perhaps we have to assume LSP has not been violated by the intersection with the Any type - although we have no way to be sure

Which makes it intersections all the way down and no different than being Any, cause we have to assume LSP wasn't violated.

If that were intentional, someone could skip the intersection and just write Any, and that works today.

I think @DiscordLiz had a good point in the prior thread about Any & T never being an intentional construct. When is this going to come up? Do people intending on having Any have a use case for intersections, or is this sufficiently advanced that people should be expected to have typed their bases better and we should err on the side of safety for users on this?

I don't think it's intentional, but I do think it can happen. The middle ground (that was option 5? and we're only considering options 4 and 5 now? Sorry, I seem to have missed the reason for splitting this to a new thread.) seems to avoid the accidental issues possible in 4, but I'd like more examples of it from @mikeshardmind

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has add(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

That would be the behavior in option 5. Option 4, the return type can only be int & Any, it's recursive with that option, and more pervasive than I think is beneficial.

NeilGirdhar commented 11 months ago

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

Just something for you to consider. Let's just look at an example that doesn't involve Any:

class C:
  def f(self) -> None: pass
class D:
  def g(self) -> None: pass
class E(C, D): pass

def f(x: C):
  if isinstance(x, D):
    reveal_type(x)  # C & D
    x.f()  # should be okay since x inherits from C
    x.g()  # should be okay since x inherits from D

f(E())

This is why the interface of C&D is the union of the interfaces of C and D.

Now, consider:

what is the interface of Any?
what is the interface of Any & C?
So if x has type Any & C, what is the type of x.f(), x.g(), and x.h()?

It's really worth reading Eric's elucidating comment.

For reference, there are two proposals being considered here. I'll call them 4 and 5 (relating to names from #1). Proposal 4 says: x.f(), x.g(), x.h() are all Any. Proposal 5 says: x.g(), x.h() are both Any, but x.f() is None.

NeilGirdhar commented 11 months ago

Even though I'm not yet in favor of proposal 5, I have a way to illustrate the problem it may be trying to solve:

Consider what isinstance does:

x: Any
assert isinstance(x, U)
y: T
assert isinstance(y, U)

The general case illustrated by y gives y type T & U. But there's a special case: if we start with something of type Any, then we don't want to end up with Any & U, and we instead set x's type to U. The reason we don't want Any & U is because that would cripple type checking: You would not be able to check most things related to the interface of x.

Essentially, there are two isinstance operators. One acts is an intersection that acts on ordinary types and type variables, and the other is a type replacement that acts replaces an Any type with whatever type is provided. Currently, type checkers do a lot of type replacement, and only a rudimentary intersection in some special cases.

I think the people in camp 5 are worried that this pernicious behavior that we avoid using the type replacement will creep in when we enable intersections generally.

I considered a 6th proposal: encourage to choose replacement if they want to protect against the spread of Any. This should only be relevant in some special cases involving things like unbounded generics. This can be done like this:

def f[T](x: T):
     assert isinstance(x, U)
     # Normally, this would have type T & U, but you can force replacement by casting.
     x_as_u = cast(x, U)
     # The over-broad interface of T & U is avoided.

This would allow us to leave the ordinary meaning of T & Any alone without any special cases, but potentially eliminate the problems associated with it.

I think we need to see some concrete examples to test if this could work.

mikeshardmind commented 11 months ago

If I need a cast everywhere I use an intersection to avoid issues with Any, then I don't need an intersection, I can cast to a protocol that matches my use.

You're just replacing one wart with another worse one that disables other checks here, and requires runtime checkable protocols and a runtime cast use.

The reason we don't want Any & U is because that would cripple type checking: You would not be able to check most things related to the interface of x.

That's exactly the issue with option 4. It cripples type checking for the person with fully typed info on their side of the intersection and is no better for the person with Any on their yet untyped object than not using an intersection and using Any.

NeilGirdhar commented 11 months ago

That's exactly the issue with option 4. It cripples type checking for the person with fully typed info

So we're on the same page, and I think I understand your point of view.

I see how the deference proposal prevents false negatives, but it does so at the cost of false positives. Historically, the Python typing community has preferred solutions that avoid false positives even if they cause false negatives. For example, the adoption of the sequence protocol into the typeshed causes false negatives in order to avoid false positives.

I think we just need to see at least a few common examples to get a better handle of what the tradeoffs are.

If I need a cast everywhere I use an intersection

I don't think it would be "everywhere you use an intersection", but as I stated in the previous comment, it would only be in the cases of intersections with unbound type variables, and similar. Like I said, this was just something I thought of. We need to see some examples to know what it would look like.

mark-todd commented 11 months ago

I am still groping my way around this subject. I am unsure of the technical meaning of "gradual typing" that seems to be used by some participants.

In the meantime, I have an example, which has actually convinced me that this proposal is fine:
def f(a: Any & int):
    x = a + 1
    reveal_type(x)
Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

@gvanrossum I think your example makes sense, but the controversial part comes from the other side of the intersection - as you describe the result must have all the properties of int so using it with __add__ makes sense here. But on the other side we have Any, and therefore x must also have all the properties of Any. What does it mean to have all the properties of Any? If it encompasses everything, can we say it says any property we like? What if Any also has a method __add__ that receives str?

My conclusion from this is that perhaps it has to be some subset of every available option, e.g. Any type that fulfils this LSP condition. Or perhaps no methods that "clash" with those from the other half of the intersection.

@DiscordLiz

Which makes it intersections all the way down and no different than being Any, cause we have to assume LSP wasn't violated.

Perhaps we should restrict our examples to those that fulfil this condition? Even in the case we're untyped, it seems we have to assume that this method clash hasn't occurred.

@NeilGirdhar @mikeshardmind Specifics aside, I think whatever solution we come up with, it shouldn't require cast or type:ignore to pass through the type checker successfully. I mentioned the below relating to an earlier debate, but I think the same philosophy applies here - given the choice between:

Being given a type that's too broad for the attributes available - there won't be any errors raised when we do some things that are invalid.

Being given a type that's too narrow for the attributes available - there will be errors raised when we do things that are valid

The second one just seems much worse to me than the first, but I guess that is personal preference somewhat.

I guess this is maybe another way of phrasing @NeilGirdhar 's previous point about false positives vs false negatives.

mikeshardmind commented 11 months ago

The False positives case is preferable because we still give the user a way to handle it without a type ignore and without additional runtime costs. The False negative case cannot be remedied except by removing Any, and the whole reason we can't ban Any from intersections is that type checkers have stated this could happen in a way users don't have a way to remedy or even a clear way to inform them.

Consider this:

from untyped_library import SomeParserAugmentor  # user may not realize this is untyped at first
from typed_library import SomeParser  # generic over T, provides .parse(self, arg: str) -> T

def my_func(input_: str, parser: SomeParser[int] & SomeParserAugmenter ) -> None:
    """ Typed permissively so that other people can use the function """

   # This errors, we restrict the interface of Any when it clashes with what is typed
   # This also means the person writing the intersection is the one who gets the error
    x = parser.parse(input_, fix_mojibake=True) 

class _Parser(SomeParser, SomeParserAugmentor):
    """ Internal fallback where we create only the baseline functionality """
    pass
...

# this is fine, subclassing from Any means the attributes needed are assumed there. 
# This is the case currently with subclassing Any, not even just with this proposal
my_func(some_user_content, _Parser(int))

In this case, the user actually has the tools to fix this without a type ignore in multiple ways.

Because they are informed that something isn't meeting their expectations, they may contribute upstream types
They can also add a protocol stating that the attribute exists:

from untyped_library import SomeParserAugmentor  # user may not realize this is untyped at first
from typed_library import SomeParser  # generic over T, provides .parse(self, arg: str) -> T
from typing import Protocol

class FixesMojiBake[T](Protocol[T]):
    def parse(self, input_: str, fix_mojibake: bool=False) -> T: 
        ...

def my_func(input_: str, parser: SomeParser[int] & SomeParserAugmenter & FixesMojiBake) -> None:
    x = parser.parse(input_, fix_mojibake=True)  # This no longer errors

class _Parser(SomeParser, SomeParserAugmentor):
    """ Internal fallback where we create only the baseline functionality """
    pass

# this is still fine
my_func(some_user_content, _Parser(int))

With this, they've now explicitly stated they expect this compatible method to exist. This gives users the information they need to type things without compromising on safety for the portions of code that are already typed, and while being easy to accommodate situations outside of their control with untyped libraries. Note that this is safer thana type ignore, it would rely on the nature of Any for the thing Assigned to it, but it requires explicitly escaping out for things that could otherwise be accidentally hidden in terms of misuse by restricting the interface on usage to what is explicitly stated. This also automatically continues working should the untyped library become typed in the future without something lurking around for compatibility that could be silencing errors.

I think if we pair this approach with a note that type checkers should inform users when Any is present in an intersection that is causing an error and point them to a reference of how to handle this, we can make this easy to learn and ergonomic as well.

gvanrossum commented 11 months ago

A few observations after having read Eric's comment and trying to reason various things out in my head.

It is very easy to confuse yourself by mixing up the meaning of & with that of | (or vice versa). This happened to myself a few times. I think it's due to the fact that after x: T1 | T2, we consider valid properties of x those that are valid for T1 and for T2, while after x: T1 & T2, intuitively, the set of valid properties is the union of the valid properties of T1 and T2. (It's a similar confusion as often happens around contravariance.)
If is easy to reason things out for proper types, including edge cases like object and Never.(top and bottom, respectively), by thinking of types as sets of objects. The intersection and union operators work sensibly in this case, and all the rules you can derive this way (e.g. distributivity, subtyping, and attribute type evaluation, as shown in Eric's comment). However,Anyis *not* just another edge case. It is the defining building block of gradual typing, and behaves like top or bottom depending on which side of the subtyping operator (<:`) it appears.
Using assert isinstance(x, T) in examples is not all that helpful, because at runtime T is constrained to concrete types. We need to go back to presenting examples in terms of whether x = y is allowable or not according to the type system given specific types for x and y (and possibly what the type of x is afterwards, in situations where narrowing happens).
Generics present a whole nother can of worms. I haven't spent enough time thinking about these.
There are probably also traps due to the difference between structural subtyping (Protocol) and nominal subtyping.
And perhaps others due to the pragmatic treatment of TypedDict.
Folks, please proofread your messages.

I'l write more later.

gvanrossum commented 11 months ago

[me]

Given that a: Any & T means thata has all the attributes of T and they have the types specified by T, in this example, x has __add__(self, arg: int) -> int so the type of x should be revealed as int. That matches my intuition.

@NeilGirdhar Just something for you to consider. Let's just look at an example that doesn't involve Any:

Great, it should be possible to reason this out by considering types as sets of objects.

class C:
  def f(self) -> None: pass
class D:
  def g(self) -> None: pass
class E(C, D): pass

def f(x: C):
  if isinstance(x, D):
    reveal_type(x)  # C & D
    x.f()  # should be okay since x inherits from C
    x.g()  # should be okay since x inherits from D

f(E())

This is why the interface of C&D is the union of the interfaces of C and D.

But why introduce a new term, "interface"? Python's type system never uses this word -- we talk about types. I feel it's misleading to introduce this term -- probably because of my point (2). We should only consider questions of the form "if x has type T1 and y has type T2, is x = y acceptable?" And in addition, questions of the form "given x has type T and OP(a) is some operation (such as getting an attribute of a specific name, or calling it with specific arguments), is OP(x) valid?"

Now, consider:

* what is the interface of `Any`?

I feel this is not helping clear reasoning, given that in x = y, the rules when x has type Any are totally different from the rules when y has type Any. (And different again for OP().)

* what is the interface of `Any & C`?

Should be rephrase in terms of what happens to x = y or perhaps OP(x), separately for the case where x: Any & C and y: Any & C.

* So if `x` has type `Any & C`, what is the type of `x.f()`, `x.g()`, and `x.h()`?

That's a valid question. I think it should be answered by considering separately what the type of an expression is when x: Any and when x: C, and then doing an & over the resulting types.

Interestingly, since the answer might well be "an error occurs" for one of the branches of the & operator, maybe we should consider what the result should be of Error & T. Thinking about some examples, it seems that the Error type should disappear and the answer should be T -- IOW, it is not an error unless all branches of the & operator have type Error. This is nicely the dual of the error behavior for |, where the rule is that it's an error if it is an error for any branch of the | operator.

But now we get to think about when an error should be reported to the user -- surely not as soon as it occurs, in the case of &, or else x.attr where x: C & D would report an error if C supports attr but D doesn't -- and the intersection is meant to not report an error in that case.

Also, it seems that whatever rules we come up with for Error should also apply to Any? Except they don't -- It would seem that Error | T reduces to Error, but Any | T is irreducible. (I now understand that word. :-) It seems that we perhaps might regret that choice, but even those who regret it seem to agree that it's too late to change -- and I'm with Eric in that I think we should make sure that & is the dual of | in this respect, like it is everywhere else.

It's really worth reading Eric's elucidating comment.

For reference, there are two proposals being considered here. I'll call them 4 and 5 (relating to names from #1). Proposal 4 says: x.f(), x.g(), x.h() are all Any. Proposal 5 says: x.g(), x.h() are both Any, but x.f() is None.

I don't know what the type of x is here, but it can't be from the code snippet you gave at the top of your comment (which has neither Any nor h()). Could you provide the whole context?

NeilGirdhar commented 11 months ago

Specifics aside, I think whatever solution we come up with, it shouldn't require cast or type:ignore to pass through the type checker successfully....I guess this is maybe another way of phrasing @NeilGirdhar 's previous point about false positives vs false negatives.

Exactly what I was getting at :)

I think if we pair this approach with a note that type checkers should inform users when Any is present in an intersection that is causing an error and point them to a reference of how to handle this, we can make this easy to learn and ergonomic as well.

This is a great idea that actually works with both proposals. We could leave the "correct" semantics of proposal 4 coupled, and encourage type checkers to implement an optional warning when someone accesses an attribute on something with type Any & T. This warning could then be resolved by:

using type-ignore,
adding a cast, or
disabling the warning.

3. Using assert isinstance(x, T) in examples is not all that helpful, because at runtime T is constrained to concrete types. 4. Generics present a whole nother can of worms.

I agree that we should explore other ways in which Any & T pops up naturally. However, I think the is-instance example works when the type of x before the instance check is a type variable bound to Any.

3. We need to go back to presenting examples in terms of whether x = y is allowable or not according to the type system...

But why introduce a new term, "interface"? Python's type system never uses this word -- we talk about types. I feel it's misleading to introduce this term -- probably because of my point (2). We should only consider questions of the form "if x has type T1 and y has type T2, is x = y acceptable?"

Right, I see where you're coming from. If that's the language you like, then I think you'd find Jelle's comment the most clarifying. I was using language closer to Eric's comment.

As per Jelle's comment, two things constitute the "type" (as returned by reveal_type) of x:

For which y is x = y is allowable? And,
What operations can you do with x?

These correspond to what Eric called subtyping and attribute type evaluation. And the latter is what you're calling OP(x).

That's a valid question. I think it should be answered by considering separately what the type of an expression is when x: Any and when x: C, and then doing an & over the resulting types.

Yes, you're right of course. I made a mistake in my evaluation.

Could you provide the whole context?

I think I just made an error when evaluating the values 😄

Regarding the ideas surrounding Error, I need to think about it more.

mikeshardmind commented 11 months ago

This is a great idea that actually works with both proposals. We could leave the "correct" semantics of proposal 4 coupled, and encourage type checkers to implement an optional warning when someone accesses an attribute on something with type Any & T. This warning could then be resolved by:

Please review the important distinction with false negatives. There is no way for a user to know they need to resolve a false negative arising from types outside of their control because the user will not get information. This would be particularly insidious if the user thought they had something typed that wasn't. Additionally, the entire reason we are having to consider this is that we cannot ban Any from intersections entirely, they can arise deeper in the type checker, per Eric's comment taking that off the table. This means in those cases, there's nowhere to give the user the error for Any existing in the intersection, and there would only be a resulting false negative. This cannot be used to resolve this in the case of proposal 4 if we continue to accept previously accepted arguments.

mikeshardmind commented 11 months ago

Also, it seems that whatever rules we come up with for Error should also apply to Any? Except they don't -- It would seem that Error | T reduces to Error, but Any | T is irreducible. (I now understand that word. :-) It seems that we perhaps might regret that choice, but even those who regret it seem to agree that it's too late to change -- and I'm with Eric in that I think we should make sure that & is the dual of | in this respect, like it is everywhere else.

We need to be particularly careful that anything we say is a dual and should have symmetry actually is or we may create worse situations by mis-considering something. It would be one thing to intentionally decide something isn't useful as symmetrical, it would be another to inadvertently state things that arent symetrical must be treated as if they are. This issue here elaborates on Eric's comment and shows that only some of what was said was true by nature of universal properties.

Importantly: from it:

The duality between intersection and union lies in reversing the subset relation, not in replacing an "and" with an "or".

gvanrossum commented 11 months ago

Duly noted that duality is tricky.

I have my thoughts about false positives vs. false negatives but they are colored by my experience spinning up mypy as it was being written in a production code base with millions of lines of existing (untyped) code. False positives would cost the tiny mypy team days of explaining to random senior engineers why they had to make changes to working code to satisfy the upstart type checker, while false negatives were what everybody got all the time (since most code was still untyped).

I am despairing at ever coming to terms with this topic. I wrote the original text in PEP 483 and I admit it was vague to the point of being unhelpful. (Since it is not a standards track PEP, I'd be happy to entertain PRs that fix specific issues with it, though not with proposed full rewrites.)

I am still hopeful that we can define intersections in a way that's the dual of unions (using the correct definition of dual).

NeilGirdhar commented 11 months ago

There is no way for a user to know they need to resolve a false negative arising from types outside of their control because the user will not get information.

Why can't the type checker produce a warning when a user accesses an attribute of something that has type Any & T?

This means in those cases, there's nowhere to give the user the error for Any existing in the intersection,

In the cases that the attribute access happens deep in some library, the user hasn't been "robbed of a potential error". If there is an error, it's the library author that needs to see it.

mikeshardmind commented 11 months ago

There is no way for a user to know they need to resolve a false negative arising from types outside of their control because the user will not get information.

Why can't the type checker produce a warning when a user accesses an attribute of something that has type Any & T?

That was Eric's statement that took banning Any in intersections off the table, and was taken at face value given the work he has put into pyright. I don't know why it can't, I see no reason why it shouldn't be possible to bubble it up to where it was sourced from, but I'm not actively maintaining a type checker right now either. Considering this was used to take an option off the table and nobody contested that, we can't simultaneously accept it to take one option off the table and reject it to say something else isn't an issue. If you'd like to contest that, I'd happily bring back the argument to ban Any in an intersection because I can't see a single case where intentionally using it in one is beneficial. It would be better to omit the intersection entirely and just type Any instead of Something & Any with some of the proposed behaviors. (If the presence of Any in the intersection effectively allows everything as compatible, and silences all errors, it's no different than Any even if we have theory that says we shouldn't remove it, the practical use is no different.)

In the cases that the attribute access happens deep in some library, the user hasn't been "robbed of a potential error". If there is an error, it's the library author that needs to see it.

This isn't necessarily true though. It's not an error in the library, it's a lack of information that would be silently hiding any issues in an intersection resulting between my code (typed) and library code (untyped). This would become yet another social pressure to add typing, to libraries that may have various reasons not to, such as still using things the type system does not yet have a way to express.

From a pragmatic standpoint, if someone says "I need to be able to use something as this type and this other type", if one of those types isn't typed, do we really want to discard information on the type that is typed?

mikeshardmind commented 11 months ago

I have my thoughts about false positives vs. false negatives but they are colored by my experience spinning up mypy as it was being written in a production code base with millions of lines of existing (untyped) code. False positives would cost the tiny mypy team days of explaining to random senior engineers why they had to make changes to working code to satisfy the upstart type checker, while false negatives were what everybody got all the time (since most code was still untyped).

I can appreciate that perspective. Largely, I would agree on minimizing false positives. When I was coming up with the middle ground, the considerations I took with it were the following:

The false positives possible here arise for the users of an intersection, not for the author of other untyped code.
The false positives are resolvable in a way that remains safe if the other code becomes typed in the future (no type ignore supressing information)
The false positive's resolution can be used as a stepping stone to a stub or type information being improved, as it promotes the use of duck typing of things which are not yet fully typed.
That I do not see a case where the author of an intersection would intentionally include Any in it, rather than just use Any until the operands of their intersection were typed.
That (and this may end up revisited) disallowing Any in intersections entirely was considered non-viable.
That this would be the first way where a user expecting something to be typed because they typed it could interact with other types and not be checked anymore. The boundary between typed and untyped vanishes.
The alternative of false negatives would push those with typed code that is "Broken" by this to push for the full removal of any code that isn't typed. I don't think this is largely positive socially, but I'll leave that at that for now, it doesn't need a full rehashing.

NeilGirdhar commented 11 months ago

If you'd like to contest that,

I don't want to contest that because I agree with Eric that Any & T can occur deep in library code, e.g., as a result of unbound type variables.

I'm saying that in your own code when the you access an attribute of something having type Any & T, then the type checker could warn you that this access is not being type-checked. Wouldn't that resolve your worry about type checking letting errors slide for users?

If you feel strongly about this, maybe you should reopen #1 and try to swing the consensus to the proposals you were supporting (1 and 2). This thread was supposed to be about 4 versus 5.

it's a lack of information that would be silently hiding any issues in an intersection resulting between my code (typed) and library code (untyped).

Can you give an example of library code that accesses an attribute of something having type T & Any and it hiding an error in user code?

From a pragmatic standpoint, if someone says "I need to be able to use something as this type and this other type", if one of those types isn't typed, do we really want to discard information on the type that is typed?

Yes, OP(x) is necessarily broader. This is analogous to how a parameter that is annotated to have type T & U becomes Any when T or U happens to be Any.

The alternative of false negatives would push those with typed code that is "Broken" by this to push for the full removal of any code that isn't typed

I don't see how that helps since if something is annotated T & U, and T is Any, then no one is going to push for the annotation T & U to be removed. They may push for (or wait for) T to be given a narrower definition.

mikeshardmind commented 11 months ago

I'm saying that in your own code when the you access an attribute of something having type Any & T, then the type checker could warn you that this access is not being type-checked. Wouldn't that resolve your worry about type checking letting errors slide for users?

No. Categorically, no, because at that point I'd rather make it an error to include Any in the intersection than an optional warning. Many of the cases we have motivating this are decorators with intersections as the return type. The behavior of (Op(Any & T) being equivalent to Op(Any) means the intersection there is pointless for those specifying their needs, and may lead to surprising outcomes for users. My view on this is not only from the point of safety and correctness, but also on making the type system less surprising to users. If Any & T is no more specific about what is allowed than Any, then specifying Any & T shows that the user clearly thinks something more is happening or they wouldn't bother with a more complex type that is the same thing.

This also relates to:

From a pragmatic standpoint, if someone says "I need to be able to use something as this type and this other type", if one of those types isn't typed, do we really want to discard information on the type that is typed?

Yes, OP(x) is necessarily broader. This is analogous to how a parameter that is annotated to have type T & U becomes Any when T or U happens to be Any.

You say it's an obvious analog, but I challenge you to find a case where you would intentionally include Any in an intersection as a parameter and it be more useful than simply stating Any. If there is no case where it is, the behavior is not useful, and the statement of that intersection would indicate that a user has the expectation that something else would happen here.

If you feel strongly about this, maybe you should reopen https://github.com/CarliJoy/intersection_examples/issues/1 and try to swing the consensus to the proposals you were supporting (1 and 2). This thread was supposed to be about 4 versus 5.

Since we clearly aren't all on the same page about the reasons why other proposals were not an option, I think that is necessary. I only agreed that we could stop that discussion because I thought people were going to be consistent in applying the reasoning for other options being taken off the table.

Can you give an example of library code that accesses an attribute of something having type T & Any and it hiding an error in user code?

I'm not rehashing examples that were already given and interacted with. That you have ignored prior examples I have given on this matter does not make me inclined to take the time on this, those interested can read the prior discussion.

The alternative of false negatives would push those with typed code that is "Broken" by this to push for the full removal of any code that isn't typed

I don't see how that helps since if something is annotated T & U, and T is Any, then no one is going to push for the annotation T & U to be removed. They may push for (or wait for) T to be given a narrower definition.

Removal of untyped code does not need to be by removal of the code. I thought you had been aware of the other related discussions on this, and the part of that statement you left out was the social effect. Socially pressuring other libraries to add types is not a positive effect and is more divisive than a lot of people deep in typed python realize. I value a pragmatic boundary between what is typed and what is not that makes it clear when typed code is interacting with untyped code without needing to compromise on the portions that are typed, as it reduces that pressure to a minimum.

DiscordLiz commented 11 months ago

I don't want to contest that because I agree with Eric that Any & T can occur deep in library code, e.g., as a result of unbound type variables.

Since neither of you have said it, We can also forbid unbound type vars in intersections or say that the behavior of an unbound typevar in an in intersection is to default to a bound of object rather than of Any, which remains as permissive to what can be passed in, but doesn't create that issue. This isn't a strong argument against forbidding Any in an intersection, but It also doesn't fix everything. The problem here isn't just typevars.

def real_class_decorator_example[T: object](user_type: type[T]) -> type[T & MyProtocol]:
    ...

If a user calls a function like this today, inside the body of foo, it is only safe to access what is on type[object]. However, what they get back is safe to use based on whatever the input to this function was. It is clear that this decorator adds support for a protocol, but if T going in is Any, then that protocol is never going to be checked properly under option 4.

I'm concerned that if we just allow Any here to remain in and have the behavior from option 4, that a lot of situations involving intersections become useless in the presence of any untyped code at all, because it's just too risky that somewhere deep in the type checker, something became as useless as Any unexpectedly.

what @mikeshardmind came up with for option 5 avoids this. If we're reopening up the other options, I think T & Any being treated as T for safe access also avoids this, even if we know it isn't theoretically fully accurate. Both of those options retain what he expressed about a pragmatic boundary between typed and untyped as well, and I agree with him that that it's necessary to have that boundary. He spoke from social pressure, but there are other issues as well. Not everything in python can be expressed by the type system. That's okay with me. But I'd like to be able to use those things when needed and still keep what assurances can be provided by typing in the process.

DiscordLiz commented 11 months ago

For what it's worth, keeping in mind that the difference between intersections and unions is more about the subtyping relationship, and that Any behaves differently on opposite sides of <:

T | U Allows anything that is a subtype of T or a subtype of U to be assigned to it. In the case of U being Any, this allows anything. T | U Allows only what is common between T and U to be accessed without discriminating first. In the case of U being any, this allows exactly what is available on T.

I think we all agree on that part right?

T & U allows only that which is a subtype of both T and of U to be assigned to it. If U is Any, the requirement of it being a subtype of T has not gone away.

A concrete object known as Any, but then known as a subtype of T is no longer Any It's a subtype of T. The fact that in python we do not distinguish between an unknown that should be treated as anything until it is known and literally allowing anything is making this dual harder for people to reason about.

From all of that, a consistent behavior here is that Any & T no longer has the compatibility behavior of Any as it's no longer Any.

DiscordLiz commented 11 months ago

And if you extend that to "It also needs to be a subtype of Any though", well what does that look like?

class Intersected(T, Any): ...

I know he called it a compormise, but it's sure looking less like a compromise and more just like the correct consistent behavior with other typing decisions to me

mikeshardmind commented 11 months ago

sigh it's consistent with other typing decisions, and consistent with the subtyping relations involved, but I don't know that the prior decision involved was well considered.

I happen to like that outcome, but the ability to subclass Any at runtime to indicate that behavior was accepted with a brief bit on the mailing list that received minimal attention and was done without a pep discussing the impact on how we treat Any in the type system, and I won't hinge it on "We have this behavior, so this must be the right interpretation". It being a consistent interpretation that continues following the subtyping rules is nice, but that was by design. I won't go as far as to say it's the only possible interpretation given the only minimal prior consideration.

I think it has precluded something important that could have been done that you actually got at, splitting "Unknown, could be anything, but use what you know about it and permissively allow use until you know something more specific about it" and the theoretical "Any"

mikeshardmind commented 11 months ago

I did some more thinking about this, I think option 4 actually is a non-starter unless we also make changing the behavior of runtime subclassing of Any part of the PEP. I'm not in favor of this as I think option 5 is better for users, but I do think it needs to be said so that if option 4 is decided on, that isn't lost.

Not doing so would make Any special in that it is the only case where an instance of a valid subclass:

class UnimportantName(T, U):
    pass

Did not have the same safe operation on it as

T & U

valid emphasized, because some intersections would require additional reconciliation in the body. The first form is valid today with Any as a base, and matches the behavior of option 5

gvanrossum commented 11 months ago

Did not have the same safe operation on it as

Maybe it's clear to you, but it's not clear to me what safe operation you are referring to here?

mikeshardmind commented 11 months ago

Maybe it's clear to you, but it's not clear to me what safe operation you are referring to here?

When given that type, what do type checkers allow to be done to/with it, ie. the usage side of the equation, not the assignment side.

There are already rules for this with Any in a base, and things in the typeshed relying on them.

mikeshardmind commented 11 months ago

@gvanrossum As an example, here's pyright's handling of that, as well as mypy's

Code sample in pyright playground

from typing import Any, reveal_type

class X:
    def foo(self) -> int:
        return 1

class Z(X, Any):
    pass

z = Z()

reveal_type(z.foo())
z.foo("broken")
reveal_type(z.bar())

Mypy gives the following:

main.py:13: note: Revealed type is "builtins.int"
main.py:14: error: Too many arguments for "foo" of "X"  [call-arg]
main.py:15: note: Revealed type is "Any"
Found 1 error in 1 file (checked 1 source file)

Pyright gives:

Type of "z.bar()" is "Any"
Type of "z.foo()" is "int"
Expected 0 positional arguments  (reportGeneralTypeIssues)

but the ordering is by severity, the playground link above may show this in better detail.

In any case, going with option 4 will introduce an inconsistency here unless this is also changed. I don't know if changing this would be pragmatic.

Option 5 is consistent with this as is, but may lead to users asking why that's the behavior.

gvanrossum commented 11 months ago

Thanks. So IIUC "safe operation" is a piece of jargon standing for the set of safe operations?

Since multiple inheritance is ordered, but intersection is not,

class C(T, U): ...

and

type C = T & U

are not completely equivalent, right?

Also, I saw somewhere (in the discussion about #1) an argument going like this (though maybe it was using protocols?):

class A:
    def foo(self) -> None: ...
class B:
    def foo(self, x: int, /) -> None: ...
type C = A & B

claiming that C is empty since the interfaces for foo conflict. But couldn't this be satisfied by

class D:  # Maybe inherit from C?
    def foo(self, x: int = 0, /) -> None: ...

???

mikeshardmind commented 11 months ago

Since multiple inheritance is ordered, but intersection is not, [example..] are not completely equivalent, right?

Correct, though in the case where no LSP violations have occurred and nothing needs to be added to the body to reconcile, structurally (in the same above terms for what operations are allowed on something of that type) they are equivalent with (maybe) an exception in the case of Any. The intersection would also not satisfy the subclass nominally.

claiming that C is empty since the interfaces for foo conflict. But couldn't this be satisfied by

Yes, we later came to the conclusion that could be satisfied by adding more to the body to reconcile the difference. This is where the problem of considering Any as infinitely compatible rather than pragmatically something we don't know comes in, and also conflicts with an existing accepted definition with subclassing.

randolf-scholz commented 11 months ago

Since multiple inheritance is ordered, but intersection is not,
class C(T, U): ...
and
type C = T & U
are not completely equivalent, right?

This is really something we need to address in the specification, imo. Until now, type-checkers consider a class C(A,B) as a valid substitute for both A and B, which leads to false negatives: https://github.com/python/mypy/issues/15790

Just as a random idea: would it be feasible to extend subclassing to allow:

class C(A & B):
     ...

Which creates a TypeError at class definition time if the diamond problem is present? This would provide a safer way for users to do LSP-compatible multiple inheritance.

EDIT: opened https://github.com/CarliJoy/intersection_examples/issues/33, moved my comments there.

mark-todd commented 11 months ago

Just as a random idea: would it be feasible to extend subclassing to allow:

class C(A & B): ...

As far as I know, at runtime A & B always calls the magic method (https://docs.python.org/3/reference/datamodel.html#object.__and__). So this syntax is already used - however, I suppose there's no reason why there couldn't be a builtin intersect function that does the same thing?

I would argue though if we enter runtime territory we're expanding this into probably a different PEP

mark-todd commented 11 months ago

@mark-todd It should be using type.__and__. We already have today that A | B returns a UnionType object. Likewise, A & B should return an IntersectionType whose metaclass could take care of different subclassing semantics. The problem is rather that A | B is not an instance of type, but I think for subclassing to work we would need that A&B returns an instance of type. But maybe there is another workaround...

Gotcha, although I do think inheriting from a type sounds super weird - as you say if it uses a different metaclass it wouldn't be compatible. Using isinstance of a type is one thing, but this feels like a much stronger use. Also, since A & B is commutative, what would it's MRO look like?

Sometimes it feels a bit weird that python does not provide a standard way to specify both type and instance members. Nowadays, we have __class_getitem__, but where is __class_or__, __class_and__, etc.?

Yeah agree on this - to be honest I feel like __class_getitem__ could have been done as @classmethod on __getitem__, but that's another discussion.

mark-todd commented 11 months ago

Ah I see - but I suppose I was talking about a more practical issue - sure, I see that if the diamond problem is resolved, the order doesn't matter, but for this to be backwards compatible there has to be a defined order:

>>> class A:
...     pass
... 
>>> A.mro()
[<class '__main__.A'>, <class 'object'>]
>>> class B:
...     pass
... 
>>> class C(A,B):
...     pass
... 
>>> C.mro()
[<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>]

The mro method must return a list of classes in their resolved inheritance order - if A & B are commutative this is no longer resolvable in this structure - it would have to be something like:

>>> C.mro()
[<class '__main__.C'>, set(<class '__main__.A'>, <class '__main__.B'>), <class 'object'>]

CarliJoy commented 11 months ago

It is not possible to create a class from an union on runtime, I don't see a reason why you should be able to create a class from an intersection.

Also this has nothing todo with Any & T handling. Please discuss it in another issue.

NeilGirdhar commented 11 months ago

As an example, here's pyright's handling of that, as well as mypy's

Yes, but in your same example, if you replacead Any with classes with broad interfaces, you could produce other results than you got with Any. Therefore, I don't think it's right to conclude that (Any & T).x is T.x when x exists on T.

mark-todd commented 11 months ago

Yes, but in your same example, if you replaced Any with classes with broad interfaces, you could produce other results than you got with Any

Point of clarification - when you say "classes with broad interfaces" do you mean one that includes foo?

mikeshardmind commented 11 months ago

Yes, but in your same example, if you replacead Any with classes with broad interfaces, you could produce other results than you got with Any. Therefore, I don't think it's right to conclude that (Any & T).x is T.x when x exists on T.

This is the existing behavior for type checkers when considering something that is a subtype of Any and a non-any class simultaneously. There are things already relying on this behavior of subtyping in the typeshed and in real world code.

Can you really say we should have differing subtyping behavior? Weren't you saying you were trying for consistency?

NeilGirdhar commented 11 months ago

Point of clarification - when you say "classes with broad interfaces" do you mean one that includes foo?

I mean classes could include anything, including foo with various overloads.

There are things already relying on this behavior of subtyping in the typeshed and in real world code.

It would nice to see how changing OP(X.f) to OP(T.f & Any) would cause type errors.

Can you really say we should have differing subtyping behavior?

Yes, I don't think it makes sense for X(T, Any).f to be narrower than X(T, U).f.

This is the existing behavior for type checkers

The reason for that is probably because intersections don't exist yet and so type checkers can't return T.f & Any even if they wanted to.

mark-todd commented 11 months ago

I mean classes could include anything, including foo with various overloads.

Thanks - as I understand it, we're working under the assumption that Any is not quite anything, but anything that does not violate LSP with the new class given (so foo method with clashing overload is banned). This isn't an unreasonable assumption because if you tried to mix these classes together it would raise an error.

It would nice to see how changing OP(X.f) to OP(T.f & Any) would cause type errors.

Got a bit lost in the the notation here - did you mean changing OP(T.f) to OP(T.f & Any)? If so it should preserve all properties of T, so no type errors on that side. However, & Any now means anything that doesn't clash with T is fine, so other accessed attributes would no longer return errors.

mikeshardmind commented 11 months ago

The reason for that is probably because intersections don't exist yet and so type checkers can't return T.f & Any even if they wanted to.

That's not the reason. Please feel free to look into the existing behavior, this was chosen behavior for this specifically to suit real world needs, some of which were dynamic attribute access for only missing attributes, such as mocks.

Edit: I'll include it here as well:

This makes it so you can pass a mock object to any function, regardless of its annotation, which is good because that's usually how you use mock objects. But it also means that the type checker knows what the signature of the assert_has_calls method is, and can tell you if you get it wrong.

https://mail.python.org/archives/list/typing-sig@python.org/thread/GULRKYI7XOB3FLAEFC6OYSTBS5FIA5PU/

https://github.com/python/cpython/issues/91154

NeilGirdhar commented 11 months ago

t it also means that the type checker knows what the signature of the assert_has_calls method is, and can tell you if you get it wrong.

I disagree that this is an example of basic intersection. This is more than intersection. This is the option 5 behavior you want, and I think it should be annotated with an operator that explicitly says: Use T.f and not U.f—even if U is Any or whatever it might be.

It would nice to see how changing OP(T.f) to OP(T.f & Any) would cause type errors.

If so it should preserve all properties of T, so no type errors on that side.

(Yes. T.f)

The point is that Mike was arguing that it would be unsafe to broaden any definition of intersection to the "correct" option 4 definition. But if you think that there are no errors caused by this broadening, then you agree with me.

DiscordLiz commented 11 months ago

@NeilGirdhar if you would actually read what he wrote instead of trying to talk louder and ignore the points he made while claiming to read them, you'd know that wasn't the point he was arguing.

I'm not saying that Mock inheriting from Any creates an intersection, I'm saying we have a behavior for when Any is one of multiple supertypes and that intersection having differing behavior would be inconsistent with that even with it not being an intersection, and just considering "what happens when we have multiple known supertypes" Note that the order in which Any is present in bases does not change the behavior.

DiscordLiz commented 11 months ago

That's quoting him from the other thread in something you ignored and continued talking past.

mikeshardmind commented 11 months ago

I'm not happy about how some of what I have said was ignored, but I'm here to sum up the multifaceted argument for 5 over 4. Any further questions about this I will get to when I can.

It is safer.

Note: Yes, this is an argument I made, but it is not the entirety of why I argued it.

It can produce a false positive in a narrow case involving the diamond pattern where one side of the diamond is also untyped, it cannot produce a false negative. The false positive can be resolved in a manner that does not suppress any type information and continues to function as intended if the unknown side of the intersection later becomes typed.

The false negative case present in option 4 misses a real error, assuming the diamond patterns must be allowed and are common enough that a false negative is better than a false positive in them.

Those familiar with the problems of diamond patterns would likely not call this a common case or that those with it should especially fix their type information to be sure they are not running into subtle problems as a result.

It is indisputable that a false positive which is resolvable without concealing other issues is safer than a false negative given a true dichotomy between the two. One raises attention to a potential error and provides a safe means to resolve it, whether a false positive or not, the other ignores something it has information to know may be an error.

It is less surprising to user expectations.

This one is a matter of opinion, however, if a user types T & U and not Any, then to me, it is clear from intent that the user does not intend to allow using T & U as permissively as any, and may be surprised by behavior which ends up equivalent to this when considering what is safe for operations.

It is consistent with other cases of subtyping behavior already in the type system

Any as a supertype is necessary not just for mocks, but also for unknown base classes. The behavior of this predates runtime use in 3.11, and has been a pattern long before it with conditional assignment of Any to object at runtime. The behavior in this case is well established and explored above and in other threads, the portion of the class which is known takes precedence. This is not mro dependent, nor does Any provide any actual methods or attributes considered here. Having an intersection behave differently would not only be inconsistent application of subtyping, but surprising to users familiar with the other pattern.

It's easier to teach

The reasons why Any & T are technically not just Any come from deep in math and type theory. They're relatively inscrutable even to the frequent typing user who is not a type theory expert. Not reducing it to Any, but treating it as Any does not mirror the behavior of Unions in a natural and easy to explain to users manner.

Not reducing it, and using the distinction is incredibly easy to teach, and mirrors the situation with subclassing including with multiple bases.

Users wanting the more permissive option have a free way to get it, with no runtime overhead

s: Any = SomeIntersectionTandAny()

The contrasting runtime cast needed to go in the other direction:

s: T & U = cast(T & U, TandAny)

Not only is not allowed without runtime cost, but if Any is always recursivelty intersected, there is no proper way to type this which is stricter without actually supressing information. The cast would hide the side of the intersection that was Any, even if it later became better typed.

Resolving the the case with 5 and the false positive only remains assignable so long as the protocol added to the intersection is compatible with the side that ceases to be Any in the future.

As for it needing to be at runtime, It cannot be done as assert isinstance because one of the cases for intersections includes Protocols, which may not be checkable at runtime, so it would be an error to write a failing assert and only support users with -O

Option 5 was specifically considered to introduce nothing that needed to be handled differently to existing cases in the type system, remain consistent with existing decisions, be safe, correct, easy to learn, and remain relatively obvious in why things do or do not work, allowing users to get feedback early enough in the process about issues to resolve them.

The only one of those goals which is a matter of subjectiveness is easy to learn. I do not know if I am the appropriate person to actually judge this one.

It does not consider "what if people make something invalid and then ignore it", if the type checker has information, it should use it and it's the user's responsibility to not misuse type ignores.

It does avoid creating any situation where ignoring a false positive is the only option, providing a safe method to be more specific, add more to the intersection. I believe that when users begin seeing the natural use of intersections, this will be relatively obvious. One of the primary motivations for this is easier and less repetitve duck typing through the composition of protocols.

mark-todd commented 11 months ago

The simulator I've been working on now supports Any in the format of Option 5. I'm hoping by testing examples it'll be much easier for everyone to understand how it behaves and agree a behaviour: https://github.com/CarliJoy/intersection_examples/pull/34

randolf-scholz commented 11 months ago

I don't think option ④ is hard to teach, it comes from a basic substitution principle:

Any can be considered as 'unspecified' type that is treated leniently (we always assume it has the right attributes/methods when questioned). Fundamentally, anywhere where Any appears, there should be some type S that can be substituted for it.

We can shine a light on the practical difference between ④ and ⑤ in the following example:

setup

```python # external_library.py class HasGet: def get(self, key): ... # stub: (self, key: U) -> V class Foo: ... # does not implement `get` ``` ```python # my_library.py class MyGetter: def get(self, key: ) -> Y: ... class Bar: ... # does not implement `get` ``` Apart from the overlap of `HasGet` and `MyGetter`, we assume the classes have pairwise disjoint interfaces, which we can express symbolically as `Foo` ⟂ `Bar`, `Foo` ⟂ `HasGet`, etc.

Then, consider the functions

from external_lib import HasGet, Foo, U

def both_have_get(x: HasGet & MyGetter) -> None: 
     x.get( U(...) )  # ⑤ false positive if external untyped
     x.get( X(...) )

def theirs_has_get(x: HasGet & Bar) -> None: 
     x.get( U(...) )
     x.get( X(...) )  # false negative if external untyped

def ours_has_get(x: Foo & MyGetter) -> None: 
     x.get( U(...) )  # ④ false negative if external untyped
     x.get( X(...) )

def neither_has_get(x: Foo & Bar) -> None: 
     x.get( U(...) )  # false negative if external untyped
     x.get( X(...) )  # false negative if external untyped

So, what is the type of x.get in each case?

function	④ + external untyped	⑤ + external untyped	external typed
both_have_get	Any & (X -> Y)	(X -> Y)	(U -> V) & (X -> Y)
theirs_has_get	Any	Any	(U -> V)
ours_has_get	Any & (X -> Y)	(X -> Y)	(X -> Y)
neither_has_get	Any	Any	ERROR

Option ⑤ introduces a bias: when we intersect with an untyped type, we assume the interfaces are disjoint. This is obviously incorrect when intersecting HasGet with MyGetter, in the case when external_lib is untyped, and produces a false positive inside both_have_get if x.get gets called with an argument of type U. However, it is able to avoid a false negative inside ours_has_get.

From a purist standpoint, I prefer ④ since it satisfies a substitutability property: to get the typed result from the untyped result, you simply substitute the actual type for Any, or remove it from the intersection if it is not present on the type. For ⑤ you actually have to add a constituent to the intersection when going from untyped code to typed code in the case of both_have_get.