CarliJoy / intersection_examples

Python Typing Intersection examples

TypedDict rules are unsound #6

Closed JelleZijlstra closed 1 year ago

JelleZijlstra commented 1 year ago

I am looking at the current rules in https://github.com/CarliJoy/intersection_examples/blob/main/specification.rst#typeddicts, which say that if two TypedDict members of an intersection share the same field with different types, then the intersection contains a field that is a union of those two types.

from typing import TypedDict, assert_type

class A(TypedDict):
    a: int

class B(TypedDict):
    a: str

def takes_a(a: A) -> None:
    print(a["a"] + 1)

def takes_ab(ab: A & B) -> None:
    assert_type(ab["a"], int | str)
    takes_a(ab)  # legal, A & B is compatible with A

b: B = {"a": "x"}
takes_ab(b)  # legal under the draft rules, but throws an error at runtime

A similar hole:

def takes_ab2(ab: A & B) -> None:
    ab["a"] = 42  # legal, as ab["a"] is of type int | str

b2: B = {"a": "x"}
takes_ab2(b2)
print(b2["a"])  # now it's an int!

Possibly the type of ab["a"] should actually be int & str, not int | str.
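
A sketch of how that reading would close both holes, reusing the definitions above (takes_ab3 and b3 are hypothetical names, and this assumes the intersection's field type is the intersection of the member field types):

def takes_ab3(ab: A & B) -> None:
    reveal_type(ab["a"])  # int & str -- uninhabited, effectively Never
    ab["a"] = 42          # error: 42 is an int but not an int & str

b3: B = {"a": "x"}
takes_ab3(b3)  # error: "x" is a str, not an int & str, so B is no longer
               # assignable to A & B and the unsound call is rejected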

DiscordLiz commented 1 year ago

Merging conflicting members into what is essentially a | via overloads in protocols has similar but distinct unsoundness. I thought about opening a separate issue for just that, but it appears to be logically the same problem: in both cases, the conflicting name is no longer soundly usable as defined by an intersection, as a result of trying to force it together with a union.

erictraut commented 1 year ago

As I mentioned in other threads, I don't think the Intersection PEP should make any mention of protocols (structural types), nominal types, callables, overloads, etc. The same applies to TypedDict (which happens to be a structural type). If we define the rules for intersections correctly, there will be no need to mention any of these.

Consider that there was never a need to describe how a TypedDict works with a union; the behavior is obvious from the basic rules of how unions work. Likewise, there should be no need to describe how a TypedDict works with intersections.
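
For illustration (reusing Jelle's TypedDicts A and B from above; the function name is mine), the union behavior falls straight out of the general rules:

def takes_a_or_b(x: A | B) -> None:
    reveal_type(x["a"])  # int | str, with no TypedDict-specific wording needed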

I'm going to refrain from reviewing the draft PEP spec until it's a bit further along and incorporates some of the feedback from these threads. In its current form, it has some significant problems. The sections on "Handling Callables" and "Protocol Reduction" contain many statements that are demonstrably incorrect (as Jelle points out above by example). I think these need to be replaced by a more general set of simple rules like the ones I suggest in this comment.

CarliJoy commented 1 year ago

@erictraut I started with quite a naive understanding of typing and am no expert at all. I would be very happy if you could change the specification to a more sound definition. I can give you access to the repo once I am home (I don't have my security key with me at the moment), or you could simply create a pull request.

I do not agree that we don't need to specially mention TypedDict and other collections. For instance, the current implementation of Intersection in basedmypy does not handle TypedDict correctly: it simply treats it as a dict.

These kinds of collections are special, and the specification should state how they need to be handled so that all type checkers share a common understanding of them.

CarliJoy commented 1 year ago

Do I understand correctly that properties of classes/types that mismatch should be intersections and not unions? So if we "merge" a protocol/TypedDict, we actually apply the same rules to each property?

erictraut commented 1 year ago

I do not agree that we don't need to specially mention TypedDict and other collections. ... These kinds of collections are special

Why are they special? They're types within the type system. The type system also supports collection types besides TypedDict, and there will likely be new collection types added in the future. If we define intersections in terms of TypedDict or any other specific type, we would need to define how every newly-added type works with intersections. I'd consider that a failure. We need to define the rules of intersection in a way that works with all types in the type system — including all types that exist today and all types that might be defined in the future.

It's fine to use some specific examples (e.g. that involve TypedDict) within the PEP to demonstrate how these rules work when applied to actual types, but I think it's very problematic if the rules themselves include mention of TypedDict or any other type in the type system that obeys the rules of subtyping.

For instance, the current implementation of Intersection in basedmypy does not handle TypedDict correctly

If we define intersections correctly, then it will be obvious that the basedmypy implementation is broken in this regard. There's no need to describe in the PEP how intersections work with every type as long as the rules can be applied to all types. Type checkers already require a lot of special casing when implementing certain operations on types like TypedDict, enums, properties, and tuples, but the rules for intersection should apply to all of these in the same way.

Do I understand correctly that properties of classes/types that mismatch should be intersections and not unions?

By "properties", I assume you mean "attributes". (Sorry to nitpick, but the term "property" has a specific meaning in Python, and I don't think it applies here.) We need to define the rule for how attribute accesses work when applied to intersections. It's already defined for unions today; we simply need to extend that to intersections. And yes, if two types support the same attribute, then the result of accessing that attribute from an intersection of those two types will result in an intersection.

from collections.abc import Sequence
from datetime import datetime

class A:
    a: Sequence[int]
    b: Sequence[int]

class B:
    a: datetime

def func(x: A | B, y: A & B):
    reveal_type(x.a)  # Sequence[int] | datetime
    reveal_type(y.a)  # Sequence[int] & datetime

    reveal_type(x.b)  # Error: b is missing from B
    reveal_type(y.b)  # Sequence[int]

    reveal_type(x.c)  # Error
    reveal_type(y.c)  # Error

if we "merge" a protocol/typeddics we actually apply the same rules to each property?

That depends on what you mean by "merge". I'm not sure that a "merge" operation needs to be defined. For example, there's no place in pyright today where I attempt to "merge" two types.

CarliJoy commented 1 year ago

That depends on what you mean by "merge". I'm not sure that a "merge" operation needs to be defined. For example, there's no place in pyright today where I attempt to "merge" two types.

Okay, then it is probably my lack of understanding of typing/type checkers. How do you implement/define the access of attributes when subtyping (is this the correct term)? Given the examples you gave, it probably really is possible to slim the specification down a lot.

I just don't have time for that within the next two weeks.

mikeshardmind commented 1 year ago

How do you implement/define the access of attributes when subtyping (is this the correct term)?

If we go with something entirely generalizable, the types of attributes of an intersection generalize as follows.

Givens: (&){...} denotes an intersection over the members of the resulting set; {T ∈ S | ...} acts as a filter based on membership, where membership means the attribute exists; and attribute access on a set of types is treated as mapping to a new set.

(A & B).foo = (&) {T ∈ (A & B) | foo ∈ T}.foo

Put another way: "take the types in the intersection which have a member foo, then take the intersection of the foo members found this way."

As an exhaustive pattern for (A & B).foo, for example:

- if A provides foo and B doesn't: A.foo
- if B provides foo and A doesn't: B.foo
- if both A and B provide foo: A.foo & B.foo
- if neither A nor B provides foo: Never (the empty set)
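
A rough executable sketch of that pattern (the dict-of-attributes representation and the helper name intersection_attr are mine, purely illustrative):

# Toy model: a "type" is a dict mapping attribute names to attribute types
# (represented as strings); an intersection of types is shown as a tuple.
def intersection_attr(members: list[dict[str, str]], name: str) -> object:
    provided = [m[name] for m in members if name in m]  # filter by membership
    if not provided:
        return "Never"       # no member provides `name`
    if len(provided) == 1:
        return provided[0]   # exactly one member provides it
    return ("&", *provided)  # A.foo & B.foo & ...

A = {"foo": "int"}
B = {"foo": "str"}
print(intersection_attr([A, B], "foo"))    # ('&', 'int', 'str')
print(intersection_attr([A, {}], "foo"))   # 'int'
print(intersection_attr([{}, {}], "foo"))  # 'Never'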

Note that this is one of the arguments against reducing Any & T, and for disallowing Any in an intersection.

Any & T (treated irreducibly) T doesn't provide foo: Any.foo (Any) T provides foo: Any.foo & T.foo, which becomes Any & T.foo.


I don't think T can ever be reduced out of Any & T as a result of this, and if Any is allowed in an intersection, I believe this shows it can't be reduced safely either.
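
Continuing the toy sketch above, Any can be modeled as a "type" that claims to provide every attribute as Any, and the rule then leaves Any & T.foo unreduced (again, just an illustration):

class AnyType(dict):
    def __contains__(self, name): return True   # Any provides every attribute
    def __getitem__(self, name): return "Any"   # ...and its type is Any

print(intersection_attr([AnyType(), {}], "foo"))              # 'Any'
print(intersection_attr([AnyType(), {"foo": "int"}], "foo"))  # ('&', 'Any', 'int')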

DiscordLiz commented 1 year ago

(A & B).foo = (&) {foo ∈ A & B}.foo

Do you have a formal proof of this that I can take a look at? Intuitively, I think it should be correct, and if it is, it puts a neat bow on Any no longer needing to be treated as special, but I'm not sure what this would mean in a few cases.

from typing import Callable

class A:
    def foo(self) -> None:
        ...

class B:
    @staticmethod
    def foo() -> None:
        ...

class C:
    @classmethod
    def foo(cls) -> None:
        ...

x: A & B & C = ...  # let's just ignore that I don't know how to make such a class in a sensical way
x.foo()  # this is safe, and known to be `None`?

x: Callable[..., None] & Callable[[None], None]
x()  # None?

These are the easy cases; now let's see:

Given the rule you provided

x: tuple[int, ...] & tuple[str, ...]
x[0]  # int & str? So this is `Never` then, because you can't subclass both?

mikeshardmind commented 1 year ago

No formal proof yet; I didn't have as much time to work on this today. But it seemed like a more general rule that applies even to the "special" case of Any (and later to other special cases of gradual typing that were shown).

All of what you have shown there is how I would interpret the rule applying, including the last one being an impossibility, yes. However, I would say that in the last case x could still be an empty tuple: it was possible to specify the tuple, but it is then impossible for any element of it to satisfy both conditions. (The usefulness of such a specification of a tuple is unclear at this time.)

mikeshardmind commented 1 year ago

Actually, I already have a useful case for that specification. It could be used for the not-uncommon request to mark args or kwargs as not accepted in a ParamSpec, by making it impossible to provide them in a way that satisfies their type.
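
A hypothetical sketch of that idea (this leans on the proposed & syntax, so it is not expressible today, and the function name is mine):

def no_positionals(*args: int & str, x: int = 0) -> None:
    ...

no_positionals(x=1)   # fine: no positional argument has to satisfy int & str
no_positionals(1, 2)  # error: no value can satisfy int & str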

DiscordLiz commented 1 year ago

I've thought about this for a while, but I don't know if I agree with either the conflicting tuples or the conflicting classes. Can you explain why this definition should work, with or without a formal proof of it?

mikeshardmind commented 1 year ago

Sure.

First of all, if this is the only definition and only constraint of those classes A, B, and C, I think a Union is actually more appropriate.

What this is essentially doing is saying: if something meets the definition of A, and A has a definition of foo, then whatever just met the definition of A has a definition of foo that can be used as a replacement for A's definition of foo (i.e. compatible subclasses of A are possible, along with A itself).

This should be completely uncontroversial I think, as it is how singular type specifications already work.

If we have already established that something meets the definitions for A, of B, and of C, then we only need to check if interactions with it are consistent with how we would interact with A, B, and C.

For the record, in the current A & B & C case, with A.foo a method, B.foo a staticmethod, and C.foo a classmethod, each taking no arguments other than what is bound by descriptors, I don't think it is possible to create a class consistent with all three. So we would fail at the assignment of an instance to the definition, not at the access to foo. But let's ignore that, because it shows that we do not necessarily need to check for all kinds of consistency at every use. This is helpful when considering special cases such as Any, allowing us to check consistency at specific points in time, in the ways that matter at that time.

So in that case, .foo() is consistent with a staticmethod taking no arguments, a classmethod taking only the bound class, and an instance method taking only the bound instance.
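
For comparison, the three call shapes on the classes from the earlier comment all agree (a quick illustration; each expression reuses DiscordLiz's A, B, and C):

reveal_type(A().foo())  # None -- instance method, self bound
reveal_type(B().foo())  # None -- staticmethod, nothing bound
reveal_type(C().foo())  # None -- classmethod, cls bound

Each returns None, so a single x.foo() call can be checked against each member's signature independently.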

We have the above definition, which does not care whether such a class is possible and which can still determine whether the use of an attribute is consistent with the definitions. This leaves anything based on concrete types to be checked only at assignment, while attributes are checked for compatibility at each use, rather than against some symbolic composite of the entire type.

This also means that type checkers do not need to create and store some virtual subclass to check attributes. They can lazily construct an internal representation to check only the attributes that are used, after initial assignability concerns are handled. (Whether lazy creation of such an internal representation is desirable is up to each type checker, but this definition allows that possibility.)

So in the case of tuple[int, ...] & tuple[str, ...], it is essentially tuple[int & str, ...], because that would be consistent with __getitem__ for each.

Because

ti: tuple[int, ...] = tuple()
ts: tuple[str, ...] = tuple()

are both valid, we have a tuple that can handle this in terms of assignability, but that can never have a length greater than 0, because it would be impossible to construct any tuple whose __getitem__ could return an item satisfying both.

DiscordLiz commented 1 year ago

allowing us to check consistency at specific points in time in the ways they matter at that time.

Oh, okay. This explains why I thought it shouldn't work with that rule, and how it can, especially once you clarified that you didn't think it was possible to even get to that point without an error in the case of the classes.

It's an interesting approach that does appear to work.

NeilGirdhar commented 1 year ago

I've removed the section on TypedDicts as per the discussion above.