`TypedDict` with OrderedIntersection

CarliJoy / intersection_examples

Python Typing Intersection examples

MIT License

33 stars 2 forks source link

`TypedDict` with OrderedIntersection #42

Open mark-todd opened 9 months ago

mark-todd commented 9 months ago

Following on from from a discussion with @mikeshardmind in https://github.com/CarliJoy/intersection_examples/issues/41 I thought it might be worth separating this into a new thread.

This concerns how TypedDict might be combined with other classes in the new intersection - I'll start with the non-controversial case:

class A(TypedDict):
    foo: int

class B(TypedDict):
    bar: int

x: OrderedIntersection[A,B] = {"foo":1, "bar": 1}

So far so good. Now let's consider what happens if we mix it with another class or Protocol

class A(TypedDict):
    foo: int

class B:
   def test(self) -> int:
      return 1

x: OrderedIntersection[A,B] = ??

Now here we reach an issue, there's no way to make this class. A TypedDict traditionally cannot be combined with another class - and with good reason! We cannot introduce other methods to a TypedDict. But this led me to another case:

from typing import Protocol, TypedDict, TypeVar, OrderedIntersection

T = TypeVar("T")

class X(Protocol):
    def __str__(self) -> str:
        return "test"

def foo(t: T) -> OrderedIntersection[T, X]:
    ...

class Test(TypedDict):
    x: int

x: Test = {'x': 1}

y: OrderedIntersection[Test, X] = foo(x)

Here the use of __str__ in the X protocol does not impinge on the definition of Test, as this matches the function signature of a method already found on TypedDict (dict). So this intersection should be allowed.

My conclusion from this is TypedDict can only be combined with other TypedDict, or Protocol's that do not expand on the existing methods of dict.

TeamSpen210 commented 9 months ago

Is that actually useful though? A TypedDict is always exactly a dict, never a subclass. So to be valid, any such protocol would need to already be something that’s structurally satisfied by the typed dict. That’d mean the code would work already if just typed as the dict. I suppose it’s useful as an assertion that the dict satisfies the protocol, but you could do that in other ways already.

Maybe there’s a situation I’m not thinking of where it’s not clear to the checker that the protocol is satisfied? Or if you want to override what the dict’s behaviour is for the checker, but that’s rather type unsafe.

mark-todd commented 9 months ago

Is that actually useful though? A TypedDict is always exactly a dict, never a subclass.

Yeah I'm inclined to agree that maybe it's not that helpful a subcase, and probably not worth making more effort for people - I just wanted to point out that theoretically there exists a scenario with a valid interpretation.

Personally, I think it works best to just say TypedDict can only intersect with TypedDict - it's a simple rule, easy to implement, and apart from the above weird edge case seems valid to me.

mikeshardmind commented 9 months ago

I'm inclined to think that there's valuable behavior we're excluding here, but there's some things I need to flesh out more and bring into this other discussion: https://discuss.python.org/t/treat-typeddict-as-structured-types-w-r-t-subclassing/43582/9

The TLDR here is that TypedDicts are a structural type similar to protocols. Anything from an ORM to a API wrapper dealing with json web apis might have a reason to return objects that behave like a typed dict with a specific schema most of the time, but have additional methods on them.

I'll circle back to this once I've examined the implications of it more, if there's an inconsistency that ends up inappropriate without more machinery, we can decide to keep it that limited, this might be a situation where TypedDict needs to change or have additional things added to be usable for such cases, or it might just work out of the box.

vergenzt commented 9 months ago

Are (edit:) instances of subclasses of dict assignable to variables whose type is a subclass of TypedDict?

mikeshardmind commented 9 months ago

Are subclasses of dict assignable to variables whose type is a subclass of TypedDict?

No. Thanks for pointing that out!

I'm going to go ahead and note down why only intersections with typeddicts are allowed, but that it isn't a limitation of intersections while retaining a note that it would be possible to change typeddict in the future to allow more

Not our problem, as the definition we're working towards requires consistency with all operands at assignment, so the limitations of typeddict will prevent other cases for now.

vergenzt commented 9 months ago

I'm going to go ahead and note down why only intersections with typeddicts are allowed, but that it isn't a limitation of intersections while retaining a note that it would be possible to change typeddict in the future to allow more

I like this plan. 👍

In my mind, I would expect the following to be true:

if types A and B have disjoint hierarchies, then a variable of type OrderedIntersection[A, B] should ...
have a value assignable to it if and only if that value: a) is assignable to a variable of type A and b) is assignable to a variable of type B.

(I have put zero thought into whether applies when types A and B have overlapping hierarchies. Out of scope for this ticket IMO.)

If non-pure-dicts are never allowed to to be assigned to TypedDict variables, then non-TypedDict types should never be "OrderedIntersect-able" with TypedDicts.

If the above policy on assignability to TypedDict variables were to change in the future, I would expect the TypedDict "OrderedIntersect-ability" policy to change with it.

CarliJoy commented 9 months ago

I really oppose the idea of limiting Intersection with TypedDicts only with TypedDicts.

This really would decrease the value of OrderedIntersection for me. Python is still a dynamical language. Even so I might not be able to create a proper typed dict that is a subclass of something else I might want to dynamically create this and want the typing system to know how to handle this i.e. by using a cast. So I actually don't care much about a "real" mix here. I will have a creator using a cast and the variables will only have exactly this Intersection type, nothing else. The type checker can still prevent me from assign this object to anything that isn't

For instance just today we had to case that some general settings are provided as TypedDict (to allow defining the values) but it should be ReadOnly for all attributes. While PEP 705 could solve that maybe, I still would like to be able to "overwrite" the __setitem__ Method with something with Never in order to tell the type checker that settings items is not allowed.

In reality the object itself would be a MappingProxy which would lead to runtime failures anyway.

Another use case is pandas or numpy.ndarray which have a __getitem__ methods that relay to a array with defined types. I want to be able to write a TypeGuard that return a OrderedIntersection of this object with a TypedDict in order to access the objects[item] resulting in the correct type. Doing that any other way mean casting / type ignore in my code everywhere. The OrderedIntersection is the only thing that could help in this case.

Maybe some of you will argue, that these classes provide __getattribute__ overwrites as well. But this only works if item is a valid python identifier name, which often it isn't.

mikeshardmind commented 9 months ago

@CarliJoy I have found zero type system features that need explicit banning from the definition we're working towards for intersections at this time, and largely agree with you. The ORM case I was considering is quite akin to pandas dataframes. I do think this will require changes to the spec of TypedDict to be properly supported. type checkers currently do not synthesize __getitem__ and other method overloads for TypedDicts. but this is not, and should not be viewed as a limitation we need to include as a part of intersection as it will naturally not work as-is, but work properly should that spec be updated to accommodate that use.

mark-todd commented 9 months ago

While PEP 705 could solve that maybe, I still would like to be able to "overwrite" the __setitem__ Method with something with Never in order to tell the type checker that settings items is not allowed.

Another use case is pandas or numpy.ndarray which have a __getitem__ methods that relay to a array with defined types.

@CarliJoy This is interesting!

Isn't this:

I still would like to be able to "overwrite" the __setitem__ Method with something with Never

...in violation with LSP? @mikeshardmind

editing note: I've modified some part of this post as there were some points I hadn't thought of - see the one below.

mark-todd commented 9 months ago

Another use case is pandas or numpy.ndarray which have a getitem methods that relay to a array with defined types. I want to be able to write a TypeGuard that return a OrderedIntersection of this object with a TypedDict in order to access the objects[item] resulting in the correct type.

@CarliJoy I think this part of your argument is very persuasive for me, so I'll address that here - in this situation really TypedDict is used as a way to type individual key types for __getitem__. I'm not entirely sure how I feel about it, in the sense that maybe this should be the role of another typing construct, but seeing as TypedDict is really the only solution available for this I guess it makes sense. Consider the below:

class Test:
    def foo(self) -> int:
        return 1

    def __getitem__(self, key: str):
        if key == 'one':
            return "test"
        elif key == "two":
            return 2
        elif key == "three":
            return None
        else:
            raise KeyError

Now as things are currently there's no way to get type hints to differentiate between keys, while also allowing the method foo, but if we allow intersection:

from types import NoneType
from typing import TypedDict, OrderedIntersection, cast

class Test:
    def foo(self) -> int:
        return 1

    def __getitem__(self, key: str):
        if key == 'one':
            return "test"
        elif key == "two":
            return 2
        elif key == "three":
            return None
        else:
            raise KeyError

class TestD(TypedDict):
    one: str
    two: int
    three: NoneType

x = cast(OrderedIntersection[Test, TestD], Test())

So far so good. Although, I guess this still feels like a bit of a workaround - really the problem here is not that the result is an intersection between the TypedDict and Test, but that __getitem__ can't be given structured types outside of TypedDict. Consider instead, if a different syntax existed:

from types import NoneType
from typing import TypedDict, Unpack

class TestD(TypedDict):
    one: str
    two: int
    three: NoneType

class Test:
    def foo(self) -> int:
        return 1

    def __getitem__(self, key: str) -> Unpack[TestD]:
        if key == 'one':
            return "test"
        elif key == "two":
            return 2
        elif key == "three":
            return None
        else:
            raise KeyError

x = Test()

My point is that I guess that it feels like we're patching a different issue with an intersection, but also I do appreciate this is an issue you want a solution for. It seems to me it's just that the return type of __getitem__ really needs a more descriptive type system.

Edit: I've also just remembered that the syntax for Unpack already combines with TypedDict in the context of kwargs - maybe this would be actually quite a small PEP to suggest the syntax above. https://docs.python.org/3/library/typing.html#typing.Unpack

mikeshardmind commented 9 months ago

@mark-todd

I still would like to be able to "overwrite" the setitem Method with something with Never

...in violation with LSP?

There are open questions about that still. and the actual allowed behavior of Never currently still varies between type checkers. I don't know that we need to solve this right this moment now that we're using an ordered construct, and it won't help replace PEP 705 read only for other reasons

(I believe that the closest to consensus there was was that Never may or may not be consistent with other annotations, and is not with any types or values, being another thing that benefits from better terminology)

@CarliJoy https://github.com/CarliJoy/intersection_examples/issues/43#issuecomment-1937903042 goes over why intersections can't be a full replacement for PEP 705 readonly even with an interpretation that Never ends up compatible with everything in an annotation context, the rules around consistency will let you go from a context where you have the intersection to a context where you only have the operand of the intersection that has a valid setitem dunder.

mark-todd commented 9 months ago

There are https://github.com/python/typing/issues/1458 and the actual allowed behavior of Never currently still varies between type checkers. I don't know that we need to solve this right this moment now that we're using an ordered construct, and it won't help replace PEP 705 read only for other reasons

Yeah I agree, if this is currently an unresolved discussion anyway we should try and steer clear of it.