Closed geometrian closed 3 years ago
Yeah, unfortunately that's by design, but maybe there's a workaround.
What's happening is the presence of list[int]
causes iterable args to check their first member. And the Foo
instance does exist at that point (after __new__
), but isn't initialized yet. Is that why __iter__
is raising an error?
Workarounds for Foo
:
list
without a subscript__iter__
to return without error if uninitializedFixes for multimethod:
TypeError
on calling iter
; detecting whether an object is iterable has never been perfectIterable
to Collection
; wouldn't help if the class is also a collection__init__
arg
would be iterated, but not self
They all seem a little hacky to me at first glance. The root problem here is dispatching on a dynamically typed language; there's no "right" way to determine an arg is-a list[int]
. Only checking the first member is already cheating.
So . . . since this was a blocking issue for me (in this and in three other overload decorator packages . . .), I went ahead and wrote my own decorator from scratch. Obviously, it's far less tested than yours and it was a huge pain to write, but it does seem to work everywhere.
In the process of doing this, I think I encountered a similar problem as you did. I'm still not sure I quite understand quite where/why the problem occurs in multimethod
, but I think I see the problem it breaks on: namely, if one of the argument types is of type types.GenericAlias
(e.g. list[int]
), there's no easy way to tell whether the type of an argument matches it (e.g. an argument [0,1,2]
of type list
doesn't match it).
I think the right answer here is not to try to inspect the argument before you know it's actually a list
.
My solution is, if the required argument is of type types.GenericAlias
, to simply check that the argument satisfies it using the types themselves. E.g., for an argument that's supposed to be a list[int]
, it checks that it's (1) a list
. Only the argument actually is a list
does it then (2) check that each of its elements is an int
.
I'm maybe not explaining it well, but it should be pretty obvious. And, in case it's not, here's some actual code:
def generic_match( tgen:types.GenericAlias, targ:type,arg:typing.Any ) -> bool:
assert type(arg) == targ
if targ != tgen.__origin__:
return False
if tgen.__origin__ == tuple: #note: I didn't test this branch yet
return tuple(map(type,arg)) == tgen.__args__
elif tgen.__origin__ == list :
assert len(tgen.__args__) == 1
telems:type = tgen.__args__[0]
for elem in arg:
if not issubclass( type(elem), telems ): return False
return True
else:
assert False #Not implemented
Note this is incomplete; I think there probably would have to be some recursive testing here too.
Of course, the immediate downside of this approach is that this can have extremely bad performance (e.g. try doing a thousand overloaded calls, where each time you have to check a list with a million elements . . .).
There are various ways around that—for example, just checking that the first element, if there is one, is the right type and hoping for the best on the rest. The tradeoff is that you can get false matches. Or, you can just check that the generic type matches and don't check any of the contents. This is simplest and fastest, but in addition to giving false matches, you lose all ability to disambiguate subtypes of the generic. For example, my decorator can disambiguate all of these overloads:
def foo( arg:int ): #...
def foo( arg:list[int ] ): #...
def foo( arg:list[float] ): #...
. . . which I think is pretty neat. Note the different types on the list
.
Sorry for rambling. Hopefully this gives some ideas. Maybe if I understood the Python typesystem a bit better I could come up with something clearer. Also, I can probably work up a gist or something if this is unclear (but I'm not going to do so pre-emptively since I think it's fairly straightforward and you've suffered enough of my code already :) ).
65b0266 has a fix for your example. It only uses the recursive type checker on params that use subscripts.
I think the right answer here is not to try to inspect the argument before you know it's actually a list.
The catch is making that performant. The dispatch works in 2 phases::
issubclass
- cached by typeSo if the argument is Foo()
, step 1 has to know in advance whether matching Iterable[...]
is a possibility in step 2. If so step 1 must resolve the arg to Foo[...]
, otherwise it could stop at just Foo
.
(Thanks for looking into the fix. I confirm this fixes at least this use-case—at least enough to re-enable this as an option in my code.)
I indeed ran into performance issues with my implementation. However, I found that adding a small cache basically fixes it.
My key for caching works somewhat differently though, I think: I just hash the types of the arguments (expanding list
s and tuple
s, as necessary; maybe could expand other types, e.g. anything that's iterable?). Also, the value in the cache is the original wrapped function.
I'm not sure whether hashing the types and doing a table lookup vs. checking them against cached types is faster—though if I had to guess, the former has the edge. For the record, since the patch fixes the issue, I can compare the implementations; mine is about 5% faster. This does not necessarily mean the caching method is better fundamentally; just that there's room to be gained.
fc5b004 added a type checker that is adaptive to the registered parameters. It should do minimal iterating on arguments, while still supporting Unions, nested subscripts, etc.
The
@multimethod
decorator, at least, seems to call the__iter__
method, if the class has one. It only seems to happen if one of the arguments is iterable (likelist
or atuple
), but TBH I'm not sure precisely what's going on. Here's a semi-minimal repro (tested Python 3.9, multimethod 1.5):Neither constructor runs. Instead, the
__iter__
runs, and that method probably can't even do anything sensible since the class hasn't even been constructed yet.I'm also interested in workarounds.