coady / multimethod

Multiple argument dispatching.
https://coady.github.io/multimethod
Other
284 stars 23 forks source link

`@multimethod` calls `__iter__` method #46

Closed geometrian closed 3 years ago

geometrian commented 3 years ago

The @multimethod decorator, at least, seems to call the __iter__ method, if the class has one. It only seems to happen if one of the arguments is iterable (like list or a tuple), but TBH I'm not sure precisely what's going on. Here's a semi-minimal repro (tested Python 3.9, multimethod 1.5):

from multimethod import multimethod

class Foo(object):
    @multimethod
    def __init__( self, arg:int ):
        print("Constructor 1")
    @multimethod
    def __init__( self, arg:list[int] ):
        print("Constructor 2")

    def __iter__(self):
        print("Why is this called?")
        #Returns `None`, causing crash inside multimethod . . .

foo = Foo(0)

Neither constructor runs. Instead, the __iter__ runs, and that method probably can't even do anything sensible since the class hasn't even been constructed yet.

I'm also interested in workarounds.

coady commented 3 years ago

Yeah, unfortunately that's by design, but maybe there's a workaround.

What's happening is the presence of list[int] causes iterable args to check their first member. And the Foo instance does exist at that point (after __new__), but isn't initialized yet. Is that why __iter__ is raising an error?

Workarounds for Foo:

Fixes for multimethod:

They all seem a little hacky to me at first glance. The root problem here is dispatching on a dynamically typed language; there's no "right" way to determine an arg is-a list[int]. Only checking the first member is already cheating.

geometrian commented 3 years ago

So . . . since this was a blocking issue for me (in this and in three other overload decorator packages . . .), I went ahead and wrote my own decorator from scratch. Obviously, it's far less tested than yours and it was a huge pain to write, but it does seem to work everywhere.

In the process of doing this, I think I encountered a similar problem as you did. I'm still not sure I quite understand quite where/why the problem occurs in multimethod, but I think I see the problem it breaks on: namely, if one of the argument types is of type types.GenericAlias (e.g. list[int]), there's no easy way to tell whether the type of an argument matches it (e.g. an argument [0,1,2] of type list doesn't match it).


I think the right answer here is not to try to inspect the argument before you know it's actually a list.

My solution is, if the required argument is of type types.GenericAlias, to simply check that the argument satisfies it using the types themselves. E.g., for an argument that's supposed to be a list[int], it checks that it's (1) a list. Only the argument actually is a list does it then (2) check that each of its elements is an int.

I'm maybe not explaining it well, but it should be pretty obvious. And, in case it's not, here's some actual code:

def generic_match( tgen:types.GenericAlias, targ:type,arg:typing.Any ) -> bool:
    assert type(arg) == targ

    if targ != tgen.__origin__:
        return False

    if   tgen.__origin__ == tuple: #note: I didn't test this branch yet
        return tuple(map(type,arg)) == tgen.__args__
    elif tgen.__origin__ == list :
        assert len(tgen.__args__) == 1
        telems:type = tgen.__args__[0]

        for elem in arg:
            if not issubclass( type(elem), telems ): return False
        return True
    else:
        assert False #Not implemented

Note this is incomplete; I think there probably would have to be some recursive testing here too.

Of course, the immediate downside of this approach is that this can have extremely bad performance (e.g. try doing a thousand overloaded calls, where each time you have to check a list with a million elements . . .).

There are various ways around that—for example, just checking that the first element, if there is one, is the right type and hoping for the best on the rest. The tradeoff is that you can get false matches. Or, you can just check that the generic type matches and don't check any of the contents. This is simplest and fastest, but in addition to giving false matches, you lose all ability to disambiguate subtypes of the generic. For example, my decorator can disambiguate all of these overloads:

def foo( arg:int         ): #...
def foo( arg:list[int  ] ): #...
def foo( arg:list[float] ): #...

. . . which I think is pretty neat. Note the different types on the list.


Sorry for rambling. Hopefully this gives some ideas. Maybe if I understood the Python typesystem a bit better I could come up with something clearer. Also, I can probably work up a gist or something if this is unclear (but I'm not going to do so pre-emptively since I think it's fairly straightforward and you've suffered enough of my code already :) ).

coady commented 3 years ago

65b0266 has a fix for your example. It only uses the recursive type checker on params that use subscripts.

I think the right answer here is not to try to inspect the argument before you know it's actually a list.

The catch is making that performant. The dispatch works in 2 phases::

  1. determine the types of the arguments - uncached
  2. compare the types using issubclass - cached by type

So if the argument is Foo(), step 1 has to know in advance whether matching Iterable[...] is a possibility in step 2. If so step 1 must resolve the arg to Foo[...], otherwise it could stop at just Foo.

geometrian commented 3 years ago

(Thanks for looking into the fix. I confirm this fixes at least this use-case—at least enough to re-enable this as an option in my code.)

I indeed ran into performance issues with my implementation. However, I found that adding a small cache basically fixes it.

My key for caching works somewhat differently though, I think: I just hash the types of the arguments (expanding lists and tuples, as necessary; maybe could expand other types, e.g. anything that's iterable?). Also, the value in the cache is the original wrapped function.

I'm not sure whether hashing the types and doing a table lookup vs. checking them against cached types is faster—though if I had to guess, the former has the edge. For the record, since the patch fixes the issue, I can compare the implementations; mine is about 5% faster. This does not necessarily mean the caching method is better fundamentally; just that there's room to be gained.

coady commented 3 years ago

fc5b004 added a type checker that is adaptive to the registered parameters. It should do minimal iterating on arguments, while still supporting Unions, nested subscripts, etc.