Stewori / pytypes

Typing-toolbox for Python 3 _and_ 2.7 w.r.t. PEP 484.
Apache License 2.0
200 stars 20 forks source link

Is there a method to list all subclasses of a type, taking into account both parametrized, half-parametrized and not-parametrized generics ? #31

Open smarie opened 6 years ago

smarie commented 6 years ago

Ideally such a method would provide parameters to specify if the caller wishes to

Stewori commented 6 years ago

Given that subclasses can be created any time, how could such a method query the set of possible subclasses? Would it query the subclasses currently defined in the runtime? Or in a specific module? How would it access this pool of subclasses? I think this would be difficult to implement, if feasible at all. What would be the use case?

smarie commented 6 years ago

Each class in python already has a magic method __subclasses__() able to list all the known subclasses (even the ones created dynamically). So it is definitely possible, the issue is that this method does not allow the above extended uses, it is either non-parametrized vs. non-parametrized or parametrized vs parametrized.

I would use this in the parsyfiles declarative parser: if the user uses a type hint that is a generic class, half- or not-parametrized, I should be able to list all the types that are compliant with that expectation so that I try to parse and convert the file(s) into these types.

Stewori commented 6 years ago

Can you post some imaginary code examples how the method would behave, e.g. what result it should give based on which call. Ideally in a form that can easily be turned into a test for the method. I am confident that we can add this method, maybe as a joined effort.

smarie commented 6 years ago
from typing import TypeVar, Generic

T = TypeVar('T')
U = TypeVar('U')

class FullUnparam(Generic[T, U]):
    pass

class FullUnparam2(FullUnparam):
    pass

class HalfParam(FullUnparam[T, int]):
    pass

class EntirelyParam(FullUnparam[str, int]):
    pass

class EntirelyParam2(HalfParam[str]):
    pass

# This works with FullUnparam.__subclasses__() today although intermediate classes appear too
assert get_subclasses(FullUnparam) == [FullUnparam2, HalfParam, EntirelyParam, EntirelyParam2]

# This does not work with FullUnparam.__subclasses__() today. Maybe a bug of stdlib ?
assert get_subclasses(FullUnparam[str, int]) == [EntirelyParam, EntirelyParam2]

# This does not work with HalfParam.__subclasses__() today.
assert get_subclasses(HalfParam) == [EntirelyParam2]

# variant 1:  only Generic subclasses
assert get_subclasses(FullUnparam, only_generics=True) == [FullUnparam2, HalfParam]
assert get_subclasses(HalfParam, only_generics=True) == []

# variant 2: only Generic subclasses with same number of free parameters
assert get_subclasses(FullUnparam, only_generics=True, parametrized=False) == [FullUnparam2]

I hope this is clear enough, let me know if this requires some more explanation. By the way I am not very good at naming parameters so feel free to change these suggestions :)

Stewori commented 6 years ago

Here is a first draft:

def get_subclasses(cls):
    orig = pytypes.type_util._find_base_with_origin(cls, object)
    res = cls.__subclasses__()
    if not orig is None and hasattr(orig, "__origin__") and not orig.__origin__ is None:
        candidates = orig.__origin__.__subclasses__()
        for candidate in candidates:
            if candidate != cls and is_subtype(candidate, cls):
                res.append(candidate)
    return res

It does not yet consider additional flags like only_generics, parametrized. It does not get assert get_subclasses(HalfParam) == [EntirelyParam] "right", but maybe your example has a flaw? is_subtype(HalfParam, EntirelyParam) #False Thoughts?

smarie commented 6 years ago

Thanks for the fast answer! Yes you're right, my example had a flaw, I fixed it directly in the initial post: I added another class EntirelyParam2 and updated the tests.

The code you provided is almost functional, except that

smarie commented 6 years ago

Note that strictly speaking, EntirelyParam2 is not a subclass of FullUnparam (and even not of HalfParam, even that appears correctly in the results) because it is not a subclass of Generic anymore. However from a practical point of view, it is definitely 'compliant with' both.

So we should maybe think of a better name for this method (get_subtypes, get_compliant_subtypes...) so that this does not add confusion.

smarie commented 6 years ago

I opened a mirror issue in typing_inspect so that @ilevkivskyi may check if he feels that this should rather go in the low-level api

smarie commented 6 years ago

My attempt, recursing on subclasses because it sems that __subclasses__() is not recursive:

def get_all_subclasses(typ, recursive: bool = True, memo = None) -> List[Type[Any]]:
    """
    Returns all subclasses, and supports generic types. It is recursive by default

    :param typ:
    :return:
    """
    memo = memo or set()

    # if we have collected the subclasses for this already, return
    if typ in memo:
        return []

    # else remember that we have collected them, and collect them
    memo.add(typ)
    if is_generic_type(typ):
        sub_list = get_origin(typ).__subclasses__()
    else:
        sub_list = typ.__subclasses__()

    # recurse
    if recursive:
        for typpp in sub_list:
            for t in get_all_subclasses(typpp, recursive=True, memo=memo):
                # unfortunately we have to check 't not in sub_list' because with generics strange things happen
                # maybe is_subtype is not the way to go, find a replacement meaning 'is_compliant' ?
                if t not in sub_list and is_subtype(t, typ):
                    sub_list.append(t)

    return sub_list

It also does not find EntirelyParam2 in the list of subclasses for FullUnparam.

Nailing it down, this comes from that fact that is_subtype(EntirelyParam2, FullUnparam[T, int]) returns False. This does not seem correct, or if it is, what is the function in pytypes that would return True ?

Stewori commented 6 years ago

By default, pytypes treats an unassigned typevar T as unknown in the sense that nothing is a subtype. (unkown is however subtype of Any). By providing a writable dict for bound_typevars you allow pytypes to bind T as needed. (I recently enhanced doc of is_subtype where this is now explained) You can afterwards see how pytypes has bound T:

bt = {}
print is_subtype(EntirelyParam2, FullUnparam[T, int], bound_typevars=bt) # True
print bt # {~T: <type 'str'>}

I didn't apply this in my draft of get_subclasses, because it would have resulted in additional subclasses to be added, contrary to the examples you denoted.

Stewori commented 6 years ago

I noticed that with your corrected example, FullUnparam.__subclasses__() yields [FullUnparam2, FullUnparam[~T, int], HalfParam, FullUnparam[str, int], EntirelyParam, HalfParam[str]]. Is that intended? Or should it be filtered somehow?

smarie commented 6 years ago

Thanks for the help with is_subtype! The following method passes all core tests:

def get_all_subclasses(typ, recursive: bool = True, memo = None) -> List[Type[Any]]:
    memo = memo or set()

    # if we have collected the subclasses for this already, return
    if typ in memo:
        return []

    # else remember that we have collected them, and collect them
    memo.add(typ)
    if is_generic_type(typ):
        # We now use get_origin() to also find all the concrete subclasses in case the desired type is a generic
        sub_list = get_origin(typ).__subclasses__()
    else:
        sub_list = typ.__subclasses__()

    # recurse
    result = [t for t in sub_list if t is not typ and is_subtype(t, typ, bound_typevars={})]
    if recursive:
        for typpp in sub_list:
            for t in get_all_subclasses(typpp, recursive=True, memo=memo):
                # unfortunately we have to check 't not in sub_list' because with generics strange things happen
                # also is_subtype returns false when the parent is a generic
                if t not in sub_list and is_subtype(t, typ, bound_typevars={}):
                    result.append(t)

    return result

I let you decide whether to include it, with variants or not, in pytypes. In the meantime I will use this copy.

Concerning your last comment, that's what I call the 'intermediate classes'. The only reason why FullUnparam[~T, int], FullUnparam[str, int] and HalfParam[str] appear in the list is that these types were used to construct our classes. But many other variants virtually exist (with all possible replacements of the typevars!), and they do not appear because they were not used. So for the sake of consistency, we should probably try to remove these from the list by default. We might wish to provide an option to make all of the variants appear (using all possible combinations of allowed subclasses for the typevars). That list might become big... Even if I do not have any use case for this at the moment, it would complete the picture.

Stewori commented 6 years ago

We might wish to provide an option to make all of the variants appear

This can be not just big, but infinite due to nesting. Also, remember that __subclasses__ only returns subclasses that actually exist at current runtime and not every possible subclass. In this fashion, adding all variants is not required (and anyway not feasible I suppose).

Filtering them out would mean to only include origins in the result?

Are is_generic_type and get_origin from typing inspect? I think these are not pytypes functions, but I think pytypes has internal equivalents. I would include a version using pytypes' existing internals.

The type annotations need to be in Python 2.7 style to keep the code cross-platform.

Apart from that List[Type[Any]] is AFAIK not the correct result type, because List is invariant. I think you meant it in covariant sense. Sequence[Type[Any]] should do the trick.

Stewori commented 6 years ago

Note that pytypes cannot include typing_inspect as dependency, because typing_inspect lacks support for various versions of typing and Python.

Stewori commented 6 years ago

Already figured it out. Yes I would add that function. Could you provide updated tests? I.e. in the tests you posted above, how would the function deal with the keywords only_generics, parametrized?

smarie commented 6 years ago

Thanks !

By the way, do you know why is_subtype(Dict, Dict[str, int], bound_typevars={}) returns an error ?

Stewori commented 6 years ago

Sorry, I cannot reproduce is_subtype(Dict, Dict[str, int], bound_typevars={}) returning an error. It returns False for me, as expected. Dict parameters are not covariant.

concerning the tests, I do not understand your question: I think that the tests in the original post are ok

E.g. assert get_subclasses(FullUnparam, only_generics=True, parametrized=False) == [FullUnparam2] involves keywords only_generics and parametrized which are not defined by your function get_all_subclasses. How to deal with them?

Stewori commented 6 years ago

To get this finalized, I just wanted to add the method in its last version. In turned out that this approach seems to fail with typing-3.5.3.0 and earlier. Ideas?

Stewori commented 6 years ago

It seems like behavior of __subclasses__() changes heavily between typing versions: FullUnparam.__subclasses__() (I removed <....> parameters from printout for better comparability):

typing <3.5.3.0: [FullUnparam2, FullUnparam[~T, int], FullUnparam[str, int]] typing 3.5.3.0: [FullUnparam2, HalfParam, EntirelyParam, HalfParam[str]] typing 3.6.1: [FullUnparam2, FullUnparam[~T, int], HalfParam, FullUnparam[str, int], EntirelyParam, HalfParam[str]]

Not yet even tried it on Python 3.7. So, this looks too difficult to maintain. Suggestions/ideas how to normalize behavior of __subclasses__() across versions?

smarie commented 6 years ago

Argh, that explains a lot of things, thanks for investigating and finding this out!

Here is a synthetic view of the relations between the classes (I hope that the display is correct on all browsers/OSes). User-defined classes are wrapped with stars.

Generic[T, U]         Generic[T]
       ↑                  ↑
*FullUnparam*    ←   FullUnparam[T, int]  ←  FullUnparam[str, int]
       ↑                  ↑                       ↑            ↖ 
*FullUnparam2*       *HalfParam*         ←    HalfParam[str]       *EntirelyParam*
                                                  ↑
                                            *EntirelyParam2*

So, performing the recursion as we defined above seems to always retrieve the user-defined classes (FU2, HP, EP, EP2) whatever the version of python, correct ? What varies is the presence or not of the other ones (the ones with brackets, that exist only because they are used behind the scenes).

If that's correct, adding a post-processing filter to try to identify and remove them (is that only possible?) would be the ideal solution. Otherwise we can simply document the function to explain that there is a guarantee on user-defined subclasses but not on others.

Stewori commented 6 years ago

It's somehow part of pytypes' goal to smoothen the rough edges between typing versions, to normalize uneven behavior. In that fashion, a normalizing solution would be the best I guess. What you say sounds reasonable. If you can turn this into a pull request that yields a consistent result in all these cses, I'll accept it. So (FU2, HP, EP, EP2) should be the canonical result? Maybe you can use pytypes.get_Generic_parameters(tp, generic_supertype) to filter the undesired classes, i.e. remove them if that function returns an actual value.... I suppose generic_supertype can then be set to the original type typ on that get_all_subclasses was called. Or what about hasattr(tp, '__parameterss__') or hasattr(tp, '__args__')? I know, behaviour of these also varies between typing versions. Maybe we'll need to assemble a new utility function for this...

Stewori commented 6 years ago

Maybe pytypes.get_Generic_parameters(tp, generic_supertype=tp) would do the trick, i.e. checking that it returns empty list or none or errors (don't remember exact behavior right now).