Open smarie opened 6 years ago
Given that subclasses can be created any time, how could such a method query the set of possible subclasses? Would it query the subclasses currently defined in the runtime? Or in a specific module? How would it access this pool of subclasses? I think this would be difficult to implement, if feasible at all. What would be the use case?
Each class in python already has a magic method __subclasses__()
able to list all the known subclasses (even the ones created dynamically). So it is definitely possible, the issue is that this method does not allow the above extended uses, it is either non-parametrized vs. non-parametrized or parametrized vs parametrized.
I would use this in the parsyfiles declarative parser: if the user uses a type hint that is a generic class, half- or not-parametrized, I should be able to list all the types that are compliant with that expectation so that I try to parse and convert the file(s) into these types.
Can you post some imaginary code examples how the method would behave, e.g. what result it should give based on which call. Ideally in a form that can easily be turned into a test for the method. I am confident that we can add this method, maybe as a joined effort.
from typing import TypeVar, Generic
T = TypeVar('T')
U = TypeVar('U')
class FullUnparam(Generic[T, U]):
pass
class FullUnparam2(FullUnparam):
pass
class HalfParam(FullUnparam[T, int]):
pass
class EntirelyParam(FullUnparam[str, int]):
pass
class EntirelyParam2(HalfParam[str]):
pass
# This works with FullUnparam.__subclasses__() today although intermediate classes appear too
assert get_subclasses(FullUnparam) == [FullUnparam2, HalfParam, EntirelyParam, EntirelyParam2]
# This does not work with FullUnparam.__subclasses__() today. Maybe a bug of stdlib ?
assert get_subclasses(FullUnparam[str, int]) == [EntirelyParam, EntirelyParam2]
# This does not work with HalfParam.__subclasses__() today.
assert get_subclasses(HalfParam) == [EntirelyParam2]
# variant 1: only Generic subclasses
assert get_subclasses(FullUnparam, only_generics=True) == [FullUnparam2, HalfParam]
assert get_subclasses(HalfParam, only_generics=True) == []
# variant 2: only Generic subclasses with same number of free parameters
assert get_subclasses(FullUnparam, only_generics=True, parametrized=False) == [FullUnparam2]
I hope this is clear enough, let me know if this requires some more explanation. By the way I am not very good at naming parameters so feel free to change these suggestions :)
Here is a first draft:
def get_subclasses(cls):
orig = pytypes.type_util._find_base_with_origin(cls, object)
res = cls.__subclasses__()
if not orig is None and hasattr(orig, "__origin__") and not orig.__origin__ is None:
candidates = orig.__origin__.__subclasses__()
for candidate in candidates:
if candidate != cls and is_subtype(candidate, cls):
res.append(candidate)
return res
It does not yet consider additional flags like only_generics
, parametrized
. It does not get assert get_subclasses(HalfParam) == [EntirelyParam]
"right", but maybe your example has a flaw? is_subtype(HalfParam, EntirelyParam) #False
Thoughts?
Thanks for the fast answer! Yes you're right, my example had a flaw, I fixed it directly in the initial post: I added another class EntirelyParam2
and updated the tests.
The code you provided is almost functional, except that
get_subclasses(FullUnparam)
does not return EntirelyParam2
FullUnparam[T, int]
is considered a class. Maybe there is nothing to do here, or maybe an option to add in order to filter these out if required ?get_subclasses(HalfParam)
returns HalfParam[str]
twice in the output list.Note that strictly speaking, EntirelyParam2
is not a subclass of FullUnparam
(and even not of HalfParam
, even that appears correctly in the results) because it is not a subclass of Generic
anymore. However from a practical point of view, it is definitely 'compliant with' both.
So we should maybe think of a better name for this method (get_subtypes
, get_compliant_subtypes
...) so that this does not add confusion.
I opened a mirror issue in typing_inspect
so that @ilevkivskyi may check if he feels that this should rather go in the low-level api
My attempt, recursing on subclasses because it sems that __subclasses__()
is not recursive:
def get_all_subclasses(typ, recursive: bool = True, memo = None) -> List[Type[Any]]:
"""
Returns all subclasses, and supports generic types. It is recursive by default
:param typ:
:return:
"""
memo = memo or set()
# if we have collected the subclasses for this already, return
if typ in memo:
return []
# else remember that we have collected them, and collect them
memo.add(typ)
if is_generic_type(typ):
sub_list = get_origin(typ).__subclasses__()
else:
sub_list = typ.__subclasses__()
# recurse
if recursive:
for typpp in sub_list:
for t in get_all_subclasses(typpp, recursive=True, memo=memo):
# unfortunately we have to check 't not in sub_list' because with generics strange things happen
# maybe is_subtype is not the way to go, find a replacement meaning 'is_compliant' ?
if t not in sub_list and is_subtype(t, typ):
sub_list.append(t)
return sub_list
It also does not find EntirelyParam2
in the list of subclasses for FullUnparam
.
Nailing it down, this comes from that fact that is_subtype(EntirelyParam2, FullUnparam[T, int])
returns False
. This does not seem correct, or if it is, what is the function in pytypes that would return True
?
By default, pytypes treats an unassigned typevar T
as unknown in the sense that nothing is a subtype. (unkown is however subtype of Any
). By providing a writable dict for bound_typevars
you allow pytypes to bind T
as needed. (I recently enhanced doc of is_subtype
where this is now explained)
You can afterwards see how pytypes has bound T
:
bt = {}
print is_subtype(EntirelyParam2, FullUnparam[T, int], bound_typevars=bt) # True
print bt # {~T: <type 'str'>}
I didn't apply this in my draft of get_subclasses
, because it would have resulted in additional subclasses to be added, contrary to the examples you denoted.
I noticed that with your corrected example, FullUnparam.__subclasses__()
yields [FullUnparam2, FullUnparam[~T, int], HalfParam, FullUnparam[str, int], EntirelyParam, HalfParam[str]]
. Is that intended? Or should it be filtered somehow?
Thanks for the help with is_subtype! The following method passes all core tests:
def get_all_subclasses(typ, recursive: bool = True, memo = None) -> List[Type[Any]]:
memo = memo or set()
# if we have collected the subclasses for this already, return
if typ in memo:
return []
# else remember that we have collected them, and collect them
memo.add(typ)
if is_generic_type(typ):
# We now use get_origin() to also find all the concrete subclasses in case the desired type is a generic
sub_list = get_origin(typ).__subclasses__()
else:
sub_list = typ.__subclasses__()
# recurse
result = [t for t in sub_list if t is not typ and is_subtype(t, typ, bound_typevars={})]
if recursive:
for typpp in sub_list:
for t in get_all_subclasses(typpp, recursive=True, memo=memo):
# unfortunately we have to check 't not in sub_list' because with generics strange things happen
# also is_subtype returns false when the parent is a generic
if t not in sub_list and is_subtype(t, typ, bound_typevars={}):
result.append(t)
return result
I let you decide whether to include it, with variants or not, in pytypes. In the meantime I will use this copy.
Concerning your last comment, that's what I call the 'intermediate classes'. The only reason why FullUnparam[~T, int]
, FullUnparam[str, int]
and HalfParam[str]
appear in the list is that these types were used to construct our classes. But many other variants virtually exist (with all possible replacements of the typevars!), and they do not appear because they were not used. So for the sake of consistency, we should probably try to remove these from the list by default. We might wish to provide an option to make all of the variants appear (using all possible combinations of allowed subclasses for the typevars). That list might become big... Even if I do not have any use case for this at the moment, it would complete the picture.
We might wish to provide an option to make all of the variants appear
This can be not just big, but infinite due to nesting. Also, remember that __subclasses__
only returns subclasses that actually exist at current runtime and not every possible subclass. In this fashion, adding all variants is not required (and anyway not feasible I suppose).
Filtering them out would mean to only include origins in the result?
Are is_generic_type
and get_origin
from typing inspect? I think these are not pytypes functions, but I think pytypes has internal equivalents. I would include a version using pytypes' existing internals.
The type annotations need to be in Python 2.7 style to keep the code cross-platform.
Apart from that List[Type[Any]]
is AFAIK not the correct result type, because List
is invariant. I think you meant it in covariant sense. Sequence[Type[Any]]
should do the trick.
Note that pytypes cannot include typing_inspect as dependency, because typing_inspect lacks support for various versions of typing
and Python.
Already figured it out. Yes I would add that function. Could you provide updated tests?
I.e. in the tests you posted above, how would the function deal with the keywords only_generics
, parametrized
?
Thanks !
is_generic_type
and get_origin
are from typing_inspect
, as you figured out.By the way, do you know why is_subtype(Dict, Dict[str, int], bound_typevars={})
returns an error ?
Sorry, I cannot reproduce is_subtype(Dict, Dict[str, int], bound_typevars={})
returning an error. It returns False
for me, as expected. Dict parameters are not covariant.
concerning the tests, I do not understand your question: I think that the tests in the original post are ok
E.g.
assert get_subclasses(FullUnparam, only_generics=True, parametrized=False) == [FullUnparam2]
involves keywords only_generics
and parametrized
which are not defined by your function get_all_subclasses
. How to deal with them?
To get this finalized, I just wanted to add the method in its last version. In turned out that this approach seems to fail with typing-3.5.3.0 and earlier. Ideas?
It seems like behavior of __subclasses__()
changes heavily between typing versions:
FullUnparam.__subclasses__()
(I removed <....>
parameters from printout for better comparability):
typing <3.5.3.0:
[FullUnparam2, FullUnparam[~T, int], FullUnparam[str, int]]
typing 3.5.3.0:
[FullUnparam2, HalfParam, EntirelyParam, HalfParam[str]]
typing 3.6.1:
[FullUnparam2, FullUnparam[~T, int], HalfParam, FullUnparam[str, int], EntirelyParam, HalfParam[str]]
Not yet even tried it on Python 3.7. So, this looks too difficult to maintain.
Suggestions/ideas how to normalize behavior of __subclasses__()
across versions?
Argh, that explains a lot of things, thanks for investigating and finding this out!
Here is a synthetic view of the relations between the classes (I hope that the display is correct on all browsers/OSes). User-defined classes are wrapped with stars.
Generic[T, U] Generic[T]
↑ ↑
*FullUnparam* ← FullUnparam[T, int] ← FullUnparam[str, int]
↑ ↑ ↑ ↖
*FullUnparam2* *HalfParam* ← HalfParam[str] *EntirelyParam*
↑
*EntirelyParam2*
So, performing the recursion as we defined above seems to always retrieve the user-defined classes (FU2, HP, EP, EP2) whatever the version of python, correct ? What varies is the presence or not of the other ones (the ones with brackets, that exist only because they are used behind the scenes).
If that's correct, adding a post-processing filter to try to identify and remove them (is that only possible?) would be the ideal solution. Otherwise we can simply document the function to explain that there is a guarantee on user-defined subclasses but not on others.
It's somehow part of pytypes' goal to smoothen the rough edges between typing versions, to normalize uneven behavior. In that fashion, a normalizing solution would be the best I guess.
What you say sounds reasonable. If you can turn this into a pull request that yields a consistent result in all these cses, I'll accept it.
So (FU2, HP, EP, EP2) should be the canonical result? Maybe you can use pytypes.get_Generic_parameters(tp, generic_supertype)
to filter the undesired classes, i.e. remove them if that function returns an actual value.... I suppose generic_supertype
can then be set to the original type typ
on that get_all_subclasses
was called. Or what about hasattr(tp, '__parameterss__')
or hasattr(tp, '__args__')
? I know, behaviour of these also varies between typing versions. Maybe we'll need to assemble a new utility function for this...
Maybe pytypes.get_Generic_parameters(tp, generic_supertype=tp)
would do the trick, i.e. checking that it returns empty list or none or errors (don't remember exact behavior right now).
Ideally such a method would provide parameters to specify if the caller wishes to