Stewori / pytypes

Typing-toolbox for Python 3 _and_ 2.7 w.r.t. PEP 484.
Apache License 2.0
201 stars 20 forks source link

An issue with forward declarations and recursive type #22

Open mitar opened 6 years ago

mitar commented 6 years ago

Example:

import typing

from pytypes import type_util

Container = typing.Union[
    typing.List['Data'],
]

Data = typing.Union[
    Container,
    str, bytes, bool, float, int, dict,
]

type_util._issubclass(typing.List[float], Container)
Traceback (most recent call last):
  File "<redacted>/lib/python3.6/site-packages/pytypes/type_util.py", line 1387, in _issubclass_2
    return issubclass(subclass, superclass)
TypeError: Forward references cannot be used with issubclass().

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 14, in <module>
    type_util._issubclass(typing.List[float], Container)
TypeError: Invalid type declaration: float, _ForwardRef('Data')
Stewori commented 6 years ago

I think currently forward declarations are only supported directly in annotations. Not sure if a mix of types and forward declarations is supported, but I suppose it's not (yet). Currently you can only use things like

def some_function(a: "a_type_defined_later"): ...

Putting the whole type declaration into a string might work, but I suppose forward declarations are currently parsed in get_type_hints or so. I'm not sure when I'll find time to implement this. PRs welcome...

Stewori commented 6 years ago

Quickly looked into the code. The relevant position is at https://github.com/Stewori/pytypes/blob/15ec80c27e6f933b85ebdf4a6885a5c62552a50f/pytypes/type_util.py#L836. There it detects a string eventually and would resolve forward declaration. We could add such a case to _issubclass but then it would resolve the forward declaration again and again for every new subtype check. Or _issubclass is allowed to modify the incoming type. Or we have some kind of cache.

A problem with resolving forward declarations within _issubclass is that _issubclass usually does not know the context, i.e. the module where the declaration is defined. It can maybe guess it from the caller; I suspect it is not possible to get the enclosing module from a type, especially not if it is one pure forward declaration string. If you have an idea how _issubclass should/could retrieve that info, let me know. Or maybe we should add yet another optional arg that names the module to use for resolving forward declarations.

Maybe the best idea would be to have a public service function (e.g. resolve_forward_decl) that resolves forward declarations in a type and also takes the module as explicit argument, returning the resolved type. Then _issubclass simply requires that a type containing forward declarations was previously processed by that function. An appropriate error msg would point the user to this.

Also note that resolving a forward ref, even checking if a type needs to resolve a forward ref somewhere internally is expensive and in most cases not necessary. With an explicit service function the user can actively and transparently resolve types as needed. What do you think?

mitar commented 6 years ago

I think the public service function would be the best. It just has to make sure it resolves recursive types through reusing the same reference instead of copying the type over again and again. But yes, I think this would be the best and could also be done only once/cached on the caller's side.

mitar commented 6 years ago

Any chance on implementing this soon or helping a bit how to do it? I thought I can make a workaround for us but it seems it does not work and it started blocking one validator we have for code and are using this package to check types. :-(

Stewori commented 6 years ago

Sorry, I was rather busy during christmas time. It would be great to have some help on this one. I already spent some thoughts about implementing the resolve function. I'll try to put a draft together we can iterate on...

mitar commented 6 years ago

Yea, I completely understand about holidays. I tried to finish this before them, but it seems I got stuck now. Anyway, don't worry, I will try to see what I can do.

Stewori commented 6 years ago

So, I spontaneously committed some work on this as of https://github.com/Stewori/pytypes/commit/0f454aec2cc2ba92efa7dd3aed2215a2cc986394. This should help with your issue for now. However, it maybe does not yet handle callables correctly. And it might run into recursion issues eventually. The example from above works now, if you do it like this:

Container = Union[Sequence['Data'], int] # All this wouldn't work with List because of invariant arg type
Data = Union[Container, str, bytes, bool, float, int, dict]
pytypes.resolve_fw_decl(Container) # helpful exception test will show up if this line is left out
pytypes.is_subtype(typing.List[float], Container) # true
pytypes.is_subtype(typing.List[complex], Container) # false

Note that you need Sequence still because of the invariant parameter of List discussed in other thread.

mitar commented 6 years ago

You are amazing! I will check it immediately!

mitar commented 6 years ago

Hm, what is the second return value from resolve_fw_decl? Why is that necessary?

mitar commented 6 years ago

Oh, I see, I do not even have to check return values. Cool.

mitar commented 6 years ago

Works like a charm! Thanks.

mitar commented 6 years ago

Care to release a version with this improvement?

Stewori commented 6 years ago

The first return value is the type itself. In case the input was one pure forward declaration as a string this is relevant. The second return value is a boolean indicating if a change was made, i.e. it is true if at least one actual forward declaration was found and resolved. This is to help people assess if they should eventually repeat or update operations or hashes or whatever they might have already done with their type. It's also useful for testing.

There are actually some remaining todos on this issue (so reopening it for now...):

Stewori commented 6 years ago

Care to release a version with this improvement?

I want to have a new release soon, but would like to get some more stuff in, e.g. at least some of the todos listed above, fixes/resolutions for https://github.com/Stewori/pytypes/issues/23, https://github.com/Stewori/pytypes/issues/18, https://github.com/Stewori/pytypes/issues/20, eventually https://github.com/Stewori/pytypes/issues/21, failures mentioned in https://github.com/python-attrs/attrs/issues/301#issuecomment-347552114 Help is welcome!

mitar commented 6 years ago

Hm, it seems there is no protection against infinite recursion. I get stack overflow with this test:

import typing

from pytypes import type_util

Data = typing.Union['Container', float]
Container = typing.Union[Data, int]

type_util.resolve_fw_decl(Data)
type_util.resolve_fw_decl(Container)

type_util._issubclass(list, Container, bound_typevars={})

Of course it is obvious that definition of types is buggy, but maybe resolve_fw_decl could validate that?

Stewori commented 6 years ago

maybe resolve_fw_decl could validate that?

Is it possible to have a recursion issue with a type not involving Union? I'm spontaneously not entirely sure.

If so, I think it would be better to make _is_subclass_Union recursion proof in general. I'm just wondering what would be the best way to do it (e.g. caching already checked types; maybe allow it to run at first a special _issubclass mode that does not follow forward references to have the succeed-fast potential).

Otherwise we would have to make _issubclass itself recursion proof (doable as well, but more costly regarding caching).

It would be good to have an eventual example for recursion issue not involving Union. For better understanding and later for testing.

mitar commented 6 years ago

Is it possible to have a recursion issue with a type not involving Union? I'm spontaneously not entirely sure.

I think Union + forward declarations.

Or Union types without forward declarations if somebody manually "patch" Union objects in some other way.

If so, I think it would be better to make _is_subclass_Union recursion proof in general.

Are you sure? Keeping track might introduce additional performance cost. But from design perspective this is probably cleaner, yes.

I'm just wondering what would be the best way to do it

I would just have an extra argument to it, similar to bound variables, stored types being in process of resolving.

Stewori commented 6 years ago

I think Union + forward declarations.

I mean are there examples without Union, i.e. is this issue specific to Union checking or do we have to think more general?

Are you sure? Keeping track might introduce additional performance cost. But from design perspective this is probably cleaner, yes.

My philosophy in pytypes is to allow user to opt out of such arguable stuff. Aside that I value correctness higher than performance (in production, one would disable typechecking anyway I suppose). Also, that's why I think of a fast-succeed mode.

mitar commented 6 years ago

I mean are there examples without Union, i.e. is this issue specific to Union checking or do we have to think more general?

I cannot imagine a case without Union. This is the only current type which allows multiple options, no?

Stewori commented 6 years ago

Any solution to this would require at least one more new arg for _issubclass and thus the whole type checking function family. I wonder if it would be better to wrap all these args into kind of class like type_memo in typeguard.

I cannot imagine a case without Union

Me neither. I hope we don't overlook something.

Stewori commented 6 years ago

As of https://github.com/Stewori/pytypes/commit/74ef4b9babfe3b1d70b0d547e8e8eea575dcdd1b _issubclass performs recursion-checks if a _ForwardRef is encountered. As far as I tested, this solves the recursion issue you observe. Would be good if you could confirm this. Final todo for this issue is to add tests. Help welcome...

mitar commented 6 years ago

I have not encountered this in practice but only during testings of new code. So the example above is the only one I have.

Hm, but _issubclass does not see anymore _ForwardRefs if they were resolved? Or am I misunderstanding your comment? Or resolving does not get rid of _ForwardRefs?

New code seems to work through against my example above.

Stewori commented 6 years ago

No, it actually does not get rid of _ForwardRef. After looking more closely into _ForwardRef I had learned that it provides fields to store the referenced type, i.e. __forward_arg__, __forward_value__ and __forward_evaluated__. On resolving, __forward_value__ shall be filled with the actual type, __forward_arg__ provides the string representation thereof. Using these fields seems to be the official way and avoids sort of hassle with replacing a type within a potentially complex structure of PEP484 types. The downside is, it allows for cycles, but that's not my invention. _issubclass should now be cycle proof regarding this. And I think without much additional overhead.

mitar commented 6 years ago

Awesome!

mitar commented 6 years ago

As of 74ef4b9 _issubclass performs recursion-checks if a _ForwardRef is encountered. As far as I tested, this solves the recursion issue you observe.

So it currently returns False in such case, no? Wouldn't raising an exception be better? Because this is a faulty type to begin with.