asyncdef / aitertools

Async versions of the Python itertools features.
Apache License 2.0
20 stars 4 forks source link

Implementing map, filter, zip, enumerate and bugfix #4

Closed purpleP closed 6 years ago

purpleP commented 8 years ago

Okay, so first of all, I've started to rewrite tests. The general idea is that there's not need to waste time making tests with 'call function with some arguments and check if result is that and that', because for all functions there is one to one correspondence with itertools of builtin functions and they should give same results given same arguments.

The next step I want to take is create a decorator that would take a dictionary of this kind function_to_test -> set of tuples (test_id, args_kwargs_dict, expected) and create tests using parametrize mark for all args_kwargs by applying function_to_test to all args_kwargs_dict and checking if result is equal to expected if expected is not callable or by checking if result is equal to result of callable applied to the same arguments. I would create pytest plugin for that, and that would reduce the amount of boilerplate in tests even further.

The second thing I wanted to talk about it this weird (in my opinion) code formatting and other stuff.

if condition:

    someting

The blank line doesn't improve readability in any way really it's perfectly readable without it. I get the idea of wanting to visually mark blocks of code, but

  1. They already marked by indentation level.
  2. It makes less code visible on the screen and making code less readable because of that. One can easily imagine code with multiple ifs and tries that would even be ugly with this blank lines.
return f((await something))

Superflous parens also doesn't improve readability.

    async def __aiter__(self):
        """Return self."""
        return self

This comment doesn't make this code any clearer, while cluttering the code.

kevinconway commented 8 years ago

There's quite a breadth of changes introduced in this PR. I'll try to address them each individually.

Adding map, filter, and nwise_iter

I'm all for it. I think these would be great additions to the project and are called out specifically in the README. I appreciate you putting in the effort to add these features.

Changing the aiter behaviour

In the PR you remove a check for __anext__ but do not provide rationale or tests for this change. I'm capable of reading the Python documentation and seeing that the built-in iter function specifically checks for the iterable protocol rather than the iterator protocol, but I would very much like to see the reasoning behind the change laid out in a commit message.

Use of iscoroutinefunction

I am strongly against the inclusion of any asyncio related tools or imports in this project. It is not an asyncio library and I have no intention of locking in asyncio as the vendor for async behaviour. This library is intended as a generic PY35+ async iterator tool kit. In the PR, the use of isinstance(thing, types.CoroutineType) is replaced with asyncio.iscoroutinefunction. However, this function adds nothing beyond the previous isinstance check other than allowing for PY34 generator coroutines that are bound specifically to asyncio.

Not too long ago, I had a chat with another developer working on a similar project and we discussed asyncio lock-in. You can read up on the history here: https://github.com/asyncdef/aitertools/issues/1. Ultimately, I'm not interested in using any asyncio related tools in this project.

The InitializedAsyncIterable base class

Looking at how you've leveraged a base class to de-dupe some code related to iterator setup, I think this tool will be valuable for the project. We could transition existing implementations to use it without impacting the publicly available API. One change I'll ask for in the current usage of this base class is that you change the nested def used to initialize the state to an instance method that does the same thing. Python is not optimized for closures like this and it can, on rare and extreme cases, lead to excessive memory consumption.

Style comments

As far as extra newlines between block headers and content, if you choose not to have additional vertical whitespace in your commit it is still completely acceptable. I hope the project documentation is clear that the only style gates are the PEP8 (now pycodestyle) and PyFlakes linters. There is an additional PyLint check, but only PEP8 and PyFlakes are considered blocking. So long as your code passes those gates then there isn't going to be a debate related to style.

Directly addressing your comment, "readability" is not a data driven metric and is somewhat of a weasel word in my opinion. I won't decline your PR because you did or did not put an extra new line between if and the body. I also will not accept a PR that needlessly changes the newline usage of the project. The reason I put the style gates in place is to squash that conversation completely. If it fits, it ships.

Docstrings

I agree with your comment that "Returns self." is not a useful docstring and serves no purpose beyond passing the PEP257 linter. However, I'm not content to have methods that are undocumented. I would prefer we find a way to replace my lazy docs with something more meaningful. Either way, the PEP257 gate will remain in place. All functions and classes must have docstrings.

Test changes

I believe I understand your thinking here. Stop duplicating code and tests when we can compare the results of two implementations. I think there is a place for the kinds of tests you propose but they are not replacements for the existing coverage. Conceptually, and practically, these tests serve different purposes. The comparison tests serve to validate that the aitertools implementations will return the same results as the itertools implementations in a successful case. However, having additional coverage that directly targets specific behaviours of the async implementation are still valid. Rather than replacing tests I ask you to add more of them.

As far as test generation, I'm not a fan of the model this PR introduces. Years ago I worked on a JavaScript project that implemented the community standard for promises. The community spec (A+) provided a test suite that enforced the documented standards. However, the test suite was dynamically generated in a way similar to what this PR demonstrates and the result was quite a bit of confusion and frustration from the community. Test failures are difficult to track down when the tests aren't laid out explicitly. I'm all for parameterized tests, but I'm not a fan of dynamic test generation.

Also, the locals() modification at the end to avoid decorating functions pushes beyond my tolerance for magic. I don't see great value in munging the local namespace in order to avoid a dozen decorator applications.

purpleP commented 8 years ago

@kevinconway

  1. I've changed aiter behaviour because there was already issue about that and It was already discussed, but I haven't found pull request for that.

However, this function adds nothing beyond the previous isinstance check other than allowing for PY34 generator coroutines that are bound specifically to asyncio.

Um, you actually wrong. As the commit message to this change should say - I've fixed a bug.

In [1]: import types

In [2]: async def f():
   ...:     pass
   ...: 

In [3]: isinstance(f, types.CoroutineType)
Out[3]: False

And yes, it allows py34 coroutines, which I just realised is useless, because this library is py35> only. I know only one alternative to this check -> inspect.iscoroutinefunction. We could use this.

  1. As for styles, comments and stuff I agree with you. I mean if you want this comments for pep checkers and stuff - so be it. Although I want to say that pep isn't always makes code more readable and better. There is a good talk about that in youtube from pycon by Brandon Rhodes, check it out if you haven't seen it.
  2. The locals trick... I didn't like it as much as you =) Right now I'm asking pytest developers how to do that properly. You see, you can do this thing automatically before pytest start a so called 'collection' via specific hook.
  3. As for tests...

The comparison tests serve to validate that the aitertools implementations will return the same results as the itertools implementations in a successful case. However, having additional coverage that directly targets specific behaviours of the async implementation are still valid.

Well yes, and all tests that checked specific behaviours are still there. Like TypeError tests. So I don't see problem there. I've also had concerns about confusion for people wanting to create new tests, but I think if the the test generation would be properly documented and not stretched out to cases where it would be hard to understand we will be okay. As for debugging, I've debug both kinds of async tests while doing this PR, haven't seen any additional problems with this approach.

@pytest.mark.simple_func_test
# or
@pytest.mark.input_output_test(
    function, input, output
)
# function can be any callable
# input is a dict 
# {'args': tuple of positional arguments for tested function,
# 'kwargs': dict of keyword arguments for a function}
# output can be just expected output or callable in which case it would be called
# with the same arguments as function and they output would be compared.

I think with this explanation in the future plugin or in your tests code would be sufficient, because it's a very straightforward idea.

kevinconway commented 8 years ago

Ah yes, OK I see what you mean with the aiter bug and the types.CoroutineType bug. Each of those should be a separate pull request. I can merge the aiter patch and close the issue (https://github.com/asyncdef/aitertools/issues/2) you mentioned. I would also suggest using the inspect.iscoroutinefunction to fix the bug related to types.CoroutineType.

purpleP commented 8 years ago

@kevinconway How to do pull request with only that change? Should I cherrypick commits or something? Sorry, never did that before.

kevinconway commented 8 years ago

I don't think cherry-pick will work in this case. The easiest thing, I think, is to create branches and copy/paste the changes.