beanbaginc / kgb

Python function spy support for unit tests
MIT License
48 stars 5 forks source link

PyCharm is confused by "*args (tuple)" #7

Open mitchhentges opened 3 years ago

mitchhentges commented 3 years ago

Some kgb methods like spy_on are documented like so:

def spy_on(*args, **kwargs):
    """Spy on a function.

    ...

    Args:
        *args (tuple):
            Positional arguments to pass to
            :py:class:`~kgb.spies.FunctionSpy`.

        **kwargs (dict):
            Keyword arguments to pass to
            :py:class:`~kgb.spies.FunctionSpy`.
    """
    ...

PyCharm believes that this means that a tuple should be provided as the first argument: kgb

To be honest, I'm not sure the best way to solve this. The documentation is arguably correct, so perhaps this is a PyCharm issue. However, it's worth documenting :)

Thanks for kgb, it's great!

chipx86 commented 3 years ago

Glad you like kgb! Always happy to hear from users :) Are you able to share what you're using it for? And would you be interested in a link in our README?

Yeah, the problem with the Python ecosystem is that there is no standard for documentation. None. We have conventions, like PEP-257 that defines some basic things like where the docstring summary should go and how indentation sould work. We have ReStructuredText as a de-facto standard for docstring markup. But we don't have a standard for how arguments, return types, etc. should be documented.

Google has a convention, and numpy has theirs. Parsers for these wound up in Sphinx's napolean extension, so some projects have adopted those.

We initially adopted Google's for all of our products (starting with Review Board and Djblets), but continually found areas that, for our usage, would benefit from tweaks. So we have the Beanbag conventions now, which we implement in our beanbag-docutils extensions for Sphinx.

I'm not sure what PyCharm itself is looking for. If everyone used the same docstring standards, PyCharm would easily be able to follow those, but it certainly won't know about ours, and might make assumptions about Google's or Numpy's (if it supports those).

In our case, we're describing what *args is, which is a tuple, technically, but we're not describing it using Python's type annotations. PyCharm seems to be particularly interested in that tuple, rather than the *args part. That * should be telling PyCharm, hey, these are variable arguments, a standard thing in Python, but it's more focused on the tuple. That seems like a bug/area of improvement for PyCharm, and not something we can do much about.

The other thing, which may be is confusing PyCharm, is that we're not using type annotations in the docs. We do plan to adopt this in the future, but haven't decided yet whether we're going to include them in the docstring. Type annotations are super useful but kinda confusing for people not very familiar with them, so we err on the side of more human-readable types for doc generation.

Interesting problem for sure. Given the lack of standards, it's too bad we don't have some sort of way of presenting a schema for an expected docstring format, which tools could use. That'd be nice not just due to the various conventions out there, but also to help projects with docstring linting and house conventions in general (something existing docstring linting tools don't really support in any way).

mitchhentges commented 3 years ago

Are you able to share what you're using it for? And would you be interested in a link in our README?

For sure, it's an internal Mozilla project to produce nice emails for Phabricator. Feel free to put up a link :)


Yeah, the documentation situation in Python is definitely not very well standardized! However, I made a discovery here that's challenging my initial assumption that this is a PyCharm issue: I think that it's correctly parsing the */**, but it's applying the supplied type to each associated parameter. For example, for spy_on, it's expecting:

with spy_on(("tuple", "of"), ("args", "here"), kwarg_1={"dict": "of"}, kwarg_2={"args": True}):
    pass

Here's some example code and a screenshot showing this: Screenshot from 2021-04-08 19-55-59

I set up sphinx and napoleon to see what it thought, and it generated: Screenshot from 2021-04-08 19-42-48

... so, I suppose that it isn't getting too opinionated with the actual types being provided.


I can't seem to find any proposals or documentation describing the behaviour that PyCharm is doing, but I'm not sure that I disagree with it: if a user wants to describe the types of *args or **kwargs, it's redundant to say tuple or dict because that's implied by the */**. So, perhaps the PyCharm technique is correct?

Based on my understanding here, my personal approach would be to remove all redundant tuple/dict typings, or replace them with the type of the individual parameters within, if possible (if all args are int, then: *args (int)). What do you think? I'd be willing to make a PR to implement the above suggestion, if you're leaning the same way.

chipx86 commented 3 years ago

I can see the arguments (um, no pun intended, I swear), but we're going to stick with our current approach.

A lot of this is going to boil down to consistency and lack of standards.

Sometimes *args or **kwargs may have a known list of types that will be used, but so often (at least in our codebases) they won't.

I believe that PyCharm should be assuming anything about the type of args or kwargs, and this is what we're documenting. It's seeing these types as "what should the caller pass as values in place of this," but we're documenting "what are args and kwargs themselves?" And those are not ever going to be ints or strs.

PyCharm's inherent assumption, I believe, is wrong for any variable arguments, because of this situation:

def do_thing(i, s, b):
    """Do a thing.

    Args:
        i (int):
            A number.

        s (str):
            A string.

        b (bool):
            A boolean.
    """
    ...

def do_thing_with_print(*args):
    """Print and call do_thing.

    Args:
        *args (tuple):
            The arguments to pass to :py:func:`do_thing`.
    """
    print("Calling do_thing() with:')

    for value in args:
        print('... %r' % value)

    do_thing(*args)

Silly example, but you can immediately see that do_thing_with_print() has an *args that has 3 different types in it. In this case, we may know what those types would be, so we could document them if we wanted, but imagine if we were calling a user-provided function. We wouldn't be able to. That's not uncommon in Python.

What PyCharm is doing, though, is saying "I'm going to assume what these will be from a docstring and then yell at the user about it." If it wants to assume, fine, but that's a strong assumption to make. Now, if we had type annotations describing what was in there, that's a different story, but given the lack of a standard for docs, I strongly believe PyCharm should not be doing that.

I could see the argument for removing the type on *args, and if you're familiar with Python, that won't impact you at all. However, in our ecosystem, we get a lot of brand new contributors to some of our codebases, and extension writers for Review Board, who are new to Python, coming from Java or some other language. We make heavy use of *args and **kwargs, and we've heard from many people new to the Python world what a neat language feature that is. The fact that we document what *args and **kwargs inherently are (which, truthfully are tuple and dict types) has helped them work with them and learn how the * and ** actually work (that it's not just reserved names). Being able to modify **kwargs as a dictionary or build a custom tuple for a *args makes sense once you know their types.

It's also just very important for us to stay consistent everywhere. We want all parameters to be documented. If we leave them off in some places, people will leave them off in others in contributions. This isn't hypothetical — it's a lesson learned from when we were originally standardizing our documentation, based on Google's style (which makes all argument types optional).

Consistency extends to our entire suite of codebases, which has many many thousands of docstrings with *args and/or **kwargs. It's better for us to be consistent than to make an exception in kgb.

PyCharm may make some assumptions about docstrings, but we've seen other IDEs in the past make other assumptions about them (and about those of other conventions out there). I don't remember the details exactly, but I remember walking away thinking this is just one big game of whack-a-mole. I'm not a believer that one IDE should dictate a project's standards, when there are no standards for the language, so I still place this on PyCharm (or any other IDE that makes assumptions about things not backed by standards, without the ability to customize/turn those assumptions off).

I hope you understand :)

Regarding contributions, we don't use pull requests. Our day job is Review Board, so we use it for all code going into all of our projects. We use GitHub purely for the Git hosting (normally we even have the issue tracker disabled, since we have our own, but we left it up for kgb). Contributions can always go to https://reviews.reviewboard.org/

mitchhentges commented 3 years ago

No worries, I understand your perspective, and I appreciate the detailed thoughts.

I was going to mention the "remove the documented type on *args if it's a mixed type", but you've beaten me too it with a counterpoint (it's hard for contributors to know when doc types are required).

Sounds like you've considered this quite a bit, and I'm not sure that there's a solution that satisfies all the constraints, so I can appreciate how the current technique is one of the "most ideal" options. Thanks!

chipx86 commented 3 years ago

I think you've inspired me to write a blog post on all the Python documentation conventions, woes, and tradeoffs out there :) It's a complex subject. Maybe I'll take a stab at it this weekend.

I realized there's one more area where we're going to throw off PyCharm. Our string types.

Python has 2 string types (unicode and byte string), but, if you're in the situation we're in where you have to deal with Python 2 and 3 still, it has 3 string identifiers across 2-3: bytes, unicode, and str (version-native). A lot of things in Python 2 and 3 expect version-native strings (str maps to bytes on Python 2 and the former unicode on Python 3), while anything dealing with text should want Unicode strings, and anything dealing with raw data should want bytes.

To distinguish between those in our docs, we'll explicitly use unicode if it's a Unicode string, and str only if it's a native string.

We do plan to change that when we can 100% move away from Python 2, but enterprises move slowly, and we have to support it for a few more years.