RET504 false-positive when variable built up with assignments and returned

gdub commented 1 year ago

Example:

def get_options(option1, option2):
    options = []
    if option1:
        options = options + ["option1"]
    if option2:
        options = options + ["option2"]
    return options

Output:

$ ruff check --isolated --select RET ret_test.py 
ret_test2.py:16:12: RET504 Unnecessary variable assignment before `return` statement
Found 1 error.

In this case, there is no unnecessary variable assignment that could be removed and cannot do an earlier return as need to get through all the conditionals.

That said, the error does not trigger if one were to instead use an in-place operator such as +=:

def get_options(option1, option2):
    options = []
    if option1:
        options += ["option1"]
    if option2:
        options += ["option2"]
    return options

Version:

$ ruff --version
ruff 0.0.247

charliermarsh commented 1 year ago

Possibly not respecting overrides, does look like a bug.

henryiii commented 1 year ago

Just noticed this too!

def f(y):
    x = 1
    if y:
        x = 2

    return x

$ ruff tmp.py --isolated --select RET
tmp.py:6:12: RET504 Unnecessary variable assignment before `return` statement
Found 1 error.

It seems the RET checks don't handle branches very well (the "else after" ones also are broken when you have a chain).

charliermarsh commented 1 year ago

It may wants you to do this, even though it's awkward:

def f(y):
    if y:
        return 2

    return 1

the "else after" ones also are broken when you have a chain

Is this on main? I removed enforcement for elif chains in the most recent release.

charliermarsh commented 1 year ago

(I find these rules really tedious lol. Maybe they should just be removed.)

henryiii commented 1 year ago

They will come back again when adding pylint rules. :) I like these in pylint a lot, which I is why I'm happy to see them showing up in a faster pre-commit friendly form, but the pylint version is solid and doesn't trigger falsely, while RET seems to not properly handle branching (I expect it's a problem with the flake8 plugin).

It may wants you to do this, even though it's awkward

It's not practical in the actual code that caused me to look into this. The fix should be to just only enforce x = value; return value and not if the assignment is in an if statement (with statement is 90% of the time fine, just not if statements)

henryiii commented 1 year ago

I haven't rerun with the most recent release yet for the RET 507 and similar rules, quite possibly fixed!

charliermarsh commented 1 year ago

Ah yeah, I mostly meant that I have issues with the implementation. It's a direct port of the flake8 plugin, and it just has a bunch of limitations (in addition to being hard to reason about).

henryiii commented 1 year ago

Ahh, I see what you mean.

def f(y):
    x = 1
    if y:
        return 2

    return x

Would pass. I could live with that in some/most cases.

charliermarsh commented 1 year ago

I think it will still complain about x=1 :(

henryiii commented 1 year ago

In a realistic example (where other things are happening to the x before the final if), it seems pretty happy. I didn't end up with any false positives that I couldn't handle knowing that it was fine just changing the assignment in the if to a return. There were several when converting Plumbum and https://github.com/scikit-hep/particle/pull/472/commits/10caa90bb139f6b754bdad1ba577c832e4e3b465 was the one that prompted me to search for this issue.

martinlehoux commented 1 year ago

Just to add one more (almost) real life example: I have this pattern that seems reasonable in Django to build a queryset based on different inputs

def get_queryset(option_1, option_2):
    queryset: Any = None

    queryset = queryset.filter(a=1)

    if option_1:
        queryset = queryset.annotate(b=Value(2))

    if option_2:
        queryset = queryset.filter(c=3)

    return queryset

main.py:15:12: RET504 Unnecessary variable assignment before return statement

henryiii commented 1 year ago

This would be how you’d make it happy:


def get_queryset(option_1, option_2):
    queryset: Any = None

    queryset = queryset.filter(a=1)

    if option_1:
        queryset = queryset.filter(b=2)

    if option_2:
        return queryset.filter(c=3)

    return queryset

Not quite as symmetric, but not really “worse” from a readers standpoint either. I’d probably disable the check if this was a lot of very symmetric conditions (like you have described), though in practice most of them are not that symmetric and are fine written with the extra return. Overriding a variable is a bit ugly, so avoiding an unnecessary one isn’t that bad.

gdub commented 1 year ago

Personally, I dislike the resulting work-around because:

The lack of symmetry feels unnatural.
If you need to add an additional condition block, you would have churn turning the return statement back to an assignment.
It does not save any lines, as is the case for simple assignment then return.

One idea would be to only flag the single-assignment and return scenarios, e.g.:

def get_queryset():
    queryset = Model.filter(a=1)
    return queryset

...and not flag scenarios where there are multiple assignments to the same variable before returning, e.g.:

def get_queryset(option_1, option_2):
    queryset = Model.filter(a=1)
    if option_1:
        queryset = queryset.filter(b=2)
    if option_2:
        queryset = queryset.filter(c=3)
    return queryset

Though, I guess that would also mean that something like this would pass not get flagged:

def get_queryset():
    queryset = Model.filter(a=1)
    queryset = queryset.filter(c=3)
    return queryset

Which raises the question of what is it that this rule is really meant to be targeting? Cases where would save a line of code due to an unnecessary assignment? Should check be simplified to only look for the simple assign/return case that happens at the same nesting level, and ignore if there is more complex branching/looping involved?

Czaki commented 1 year ago

Though, I guess that would also mean that something like this would pass not get flagged:

If the rule should only be ignored if the assignment is under if clause.

bellini666 commented 1 year ago

I agree with @gdub . In cases similar to the one mentioned by him, I usually do # noqa: RET504, but it would be very good if that kind of usage got exempted from the rule.

Don't know if it is simple for ruff to detect it though =P

charliermarsh commented 1 year ago

The multiple assignments heuristic seems like it could work... pretty well? In my personal opinion I actually prefer this to inlining return queryset.filter(c=3):

def get_queryset():
    queryset = Model.filter(a=1)
    queryset = queryset.filter(c=3)
    return queryset

It also has the nice property that if you add another operation to the fluent chain, the diff much clearer. But, it's all a bit subjective of course.

I think the intent of the rule is to capture "trivial" errors like:

x = 1
return x

Or:

x = 1
print("Got 1")
return x

Part of me is like... is this even a rule worth having? I wish I had data on how many people are using it and how often it's # noqa'd or ignored, etc.

NeilGirdhar commented 1 year ago

I wish I had data on how many people are using it and how often it's # noqa'd or ignored, etc.

Can you run Ruff on a corpus of projects and gather rule hit counts and a random sampling of examples? MyPy does something like that with its MyPy Primer, which they use to evaluate pull requests. It might be nice to see how various changes make rules more or less sensitive, and examples of lines that would be newly flagged, and also examples of lines would no longer be flagged.

charliermarsh commented 1 year ago

I'd like to do something like this: index projects using Ruff, and get some data on their configurations (and on how often they noqa various rules).

gdub commented 1 year ago

Interesting looking through some results from: https://github.com/search?q=noqa%3A+RET504&type=code https://github.com/search?q=RET504&type=Code

Shows up in some ignores with comments, e.g.:

# RET504: Unnecessary variable before return (ugly)

    # suppress what appears to be false-positives with these checks
    'RET504',
    'RET505',
    'RET506',
    'RET507',
    'RET508',

See many examples with similar chaining-style assignment with conditionals that we've touched on already.

Here's an interesting case where variable is assigned to, referenced, and then returned: https://github.com/KyleKing/pytest_cache_assert/blob/47d0b538528a9c9cdbc182d2b276462a2aa0fd4d/tests/test_flask.py#L20

def _create_app() -> Flask:
    app = Flask(__name__)

    @app.route('/hello')
    def hello() -> str:
        """Hello endpoint."""
        return 'Hello, World!'

    return app  # noqa: RET504

Seems like something that shouldn't be flagged.

Perhaps instead of an assignment count, heuristic could be reference count. In other words, excluding return statement, flag if the only reference was assignment.

charliermarsh commented 1 year ago

This PR productionizes that suggestion (avoid RET504 for variables with multiple assignments): https://github.com/charliermarsh/ruff/pull/3393.

Feedback welcome, I won't be merging tonight.

charliermarsh commented 1 year ago

(Fixed the decorator thing in #3395, separate bug!)

charliermarsh commented 1 year ago

Ok, I looked through the GitHub Code Search, and I do see a lot of examples of this kind of "fluent interface"-like assignment style, e.g.:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
command = self.validate_command(command)
command = self.do_magic(command=command)
return command  # noqa: RET504

This strikes me as very reasonable code, so I'm gonna go ahead and merge that PR.

henryiii commented 1 year ago

Personally, like moving the return up one, reducing the number of variable overwrites there (and that sort of code isn't very good for static typing unless every command returns the same thing!)

I don't see any problem with:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
command = self.validate_command(command)
return self.do_magic(command=command)

In fact, there's only one overwrite now, so I'd actually do:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
validated_command = self.validate_command(command)
return self.do_magic(command=validated_command)

IMO, this now reads much better. The point of style checks is to force better style, not to try not to bother anyone or conform to what everyone is doing if it's wrong. I think the biggest problem with this check was that it wasn't clear how to "fix" it from the message (and also some false positives, like the function false positive above). And it's totally fine to disable a style check you don't agree with! Whenever I talk about style checking, I emphasize this (especially WRT PyLint).

Also, one of the points of this sort of check is to help two people writing the same logic get the same code. Returning without assigning is perfectly valid, so it's reasonable to force this rather than force an assignment always before return.

For example, I think this would have read much better without an extra assignment: https://github.com/scikit-build/scikit-build-core/pull/197/files#diff-5a9598893dbb4007601522cfb26b27c92d790c18bf35d7bfe86205ae1955fa0bR47-R48

I'm not sure why Ruff didn't find that, as I'd guess #3393 wasn't in the version of Ruff that ran on that, but I wish it did.

astral-sh / ruff

RET504 false-positive when variable built up with assignments and returned #2950