astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
33.36k stars 1.11k forks source link

Comprehension / generator, opt-in feature to preserve line breaks #11753

Open Martin19037 opened 6 months ago

Martin19037 commented 6 months ago

Before I begin, here is some context:

I would love to incorporate ruff as a pre-commit formatter for one of our projects. We developed the habit of almost always separating comprehension target-expression, for-in statement(s), and optional if-expression(s) into separate lines, except for very short and trivial comprehensions. Naturally, this makes identifying these separate comprehension parts very easy at a glance, at the cost of a few more lines. Also, we have increased our line length somewhat.

In its current form, ruff will always collapse comprehensions to a single line if:

We do use trailing commas ourselves, so many of our hand-formatted comprehensions that contain trailing commas in any form remain more or less unchanged after running ruff format. The ones that do not, usually get collapsed to a single line mess (also due to our increased line length).

One toy-example for this would be:

[
    i + n
    for i in range(5)
    for n in range(5)
    if i < 3
]

which would be collapsed to a single line:

[i + n for i in range(5) for n in range(5) if i < 3]

which is not as easy to recognize at a glance as the original example.

Would it be possible to have an opt-in feature to always preserve line breaks in comprehensions and generators? The behavior would be the same as if there were a trailing comma anywhere in the comprehension, and each comprehension-part is given a separate line, maybe even expand the whole comprehension into separate lines if there is at least one line break in it.

MichaReiser commented 6 months ago

Thanks for the detailed write up. I can understand that nested comprehensions can get hard to read.

Would it be possible to have an opt-in feature to always preserve line breaks in comprehensions and generators?

Respecting line breaks would set a new precedence for ruff/black. There are other formatters that rely on line breaks. For example, prettier doesn't collapse an object expression if it has a line break after the last property's value:

let a = { 
    a: 1,
    b: 2
}

Whereas the following will be collapsed because it doesn't have a line break after b


let a = { 
    a: 1,
    b: 2 }

I could see a similar rule where the comprehension wouldn't collapse if there's a line break before the closing ], but I'm very hesitant of adding it because:

Black's/Ruff's magic trailing comma suffers the same problem.

I'm more open if we could identify a formatting rule that makes comprehensions more readable in general without requiring manual intervention. Are there specific patterns in the body that should never collapse? E.g. if there's any nesting?

Martin19037 commented 6 months ago

@MichaReiser Thank you for your feedback. Your arguments sparked a discussion within our team, what the actual requirements of such a feature would be, and that the initial idea of relying on line breaks is not sufficient or feasible.

Initially, we though that the decision, whether to collapse or expand a given comprehension is mostly subjective. After checking a number of actual examples from our code base, we noticed a pattern that may describe what we are looking for. We usually collapse comprehensions that do not stray too far from the "visual span", as in, you can see and perceive the whole comprehension without having to scan your eyes along horizontally too much, if that makes sense to you.

Another toy example that may show what I mean. Here are two comprehensions that would produce the same AST:

[x for x in range(3) if x == 0]

[descriptive_name for descriptive_name in range(3) if descriptive_name == 0]

We would prefer the top one collapsed, since it is short enough. The bottom one we would expand, as it is more effort to scan/read along the line to know what is going on, even though it has the same complexity as the top one.

Using this knowledge, our suggestion would be to introduce a maximum comprehension (or generator) expression length (opt-in of course, to preserve Black compatibility). Unlike the line length parameter, it only describes the length of a collapsed comprehension expression itself, regardless of indentation or nesting. I do not know any implementation details, but I imagine it working like so:

Regarding nesting, I imagine this also working for deeply nested and highly indented cases. If a nested comprehension expression is short enough, it may be collapsed within a larger, expanded one. The overall line-length still wins over this, if indentation or nesting is too high.

Does this make sense at all? Also is this technically feasible to implement, or are we dreaming too much here?

Avasam commented 5 months ago

This is something I've suggested on the formatter alpha/beta feedback as well, and probably my only reticence in using Black or Ruff as a formatter (still on autopep8 + add-trailing-comma ). And bad unwrapping is the main reason (amongst others) I don't use prettier. https://github.com/astral-sh/ruff/discussions/7310#discussioncomment-7027958

The tl;dr of my post above: there's magic commas for most of my needs where I want to force a vertical style, but nothing as such for comprehensions and conditional expressions. (unless you count using # fmt: skip on most of them). Although i could be happy enough with a single # ?


Similar comment from @Booplicate : https://github.com/astral-sh/ruff/discussions/7310#discussioncomment-7428754


I feel like @Martin19037 's description fits pretty well. Having a conditional/ternary/comprehension-specific length (total line length? count just the comprehension itself?) could be weird at first, but it fits all the requirements established by @MichaReiser (consistent, reversible, no manual intervention).

Although there's also the question of whether the in operator in comprehensions should be unwrapped as well.

aronhoff commented 2 weeks ago

+1, this would be a really good addition.

I would be happy with @Martin19037's suggestion for the comprehension specific line length.

If we need more of a sledgehammer solution for some reason, even a bool setting that expands all comprehensions into the multiline format would work. I find it very rare that one would be trivial enough not to expand. It is so much easier to understand what's happening in the expanded format, even if the lines end up short. I see these comprehensions as the exception rather than the rule.

I was considering adding ruff to one of my projects, but seeing that it would collapse all comprehensions turned me away.

Booplicate commented 2 weeks ago

Have separate line length would probably be the best. Alternatively, never unwrapping wrapped lines could also work.