astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
32.53k stars 1.08k forks source link

Formatter parenthesizes over-long starred expression in an assignment value which prodcues invalid syntax #11759

Closed jasikpark closed 5 months ago

jasikpark commented 5 months ago

I ran cargo fuzz run ruff_formatter_validity fuzz/artifacts/ruff_formatter_validity/minimized-from-31d7210d32c8aaabd0b146d1dbcb10b8c2184286 on commit 9b2cf569b22855439fa916be6fc417b372074f42.

Filename:

fuzz/artifacts/ruff_formatter_validity/minimized-from-31d7210d32c8aaabd0b146d1dbcb10b8c2184286

File contents:

B=*R*VVVVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVRBVVVVRBVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVVVVRB

The crash was

Running: fuzz/artifacts/ruff_formatter_validity/minimized-from-31d7210d32c8aaabd0b146d1dbcb10b8c2184286
thread '<unnamed>' panicked at fuzz_targets/ruff_formatter_validity.rs:65:9:
formatter introduced a parse error
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/fuzz/fuzz_targets/ruff_formatter_validity.rs#L65-L68

i.e. formatting the above code produces lint errors in code that previously did not produce lint errors.

/cc @MichaReiser

jasikpark commented 5 months ago

Initial code:

https://play.ruff.rs/a5f646d9-bce1-468c-bce5-c9d8322e7c98

Formatted once produces lint errors:

https://play.ruff.rs/6e15deb5-03c3-463c-91c6-57a871a17d5e

jasikpark commented 5 months ago

Original AST:

Module(
    ModModule {
        range: 0..85,
        body: [
            Assign(
                StmtAssign {
                    range: 0..85,
                    targets: [
                        Name(
                            ExprName {
                                range: 0..1,
                                id: "B",
                                ctx: Store,
                            },
                        ),
                    ],
                    value: Starred(
                        ExprStarred {
                            range: 2..85,
                            value: BinOp(
                                ExprBinOp {
                                    range: 3..85,
                                    left: Name(
                                        ExprName {
                                            range: 3..4,
                                            id: "R",
                                            ctx: Load,
                                        },
                                    ),
                                    op: Mult,
                                    right: Name(
                                        ExprName {
                                            range: 5..85,
                                            id: "VVVVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVRBVVVVRBVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVVVVRB",
                                            ctx: Load,
                                        },
                                    ),
                                },
                            ),
                            ctx: Load,
                        },
                    ),
                },
            ),
        ],
    },
)

AST after formatting:

Module(
    ModModule {
        range: 0..102,
        body: [
            Assign(
                StmtAssign {
                    range: 0..101,
                    targets: [
                        Name(
                            ExprName {
                                range: 0..1,
                                id: "B",
                                ctx: Store,
                            },
                        ),
                    ],
                    value: Starred(
                        ExprStarred {
                            range: 10..99,
                            value: BinOp(
                                ExprBinOp {
                                    range: 11..99,
                                    left: Name(
                                        ExprName {
                                            range: 11..12,
                                            id: "R",
                                            ctx: Load,
                                        },
                                    ),
                                    op: Mult,
                                    right: Name(
                                        ExprName {
                                            range: 19..99,
                                            id: "VVVVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVRBVVVVRBVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVVVVRB",
                                            ctx: Load,
                                        },
                                    ),
                                },
                            ),
                            ctx: Load,
                        },
                    ),
                },
            ),
        ],
    },
)
MichaReiser commented 5 months ago

A "simple" fix is to change

Subject: [PATCH] Track `File` status as input. Implement module resovler and source code as derived queries.
---
Index: crates/ruff_python_formatter/src/expression/expr_starred.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/ruff_python_formatter/src/expression/expr_starred.rs b/crates/ruff_python_formatter/src/expression/expr_starred.rs
--- a/crates/ruff_python_formatter/src/expression/expr_starred.rs   (revision 5a5a588a721731e09eb404466649109ff23739be)
+++ b/crates/ruff_python_formatter/src/expression/expr_starred.rs   (date 1717654257492)
@@ -31,6 +31,6 @@
         _parent: AnyNodeRef,
         _context: &PyFormatContext,
     ) -> OptionalParentheses {
-        OptionalParentheses::Multiline
+        OptionalParentheses::Never
     }
 }

But I need to validate this in more detail. I also need to go through other places where we parenthesize expressions to ensure starred expressions are valid in that context.

MichaReiser commented 5 months ago

@dhruvmanila pointed out that the original code isn't valid syntax but it isn't a parse time error

> B=*R*VVVVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVRBVVVVRBVVVVVVRBVVVVVVVVVVVVVVVVVVVVVVVVVVVRB
  File "<stdin>", line 1
SyntaxError: can't use starred expression here
jasikpark commented 5 months ago

So it's a case where the issue isn't detectable by the linter without an initial formatting then?

MichaReiser commented 5 months ago

Yeah, it's a syntax error that isn't detected by the parser because it's a compile time syntax error.

jasikpark commented 5 months ago

should it be detectable, or is this expected for the linter? the ruff_formatter_validity fuzz test may need to be adjusted or at least commented on if it's meant to be allowed

MichaReiser commented 5 months ago

It's something we may want to detect but isn't something that ruff supports today. It's also an open question if the check should be part of the parser or if it is a deferred linter check

jasikpark commented 5 months ago

So the fuzzer test is running the linter over code, then formatting it, then checking it a second time if the first lint didn't detect any problems. Is the formatter fundamentally changing the code from something incorrect that the linter can't detect to something incorrect the linter can detect? I don't fully understand python operator precedence, and I guess it doesn't matter since it's invalid code?

MichaReiser commented 5 months ago

s the formatter fundamentally changing the code from something incorrect that the linter can't detect to something incorrect the linter can detect?

Yes, the formatter makes a change from an error that the ruff linter should detect (but doesn't today) to something that the parser (comes before linting) detects (already today).

But the code is in both cases invalid, it's just that Ruff doesn't detect one of the two.

jasikpark commented 5 months ago

Ah, I see

MichaReiser commented 5 months ago

While I think we could do better to prevent formatting this snippet in the first place, there's no actionable fix to the formatter that we could implement.