astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
30.93k stars 1.02k forks source link

Update string rules when inside a forward reference #10586

Open dhruvmanila opened 5 months ago

dhruvmanila commented 5 months ago

This is happening because the checker is analyzing the string inside the annotation (three strings inside Literal). The ranges aren't correct in that case. We need to find a holistic solution to this because it's happening to all rules working on strings.

For example, here UP025 is triggered for the u'\\n' which is inside the string type annotation.

/Users/dhruv/playground/ruff/src/UP025.py:1:12: UP025 [*] Remove unicode literals from strings
  |
1 | NEWLINE_SEQUENCE: "te.Literal[u'\\n', '\\r\\n', '\\r']" = "\n"
  |            ^^^^^ UP025
  |
  = help: Remove unicode prefix

ℹ Safe fix
1   |-NEWLINE_SEQUENCE: "te.Literal[u'\\n', '\\r\\n', '\\r']" = "\n"
  1 |+NEWLINE_SEQENCE: "te.Literal[u'\\n', '\\r\\n', '\\r']" = "\n"

Found 1 error.
[*] 1 fixable with the --fix option.

We could potentially fix the range for the rules to be useful in the context of type annotation or ignore them completely. Nevertheless, we should completely avoid rules which might update the quotes because otherwise we need the context of the quote which contains the actual type annotation.

Originally posted by @dhruvmanila in https://github.com/astral-sh/ruff/issues/10546#issuecomment-2016886953

dhruvmanila commented 5 months ago

Hey @RaulWCosta, sorry for the late reply but yeah it is available.

I think the first step would be to fix the range. I was able to find out the root cause of the problem and it's here:

https://github.com/astral-sh/ruff/blob/de46a36bbcdc1df4a56fc486ae5c7bcf2c4bbba0/crates/ruff_python_parser/src/typing.rs#L39-L42

The example mentioned in the PR description contains escaped characters which makes this a complex annotation. This means that parse_type_annotation takes the else path which parses it as an expression and then relocates the parsed expression and any of it's child expressions to the same range. This is where the bug is. The relocation logic isn't correct as it doesn't relocate any of the sub-expressions relative to the root expression. cc @charliermarsh I think there was some discussion around this logic.

For example,

import typing as t

NEWLINE_SEQUENCE: "t.Literal[u'\\n', '\\r\\n', '\\r']" = "\n"

Along with the following patch to print the AST:

diff --git a/crates/ruff_python_parser/src/typing.rs b/crates/ruff_python_parser/src/typing.rs
index 18efe4041..09507977c 100644
--- a/crates/ruff_python_parser/src/typing.rs
+++ b/crates/ruff_python_parser/src/typing.rs
@@ -38,7 +38,9 @@ pub fn parse_type_annotation(
     } else {
         // Otherwise, consider this a "complex" annotation.
         let mut expr = parse_expression(value)?;
+        println!("{:#?}", expr);
         relocate_expr(&mut expr, range);
+        println!("{:#?}", expr);
         Ok((expr, AnnotationKind::Complex))
     }
 }

We can see that the relocated ranges aren't correct as they're set to the same range regardless of the position of the node:

diff --git a/a.txt b/b.txt
index bb13b780d..26b45c862 100644
--- a/a.txt
+++ b/b.txt
@@ -1,12 +1,13 @@
    ⋮  1 │+
  1 ⋮  2 │ Subscript(
  2 ⋮  3 │     ExprSubscript {
  3 ⋮    │-        range: 0..30,
    ⋮  4 │+        range: 38..74,
  4 ⋮  5 │         value: Attribute(
  5 ⋮  6 │             ExprAttribute {
  6 ⋮    │-                range: 0..9,
    ⋮  7 │+                range: 38..74,
  7 ⋮  8 │                 value: Name(
  8 ⋮  9 │                     ExprName {
  9 ⋮    │-                        range: 0..1,
    ⋮ 10 │+                        range: 38..74,
 10 ⋮ 11 │                         id: "t",
 11 ⋮ 12 │                         ctx: Load,
 12 ⋮ 13 │                     },
@@ -20,11 +21,11 @@ Subscript(
 20 ⋮ 21 │         ),
 21 ⋮ 22 │         slice: Tuple(
 22 ⋮ 23 │             ExprTuple {
 23 ⋮    │-                range: 10..29,
    ⋮ 24 │+                range: 38..74,
 24 ⋮ 25 │                 elts: [
 25 ⋮ 26 │                     StringLiteral(
 26 ⋮ 27 │                         ExprStringLiteral {
 27 ⋮    │-                            range: 10..15,
    ⋮ 28 │+                            range: 38..74,
 28 ⋮ 29 │                             value: StringLiteralValue {
 29 ⋮ 30 │                                 inner: Single(
 30 ⋮ 31 │                                     StringLiteral {
@@ -42,7 +43,7 @@ Subscript(
 42 ⋮ 43 │                     ),
 43 ⋮ 44 │                     StringLiteral(
 44 ⋮ 45 │                         ExprStringLiteral {
 45 ⋮    │-                            range: 17..23,
    ⋮ 46 │+                            range: 38..74,
 46 ⋮ 47 │                             value: StringLiteralValue {
 47 ⋮ 48 │                                 inner: Single(
 48 ⋮ 49 │                                     StringLiteral {
@@ -60,7 +61,7 @@ Subscript(
 60 ⋮ 61 │                     ),
 61 ⋮ 62 │                     StringLiteral(
 62 ⋮ 63 │                         ExprStringLiteral {
 63 ⋮    │-                            range: 25..29,
    ⋮ 64 │+                            range: 38..74,
 64 ⋮ 65 │                             value: StringLiteralValue {
 65 ⋮ 66 │                                 inner: Single(
 66 ⋮ 67 │                                     StringLiteral {

The next step would be to go through the rules which work on strings (search for StringLiteral, BytesLiteral, FString, StringLike) and decide on the following:

  1. Is it ok to highlight this violation? If not, skip this rule through checker.semantic().in_type_definition() check. For example, https://github.com/astral-sh/ruff/blob/de46a36bbcdc1df4a56fc486ae5c7bcf2c4bbba0/crates/ruff_linter/src/rules/flake8_type_checking/rules/runtime_string_union.rs#L55-L58
  2. Is it safe to fix this violation? If not, avoid creating the fix for it. For example, https://github.com/astral-sh/ruff/blob/de46a36bbcdc1df4a56fc486ae5c7bcf2c4bbba0/crates/ruff_linter/src/rules/pyupgrade/rules/use_pep604_annotation.rs#L68-L79

That said, I don't think any of the violations that diagnosis string quotes should be highlighted or fixed as it requires context of the surrounding quote of the annotation itself.

dhruvmanila commented 5 months ago

Thanks for the update! I'll assign the issue to you but do not feel any pressure to complete it :)

RaulWCosta commented 4 months ago

Hey @dhruvmanila, thank you for the clear explanation, but I believe I won't have time to do the fix

dhruvmanila commented 4 months ago

@RaulWCosta Hey, no worries, thanks for informing :)

dhruvmanila commented 1 month ago

We'd appreciate any contributions here. Refer to the issue description to understand what's happening and this comment on the root cause analysis and pointers on how to resolve it. Feel free to ping me for any questions.