astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
31.16k stars 1.04k forks source link

[Rule request] Flag unsupported characters in logging method calls #12178

Open huonw opened 2 months ago

huonw commented 2 months ago

There's two rules (F509, PLE1300) that detect when a % formatted string include an invalid format character. However, these don't seem to flag format strings passed to logging functions (debug, info, etc.). These strings use % for formatting in the same way.

(References: "The message attribute of the record is computed using msg % args" https://docs.python.org/3/library/logging.html#logging.Formatter.format , https://github.com/python/cpython/blob/9728ead36181fb3f0a4b2e8a7291a3e0a702b952/Lib/logging/__init__.py#L391-L401).

Reproducer: https://play.ruff.rs/36d5756e-44c7-48a1-8d8e-ae45a29ee480

"""x."""
import logging

x = "flagged: %A" % 1 # error: F509, PLE1300
logging.error("ignored: %A", 1) # no error

Settings (note "select": ["ALL"])

{
  "preview": false,
  "builtins": [],
  "target-version": "py312",
  "line-length": 88,
  "indent-width": 4,
  "lint": {
    "allowed-confusables": [],
    "dummy-variable-rgx": "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$",
    "extend-select": [],
    "extend-fixable": [],
    "external": [],
    "ignore": [],
    "select": [
      "ALL"
    ]
  },
  "format": {
    "indent-style": "space",
    "quote-style": "double"
  }
}

To confirm that this is an error, comment out the x = ... line and run the code. It prints:

--- Logging error ---
Traceback (most recent call last):
  File "/Users/huon/.pyenv/versions/3.10.4/lib/python3.10/logging/__init__.py", line 1100, in emit
    msg = self.format(record)
  File "/Users/huon/.pyenv/versions/3.10.4/lib/python3.10/logging/__init__.py", line 943, in format
    return fmt.format(record)
  File "/Users/huon/.pyenv/versions/3.10.4/lib/python3.10/logging/__init__.py", line 678, in format
    record.message = record.getMessage()
  File "/Users/huon/.pyenv/versions/3.10.4/lib/python3.10/logging/__init__.py", line 368, in getMessage
    msg = msg % self.args
ValueError: unsupported format character 'A' (0x41) at index 10
Call stack:
  File "/private/tmp/test.py", line 6, in <module>
    logging.error("ignored: %A", 1)  # no error
Message: 'ignored: %A'
Arguments: (1,)

Issues that are (somewhat) related:

Thanks for ruff, very speedy!

dhruvmanila commented 2 months ago

Thank you for the detailed issue!

I think it's a little bit complicated than that. The formatting is decided by the Formatter objects which can use % but it can also use other formatting style which is determined by the style parameter and it can be set separately for each logger or can be set through the global config.

Or, maybe we can just check if the string uses %-style format characters and make an assumption that it uses the % style.

huonw commented 2 months ago

Thanks for the quick reply!

I'm a little confused by the ins-and-outs of logging at times... but, I think there's two layers of formatting:

  1. Rendering of the individual LogRecord's message, using the values passed to info/error/etc (msg & args) to set message.
  2. Constructing the final output from the message and other attributes (time etc), using a Formatter like you say. (And its style arg to customise.)

For 1, I believe this happens unconditionally using %, unless there's some way to use a LogRecord subclass/replacement. See code: https://github.com/python/cpython/blob/9728ead36181fb3f0a4b2e8a7291a3e0a702b952/Lib/logging/__init__.py#L391-L401

In addition, there's existing linting rules that explicitly suggest turning f-strings and .format calls into %, so "use % for logging" is an assumption with precedent:

What do you think?