astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License

UP009 fix changes a file from UTF-8 to a different declared encoding #14704

Open dscorbett opened 6 hours ago

dscorbett commented 6 hours ago

The fix for utf8-encoding-declaration (UP009) in Ruff 0.8.1 changes the file’s declared encoding when the redundant UTF-8 encoding declaration is followed by a non-UTF-8 encoding declaration. In that case, the UTF-8 declaration is not completely redundant, because it blocks the following declaration from having an effect. The fix should insert up to 2 blank lines so the other declarations have no effect, or the fix should delete the other declarations, or the check should not report a violation.
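The blocking effect described above can be reproduced with the standard library's `tokenize.detect_encoding`, which applies the same first-two-lines rule the interpreter uses. This is a minimal sketch, not Ruff's implementation: the helper `declared_encoding` is a hypothetical name, and the `after_fix` bytes assume the fix simply deletes the UTF-8 declaration line, as the examples in this report suggest.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding the tokenizer would pick for this source (hypothetical helper)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


body = b'print("\xc3\xa5")\n'

# Original file: the UTF-8 declaration on line 1 wins, so line 2 is ignored.
before = b"# coding: utf-8\n# coding: latin-1\n" + body

# Assumed result of the fix: line 1 is deleted, so the latin-1 declaration
# moves to line 1 and takes effect.
after_fix = b"# coding: latin-1\n" + body

# One alternative from the report: two blank lines push the other declaration
# to line 3, where it is no longer treated as an encoding declaration.
padded = b"\n\n# coding: latin-1\n" + body

print(declared_encoding(before))     # utf-8
print(declared_encoding(after_fix))  # iso-8859-1 (normalized name for latin-1)
print(declared_encoding(padded))     # utf-8

# With a UTF-8 BOM, any non-UTF-8 declaration is rejected outright.
try:
    declared_encoding(b"\xef\xbb\xbf# coding: ascii\n" + body)
except SyntaxError as error:
    print("SyntaxError:", error)
```

Comparing the detected encoding before and after a candidate fix is one way a check could decide whether the UTF-8 declaration is truly redundant.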

Example of a syntax error:

$ printf '\xef\xbb\xbf# coding: utf-8\n# coding: ascii\nprint("success")\n' >up009_1.py
$ python up009_1.py
success
$ ruff check --isolated --select UP009 up009_1.py --fix
Found 1 error (1 fixed, 0 remaining).
$ python up009_1.py
SyntaxError: encoding problem: ascii with BOM

Example of changed behavior:

$ printf '# coding: utf-8\n# coding: latin-1\nprint("\xc3\xa5")\n' >up009_2.py
$ python up009_2.py
å
$ ruff check --isolated --select UP009 up009_2.py --fix
Found 1 error (1 fixed, 0 remaining).
$ python up009_2.py
Ã¥

AlexWaygood commented 6 hours ago

> or the check should not report a violation.

I'd vote for this option. The rule's purpose is to flag unnecessary UTF-8 encoding declarations. Clearly in some cases it's erroneously flagging encodings which are, in fact, necessary :-)