Consider non-whitespace split points for line-too-long exemptions

alanwilter commented 1 year ago

I have this line of code:

        if not opt.save_model_path:
            opt.save_model_path = os.path.abspath(
                f"models_out/FineTuned_{os.path.basename(opt.model_path).split('.')[0]}_{opt.feature_name}_{opt.loss}_{today_date_string}.pth"
            )
            print(f"Model name is automatically set to {opt.save_model_path}")

The offending line has 143 chars and my pyproject.toml is:

[tool.ruff]
target-version = "py39"
line-length = 120
extend-select = [
    "B",   # flake8-bugbear
    "C4",  # flake8-comprehensions
    "ERA", # flake8-eradicate/eradicate
    "I",   # isort
    "N",   # pep8-naming
    # "NPY", # numpy
    "PIE",  # flake8-pie
    "PGH",  # pygrep
    "RUF",  # ruff checks
    "SIM",  # flake8-simplify
    "TCH",  # flake8-type-checking
    "TID",  # flake8-tidy-imports
    "UP",   # pyupgrade
    "E501", # check line length
]

But ruff check . --select E501 --line-length=120 does not pick it! I understand that this line:

f"models_out/FineTuned_{os.path.basename(opt.model_path).split('.')[0]}_{opt.feature_name}_{opt.loss}_{today_date_string}.pth"

cannot be automatically formatted but the others linters will spot them. Also, if I comment the line like: # hi f"models_out/FineTuned_{os.pa... and run ruff check . --select E501 --line-length=120 I get this:

isam/main.py:71:121: E501 Line too long (144 > 120)

So it seems like a bug for me. Because I truly want anything going beyond 120 chars limit so I can fix manually if needed.

T-256 commented 1 year ago

#LINE_WITH_89_CHARACTERSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

Also tried this with ruff check test.py --select E501, no violation reported! After adding a whitespace after #, it starts reporting.

charliermarsh commented 1 year ago

We don't enforce E501 on lines that consist of a single token -- this is mentioned in the rule documentation here. The comment is actually a decent example of why: sometimes, there just isn't a way to split a line, and so enforcing the lint rule isn't' really helpful.

I suspect we could do a better job with the f-string case, since there actually are some reasonable split points in that code.

alanwilter commented 1 year ago

But I'd still like to be warned about long lines.

flake8 test.py
test.py:2:80: E501 line too long (126 > 79 characters)

One of the reason I move to ruff was to get rid of flake8. It should be enforced. There's a pragma in flake8 to ignore long lines if you really need it.

charliermarsh commented 1 year ago

We could consider adding a strict mode for this rule that ignores all the stated exemptions. But I think it would need to be all-or-nothing (so, e.g., when that setting is enabled, we'd also flag URLs that exceed the line length limit, which are allowed right now).

T-256 commented 1 year ago

Ignores lines that consist of a single "word" (i.e., without any whitespace between its characters).

I agree with it, but whitespace is not only word separator. In mentioned example in description, I see +10 words. IMO something like str.isidentifier() check in python could be happened instead.

alanwilter commented 1 year ago

Reading E501 and I agree with the 3 exceptions. I just don't get why my original example is an exception.

charliermarsh commented 1 year ago

@alanwilter - It's because there's no whitespace anywhere on the line, so Ruff doesn't think it's splittable. I think we can do a better job of figuring out whether the line is splittable though. (That exemption wasn't designed for cases like the one in your example, I agree that it's not quite right.)

MichaReiser commented 1 year ago

Reading E501 and I agree with the 3 exceptions. I just don't get why my original example is an exception.

It's because the line contains no whitespace (not justifying that it is correct, but it is why it the reason why the line isn't flagged):

Ignores lines that consist of a single "word" (i.e., without any whitespace between its characters).

alanwilter commented 1 year ago

Ignores lines that consist of a single "word" (i.e., without any whitespace between its characters).

I agree with it, but whitespace is not only word separator. In mentioned example in description, I see +10 words. IMO something like str.isidentifier() check in python could be happened instead.

Ah, ok, thanks @charliermarsh. So now I agree with @T-256. IIRC, ruff knows about commented python code lines (ERA001), so it should know when a long line is a python code too.

T-256 commented 1 year ago

ruff knows about commented python code lines (ERA001), so it should know when a long line is a python code too.

This check is unnecessary for this rule.

https://github.com/astral-sh/ruff/blob/1642f4dbd9d4ff04e50d08b88eebbbef6e4cc077/crates/ruff_linter/src/rules/pycodestyle/overlong.rs#L55-L59 Instead here should split by word separators instead of only whitespace. Sample word separators got from default VSCode settings:

`~!@#$%^&*()-=+[{]}\|;:'",.<>/?

charliermarsh commented 1 year ago

I'm open to something like that, though we may need to see how it affects existing projects (via our ecosystem checks, which run automatically on every PR).

astral-sh / ruff

Consider non-whitespace split points for line-too-long exemptions #8383