glob patterns should not be overriden by specific --per-file-ignores rules #670

Open asottile opened 3 years ago

In GitLab by @qxv0 on Jan 30, 2019, 06:59

flake8 --bug-report:

{
  "dependencies": [
    {
      "dependency": "entrypoints",
      "version": "0.3"
    }
  ],
  "platform": {
    "python_implementation": "CPython",
    "python_version": "3.7.2",
    "system": "Windows"
  },
  "plugins": [
    {
      "is_local": false,
      "plugin": "ProxyChecker",
      "version": "0.0.1"
    },
    {
      "is_local": false,
      "plugin": "flake8-comprehensions",
      "version": "1.4.1"
    },
    {
      "is_local": false,
      "plugin": "flake8-docstrings",
      "version": "1.3.0, pydocstyle: 3.0.0"
    },
    {
      "is_local": false,
      "plugin": "flake8-future-import",
      "version": "0.4.5"
    },
    {
      "is_local": false,
      "plugin": "flake8-mock",
      "version": "0.3"
    },
    {
      "is_local": false,
      "plugin": "flake8-no-u-prefixed-strings",
      "version": "0.2"
    },
    {
      "is_local": false,
      "plugin": "flake8-print",
      "version": "3.1.0"
    },
    {
      "is_local": false,
      "plugin": "flake8-string-format",
      "version": "0.2.3"
    },
    {
      "is_local": false,
      "plugin": "flake8-tuple",
      "version": "0.2.13"
    },
    {
      "is_local": false,
      "plugin": "flake8_coding",
      "version": "1.3.1"
    },
    {
      "is_local": false,
      "plugin": "flake8_quotes",
      "version": "1.0.0"
    },
    {
      "is_local": false,
      "plugin": "hacking.core",
      "version": "0.0.1"
    },
    {
      "is_local": false,
      "plugin": "mccabe",
      "version": "0.6.1"
    },
    {
      "is_local": false,
      "plugin": "naming",
      "version": "0.8.1"
    },
    {
      "is_local": false,
      "plugin": "pycodestyle",
      "version": "2.5.0"
    },
    {
      "is_local": false,
      "plugin": "pyflakes",
      "version": "2.1.0"
    }
  ],
  "version": "3.7.1"
}

Say we want to ignore D102 errors in all files within the tests directory. We can add a rule for each file individually, but it would be more efficient if we could do that just by adding tests/* : N813 to the config file (similar to the way flake8-per-file-ignores plugin works on flake8<3.7.0). see the comments below

In GitLab by @asottile on Jan 30, 2019, 07:30

It already works for me (?) do you have a particular repository this isn't working in?

$ git diff
diff --git a/tests/unit/test_git.py b/tests/unit/test_git.py
index 24d4b5b..b78f9fb 100644
--- a/tests/unit/test_git.py
+++ b/tests/unit/test_git.py
@@ -1,4 +1,5 @@
 """Tests around functionality in the git integration."""
+import os
 import mock
 import pytest

diff --git a/tox.ini b/tox.ini
index 5d94e5c..111f704 100644
--- a/tox.ini
+++ b/tox.ini
@@ -172,3 +172,5 @@ max-complexity = 10
 import-order-style = google
 application-import-names = flake8
 format = ${cyan}%(path)s${reset}:${yellow_bold}%(row)d${reset}:${green_bold}%(col)d${reset}: ${red_bold}%(code)s${reset} %(text)s
+per-file-ignores:
+    tests/*: F401

$ flake8 tests
$ flake8  --version
3.7.0 (mccabe: 0.6.1, pycodestyle: 2.5.0, pyflakes: 2.1.0) CPython 3.6.7 on Linux

In GitLab by @qxv0 on Jan 30, 2019, 08:05

I checked and it seems to me that the problem occurs only in those files which also have another specific ignore rule.

For example if I have both tests/*: X1 and tests/a.py:Y2, then only Y2 errors are ignored on tests/a.py, but X1 errors show up which is unexpected.

In GitLab by @qxv0 on Jan 30, 2019, 08:08

changed title from {-Allow glob patterns in paths of --per-file-ignor-}es to {+glob patterns should not be overriden by specific --per-file-ignores rul+}es

In GitLab by @qxv0 on Jan 30, 2019, 08:08

changed the description

In GitLab by @asottile on Jan 30, 2019, 08:16

hmmm so the way it's implemented is the first rule that matches a filename is used. In your case you can write:

per-file-ignores =
    tests/a.py: Y2, X1
    tests/*: X1

In GitLab by @oseiberts11 on Feb 6, 2019, 02:12

I seem to have a similar problem (3.7.5). In my per-file-ignores I have (among other things but in this order):

per-file-ignores =
    machinedb/**/*.py: D101, D102, D103
    machinedb/models/inventory.py: W503

First of all, this particular glob pattern doesn't seem to work any more.

Even if I expand this to

    machinedb/*:D101,D102,D103
    machinedb/*/*:D101,D102,D103
    machinedb/models/sshca.py: W503

then I still get D* errors for sshca.py:

machinedb/models/sshca.py:108:1: D102 Missing docstring in public method

Copying D101, D102, D103 to all the more specific patterns is of course not very nice. Additionally the glob machinedb/**/*.py is no longer working.

In GitLab by @sigmavirus24 on Feb 6, 2019, 04:57

@oseiberts11 so to be clear, it sounds like what you expect to happen is that

machinedb/**/*.py: D101, D102, D103

To ignore those three errors in every file matching that glob and then you seem to want to extend the list of ignores for those files for specific files, e.g., machinedb/models/inventory with the errors specified for each of those files. Am I understanding you correctly?

In GitLab by @oseiberts11 on Feb 6, 2019, 07:26

Exactly. That is how it seemed to work with the plugin. (I didn't write the configuration myself, I just noticed that behaviour seems to have changed).

In GitLab by @sigmavirus24 on Feb 6, 2019, 17:26

Right, the plugin took a different tact, honestly, that would be harder to replicate here. Basically, it iterated through everything for matches and applied all of them. We could do that, but then we're favoring exact backwards compatibility (which wasn't ever the goal) over simplicity. It would also mean that we're iterative over a list every time rather than thinking through things more simply, and it makes debugging the option harder in the future, i.e., we always prefer the most specific rule there even if multiple might match.

In GitLab by @oseiberts11 on Feb 7, 2019, 01:11

I suspected something like that.

I see a potential solution for people who do want exact backwards compatibility. Maybe you could make it so that if the flake8-per-file-ignores plugin is selected, then the built-in functionality is disabled, and the plugin is used instead. Probably with a warning of some sort which points out that the plugin has mostly but not exactly the same functionality. Then people who are upgrading automatically and unknowingly depend on some old behaviour (I was in that situation) are not affected by any change. When they are actively changing the config for the new situation, they will notice the change, but that is a much better time for it because they are actively fixing things anyway.

I realise that this is pretty much the opposite of how the plugin is handled now, but does that sound doable?

In GitLab by @sigmavirus24 on Feb 9, 2019, 05:46

The plugin's page says it's deprecated, I'd rather not encourage folks to use bit-rotting software. Is it too arduous to

    machinedb/**/*.py: D101, D102, D103
    machinedb/models/inventory.py: W503, D101, D102, D103

In GitLab by @oseiberts11 on Feb 15, 2019, 02:57

For this example with only 2 lines, that would work well enough. But in practice I have several patterns, and several files with additional ignores.

From what I've seen above, I understand that the way you want it, is to select a single line, without scanning several or all of them. I fear that in the general case this isn't possible. If there are several wildard patterns that match a particular file name, which one are you going to choose? Why?

And in the general case, you cannot just point to "the best pattern" without examining several of them. To give a simple example, with patterns like

a/b/c/*
a/b/*/d
a/*/c/d
*/b/c/d
a/b/*/*
a/*/c/*
*/*/*/d

and file names like a/b/c/d, a/x/c/d, a/b/x/d, the names will match multiple patterns, and/or they will have different "best" patterns. You cannot (in an obviously understandable way) say ahead of time which pattern is "best". Even if you manage to create a total ordering, for many potential file names you will need to try several patterns anyway until you find a match.

Since trying to explain to a user how this works is too complicated (and I'm not even talking about implementing it), just scanning all patterns and applying all matches is much easer to understand, and much easier to implement.

(There seems to be a bug in the implementation already, viz. that the machinedb/**/*.py pattern didn't seem to match any files at all for me. So I'm pessimistic about more complex cases.)

In GitLab by @asottile on Feb 15, 2019, 03:48

I don't think it's hard to explain, the first matching pattern wins right?

In GitLab by @asottile on Feb 15, 2019, 03:57

Also regarding the ** glob -- that's because flake8 uses fnmatch globs and not glob globs, so you'd want just machinedb/*.py

In GitLab by @sigmavirus24 on Feb 15, 2019, 04:15

It's the last match sorted alphabetically

https://gitlab.com/pycqa/flake8/blob/dc7b082b96751c8ef7bfc616f8701d432afa7ef7/src/flake8/style_guide.py#L369-374

So it's always consistent and reproducible.

If we wanted to throw people under the bus and give them a foot-gun at the same time, we could pretty easily just apply every thing, but then things they didn't intend to match end up hiding problems from them they didn't want. This is precisely why I had no original intention to support globbing. It introduces far too much unnecessary complexity and no one will be happy no matter how it's implemented.

In GitLab by @bob.hyman on May 2, 2020, 16:34

First let's document this, to help save the next developer a day lost to learning what to expect. Here's a draft, please comment:

(In --per-file-ignores):

If multiple patterns match a given file, a single pattern is chosen and the ignores from just that rule are applied to the file. The pattern chosen is the alphabetically last pattern which matches the file. This is often the most specific pattern that matched.

Is this correct?

I'd really like to see merging of the excludes from multiple patterns that match.

Currently I have this:

per-file-ignores =
    # allow nicely aligned parametrizations
    indico/*_test.py:E241
    # models with hybrids
    indico/*/models/*.py:N805

so in any unit test files I want to disable E241, and in any models I want to disable N805 (due to sqlalchemy hybrid property expression classmethods). Guess where one of the tests that need E241 disabled is...

Quoting from https://github.com/PyCQA/flake8/issues/670#issuecomment-812819033:

(In --per-file-ignores):

If multiple patterns match a given file, a single pattern is chosen and the ignores from just that rule are applied to the file. The pattern chosen is the alphabetically last pattern which matches the file. This is often the most specific pattern that matched.

I'd also like to see this https://github.com/PyCQA/flake8/issues/670#issuecomment-812819024 too in the documentation:

Also regarding the ** glob -- that's because flake8 uses fnmatch globs and not glob globs, so you'd want just machinedb/*.py

I didn't loose a day, but a good two hours for sure. Especially given that the first result in per-file-ignores is https://pypi.org/project/flake8-per-file-ignores/ (which paints a different image).

PyCQA / flake8

glob patterns should not be overriden by specific --per-file-ignores rules #670