PyCQA / flake8

flake8 is a python tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.
https://flake8.pycqa.org

Performance suggestion: do not run unselected plugins/checks #751

Open asottile opened 3 years ago

asottile commented 3 years ago

In GitLab by @hugovk on Jun 5, 2020, 01:45

Please read this brief portion of documentation before going any further: http://flake8.pycqa.org/en/latest/internal/contributing.html#filing-a-bug

Please describe how you installed Flake8

$ pip install -U flake8
$ brew install flake8
# etc.

Please provide the exact, unmodified output of flake8 --bug-report

{
  "dependencies": [],
  "platform": {
    "python_implementation": "CPython",
    "python_version": "3.8.3",
    "system": "Darwin"
  },
  "plugins": [
    {
      "is_local": false,
      "plugin": "flake8_2020",
      "version": "1.6.0"
    },
    {
      "is_local": false,
      "plugin": "mccabe",
      "version": "0.6.1"
    },
    {
      "is_local": false,
      "plugin": "pycodestyle",
      "version": "2.6.0"
    },
    {
      "is_local": false,
      "plugin": "pyflakes",
      "version": "2.2.0"
    }
  ],
  "version": "3.8.2"
}

Please describe the problem or feature

I noticed that Flake8 takes the same time to run with --select as without. As shown using -vv verbosity, it runs all the plugins and checks regardless of --select, and only reports the selected ones afterwards.

Flake8 can sometimes take a long time to run on large codebases, and if it were possible to run only the selected checks, that would save a lot of time, CPU and power.

Would it be possible to run only the selected checks/plugins, rather than running all of them anyway and discarding that work when reporting?
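To make the difference concrete, here is a minimal toy sketch of the two strategies. None of these names are Flake8 internals; the checks, the `run_log` list and both runner functions are hypothetical, and selection is simplified to naming a whole plugin prefix.

```python
from typing import Callable, Dict, List, Tuple

Violation = Tuple[str, int]  # (error code, line number)
run_log: List[str] = []      # records which toy checks actually executed


def toy_comment_check(lines: List[str]) -> List[Violation]:
    """Toy check: report E265 for comments that lack a space after '#'."""
    run_log.append("E")
    return [
        ("E265", n)
        for n, line in enumerate(lines, 1)
        if line.startswith("#") and not line.startswith("# ")
    ]


def toy_version_check(lines: List[str]) -> List[Violation]:
    """Toy check: report YTT101 wherever sys.version[:3] appears."""
    run_log.append("YTT")
    return [
        ("YTT101", n)
        for n, line in enumerate(lines, 1)
        if "sys.version[:3]" in line
    ]


# Hypothetical registry: plugin prefix -> check function.
CHECKS: Dict[str, Callable[[List[str]], List[Violation]]] = {
    "E": toy_comment_check,
    "YTT": toy_version_check,
}


def run_then_filter(lines: List[str], select: List[str]) -> List[Violation]:
    """Behaviour described above: run every check, report only a subset."""
    found = [v for check in CHECKS.values() for v in check(lines)]
    return [v for v in found if v[0].startswith(tuple(select))]


def filter_then_run(lines: List[str], select: List[str]) -> List[Violation]:
    """Requested behaviour: never run checks whose prefix was not selected."""
    wanted = [check for prefix, check in CHECKS.items() if prefix in select]
    return [v for check in wanted for v in check(lines)]
```

With `select=["YTT"]`, both functions report the same violations, but `run_log` shows that `run_then_filter` executed both toy checks while `filter_then_run` executed only one.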


Docs

For reference, my emphasis.

flake8 --help says --select is for which ones to enable:

  --select errors       Comma-separated list of errors and warnings to enable. For example, ``--select=E4,E51,W234``.
                        (Default: ['E', 'F', 'W', 'C90'])

The docs are a bit more explicit:

Specify the list of error codes you wish Flake8 to report.

https://flake8.pycqa.org/en/latest/user/options.html#cmdoption-flake8-select


Example

An example running on the TensorFlow codebase:

$ time flake8
...
flake8  323.91s user 4.31s system 98% cpu 5:32.78 total
$ time flake8 --select YTT
...
flake8 --select YTT  318.62s user 3.80s system 99% cpu 5:25.51 total

Both about the same: around 5m20s of user CPU time.

With an ugly hack (I know this mixes plugin names with error codes, but it's just to get a rough idea, and there are other places to skip too):

diff --git a/src/flake8/checker.py b/src/flake8/checker.py
index d993cb9..9ed986d 100644
--- a/src/flake8/checker.py
+++ b/src/flake8/checker.py
@@ -486,6 +486,8 @@ class FileChecker(object):
             return

         for plugin in self.checks["ast_plugins"]:
+            if plugin["name"] != "YTT":
+                continue
             checker = self.run_check(plugin, tree=ast)
             # If the plugin uses a class, call the run method of it, otherwise
             # the call should return something iterable itself
$ time flake8 --select YTT
flake8 --select YTT  276.90s user 3.17s system 98% cpu 4:43.00 total

About 4m37s of user time, roughly 42 seconds (~13%) faster.
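The saving can be checked directly from the two `--select YTT` user-time figures quoted above (a small arithmetic sketch; the variable names are only illustrative):

```python
# User CPU seconds copied from the two `time flake8 --select YTT` runs above.
baseline_user_s = 318.62  # unpatched
patched_user_s = 276.90   # with the hack applied

saved_s = baseline_user_s - patched_user_s
saved_pct = 100 * saved_s / baseline_user_s

print(f"saved {saved_s:.2f}s of user time ({saved_pct:.1f}%)")
# prints: saved 41.72s of user time (13.1%)
```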

asottile commented 3 years ago

In GitLab by @sigmavirus24 on Jun 5, 2020, 06:01

This would break our verbose output that tells people how many errors were ignored and not reported. Also, there are nuanced ways to ignore codes, so skipping things isn't feasible. Some plugins register just a prefix, and we'd have no way of skipping a sub-error-code check, especially depending on how the plugin is written.
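The prefix problem can be illustrated with a small sketch. This is not Flake8's decision logic; `could_emit_selected` is a hypothetical helper, and it deliberately ignores `--ignore`, `# noqa` comments and every other way of suppressing codes. It only shows that a plugin registered under a broad prefix can be skipped as a whole or not at all.

```python
from typing import Iterable


def could_emit_selected(registered_prefix: str, selected: Iterable[str]) -> bool:
    """Return True if a plugin registered under `registered_prefix` might
    emit a code matching any entry in `selected`.

    The match has to work in both directions: selecting "E265" still
    requires running a plugin registered as "E", and selecting "Y" would
    require running a plugin registered as "YTT".
    """
    return any(
        sel.startswith(registered_prefix) or registered_prefix.startswith(sel)
        for sel in selected
    )


# A plugin registered only as "E" must still run in full for E265:
# nothing at this level can skip its other sub-checks.
print(could_emit_selected("E", ["E265"]))  # True
# The same plugin could be skipped entirely when only YTT is selected.
print(could_emit_selected("E", ["YTT"]))   # False
```

Under these assumptions the most that can be saved is a whole plugin; skipping individual codes inside a plugin would need cooperation from the plugin itself.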

asottile commented 3 years ago

In GitLab by @sigmavirus24 on Jun 5, 2020, 15:59

Perhaps the better way to do this is to have a --disable-extensions option, because relying on --select is too fraught.

asottile commented 3 years ago

In GitLab by @andersk on Feb 13, 2021, 12:57

pycodestyle can do this and save significant time. So surely Flake8 ought to be able to do it too, at least for some checks including the pycodestyle ones, when verbose output is not requested.

$ git clone https://github.com/zulip/zulip.git

$ cd zulip; rm setup.cfg

$ time pycodestyle -qq --count .
15849

real    0m22.806s
user    0m22.759s
sys     0m0.020s

$ time pycodestyle -qq --select=E265 --count .
4

real    0m9.721s
user    0m9.680s
sys     0m0.030s

$ time flake8 -j1 -qq --count .
15831

real    0m50.552s
user    0m50.281s
sys     0m0.213s

$ time flake8 -j1 -qq --select=E265 --count .
4

real    0m50.434s
user    0m50.177s
sys     0m0.195s

chr1st1ank commented 2 years ago

This is not only a performance optimization but also a stability improvement. What you don't run can't break. Flake8's plugin discovery can break a CI pipeline at any time when dependencies are updated, because a plugin library may change its behaviour or something unexpected may end up on the import path. If one can specify exactly which plugins to run, this reduces the chance of such surprises. Examples:

There may be better examples, and these are partially debatable, but the problem class definitely exists in the deep fires of Python's dependency hell.
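One way to see what an environment exposes to plugin discovery is to list the relevant entry points. The sketch below assumes check plugins are registered under the `flake8.extension` entry-point group; `discovered_plugins` is a hypothetical helper, not part of Flake8.

```python
from importlib import metadata
from typing import List, Tuple


def discovered_plugins(group: str = "flake8.extension") -> List[Tuple[str, str]]:
    """List (name, target) pairs for entry points installed under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer importlib.metadata API (Python 3.10+).
        found = eps.select(group=group)
    else:
        # Older API: a plain dict mapping group name to entry points.
        found = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in found)


for name, target in discovered_plugins():
    print(f"{name} -> {target}")
```

Running this before and after a dependency update shows whether the set of discoverable plugins changed, which is the kind of surprise described above.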