sk- opened this issue 5 years ago
Well, you are asking an AST-based tool to analyse 1156 lines of string literals, producing a set of 6104 strings. That's going to take some time: about 65 seconds on my system. That's actually not that bad, at ~10 ms per string.
The issue is really with that frozenset and how you construct it, in my opinion. I'd exclude that file from analysis, or move to a single string and split it at import time, e.g.
```python
FREE_EMAIL_DOMAINS = frozenset("""
0-mail.com 007addict.com 020.co.uk 027168.com 0815.ru 0815.su
0clickemail.com 0sg.net 0wnd.net 0wnd.org 1033edge.com 10mail.org
...
zuvio.com zuzzurello.com zvmail.com zwallet.com zweb.in zxcv.com
zxcvbnm.com zybermail.com zydecofan.com zzn.com zzom.co.uk zzz.com
""".split())
```
which drops the analysis time down to the ~300 ms range, gives you a smaller `.pyc` cache file to boot (88K vs. 115K), and is still fast to import.
Or, even better, move that list into a separate text file and load it at import time:

```python
with open(...) as domains:
    FREE_EMAIL_DOMAINS = frozenset(map(str.strip, domains))
```

That makes it much easier to update later (just dump the raw gist into the text file).
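A fuller sketch of that text-file approach (the `load_domains` helper and the file contents here are illustrative, not from bandit; the demo writes a throwaway file so the snippet runs on its own):

```python
import tempfile

def load_domains(path):
    """Read one domain per line into a frozenset, ignoring blank lines."""
    with open(path) as domains:
        return frozenset(line.strip() for line in domains if line.strip())

# Demo with a small temporary file standing in for the real domain list.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0-mail.com\n007addict.com\n\nzzz.com\n")
    path = f.name

FREE_EMAIL_DOMAINS = load_domains(path)
print(sorted(FREE_EMAIL_DOMAINS))  # ['0-mail.com', '007addict.com', 'zzz.com']
```

Because the file is read at import time rather than compiled, updating the list no longer touches any Python source, and bandit never sees those strings at all.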
I don't get why it should be this slow, as other tools that also analyze the AST, like Pylint, are much faster.
Also, there's no way to know which file is taking too long, nor which checks are taking most of the time, so that one could disable those.
Creating the AST itself is not slow (it's done in C by the same code that compiles Python source into bytecode), but you then need to traverse the resulting AST to implement the tool logic, and PyLint does this differently from bandit.
That's where the speed difference lies: PyLint is not interested in individual strings, but bandit is. Pylint skips those 6k string nodes; bandit executes Python code for each one, so that it can detect things like hardcoded passwords and incorrect bind parameters.
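To make the per-string cost concrete, here is a small sketch of the kind of traversal involved: walk the AST and pick out every string-literal node, since those are the nodes bandit's string plugins must each inspect (the sample source is invented for illustration):

```python
import ast

source = """
FREE_EMAIL_DOMAINS = frozenset((
    "0-mail.com", "007addict.com", "zzz.com",
))
PASSWORD = "hunter2"
"""

tree = ast.parse(source)
# On Python 3.8+, string literals appear as ast.Constant nodes
# holding a str value; bandit runs plugin code on each of these.
string_nodes = [
    node for node in ast.walk(tree)
    if isinstance(node, ast.Constant) and isinstance(node.value, str)
]
print(len(string_nodes))  # 4
```

With 6104 strings in one file, even a few milliseconds of plugin work per node adds up to the minute-plus runtimes reported here.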
To find out which files are slow, don't use bandit's recursion. Use `find` to call bandit on each individual file, and include `time`:

```shell
$ find project_root -name '*.py' -exec echo {} \; -exec time bandit -q -lll {} \;
```

You'll get the full filename echoed, followed by the bandit output (if any; `-q` tells it not to output anything for files without matches) and how long bandit took to process that file.
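If you'd rather collect the timings programmatically (say, to sort them afterwards), the same idea can be sketched in Python. The `time_command_per_file` helper is hypothetical; the demo runs a no-op `python -c pass` against a throwaway directory so the snippet works even without bandit installed, but you would substitute your project root and `["bandit", "-q", "-lll"]`:

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

def time_command_per_file(root, command):
    """Run `command + [file]` for each .py file under root,
    yielding (path, elapsed_seconds) pairs."""
    for path in sorted(Path(root).rglob("*.py")):
        start = time.perf_counter()
        subprocess.run(command + [str(path)], capture_output=True)
        yield path, time.perf_counter() - start

# Demo on a temporary directory; replace the command with
# ["bandit", "-q", "-lll"] for the real measurement.
with tempfile.TemporaryDirectory() as root:
    Path(root, "example.py").write_text("x = 1\n")
    for path, elapsed in time_command_per_file(root, [sys.executable, "-c", "pass"]):
        print(f"{path.name}: {elapsed:.3f}s")
```

Sorting the resulting pairs by elapsed time immediately surfaces the pathological files, like the 14000-entry-dictionary ones mentioned below.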
My team has some files with 14,000-entry dictionaries, and these each take 30+ minutes to parse on my system, which makes the tool unusable. There must be a faster way to deal with such data.
Much of the time spent is a result of Bandit supporting plugins that check string literals; specifically, there are four plugins that do this.
What could be done is to convert these plugins to check calls whose arguments are the various hardcoded passwords, interfaces, etc.
**Describe the bug**
Processing the following file is too slow, taking more than a minute.