Closed ThomasLaPiana closed 1 year ago
@daveqnet do you have strong opinions here? Have you ever heard of bandit
?
likewise @sanders41 I'm wondering if you've used any "security as CI check" python-specific tools before, maybe have alternative suggestions?
Hi @ThomasLaPiana, I'm not familiar with bandit
so don't have a strong opinion of it as a tool, but would always be in more favor of more SAST in the pipeline as long as it's (i) quick and (ii) doesn't flag too many false positives.
Generally it's easier to introduce something like SAST as a weekly scheduled job to see how it performs before applying it per-PR/-push in CI.
If you're interested, CodeQL is already in place as SAST if you want to compare it:
I don't have other python-specific suggestions, but Snyk Code and Semgrep CLI are probably the two fast SAST tools that are on my radar. There are tonnes of legacy, slow SAST tools out there (Checkmarx, Veracode, etc.).
Semgrep in particular seems to have really taken off in 2022 because it's so flexible. Here's a comparison of it and bandit
(written by the Semgrep devs, so take it with a pinch of salt): https://r2c.dev/blog/2021/python-static-analysis-comparison-bandit-semgrep/
Bandit is the only one I know of. I looked into using it in the past and decided against it. At the time people were saying there were too many false positives for it to be useful. It has been a few years so maybe it has improved since then.
False positives are always the problem with SAST, typically 80-90% of findings. 🫤
Because I was curious on my day off, I took a quick look. bandit
has about five thousand findings to investigate.
~/projects/github/ethyca/fides main ❯ git log -1 | head -n 3
commit 2c2b50c324e6420bfd54cbe462f326ca46b80865
Author: Neville Samuell <neville@ethyca.com>
Date: Mon Oct 31 05:40:11 2022 -0400
~/projects/github/ethyca/fides main ❯ bandit -r .
[main] INFO profile include tests: None
[main] INFO profile exclude tests: None
[main] INFO cli include tests: None
[main] INFO cli exclude tests: None
[main] INFO running on Python 3.10.8
2107 [0.. 50.. 100.. 150.. 200.. 250.. 300.. 350.. 400.. 450.. 500.. 550.. 600.. 650.. 700.. 750.. 800.. 850.. 900.. 950.. 1000.. 1050.. 1100.. 1150.. 1200.. 1250.. 1300.. 1350.. 1400.. 1450.. 1500.. 1550.. 1600.. 1650.. 1700.. 1750.. 1800.. 1850.. 1900.. 1950.. 2000.. 2050.. 2100.. ]
Run started:2022-10-31 14:16:55.058129
Test results:
<findings redacted by Dave>
Code scanned:
Total lines of code: 520210
Total lines skipped (#nosec): 0
Run metrics:
Total issues (by severity):
Undefined: 0
Low: 5517
Medium: 80
High: 25
Total issues (by confidence):
Undefined: 0
Low: 24
Medium: 91
High: 5507
Files skipped (0):
Time taken: ~30 seconds
~/projects/github/ethyca/fides main ❯ git log -1 | head -n 3
commit 2c2b50c324e6420bfd54cbe462f326ca46b80865
Author: Neville Samuell <neville@ethyca.com>
Date: Mon Oct 31 05:40:11 2022 -0400
~/projects/github/ethyca/fides main ❯ semgrep --config "p/python"
METRICS: Using configs from the Registry (like --config=p/ci) reports pseudonymous rule metrics to semgrep.dev.
To disable Registry rule metrics, use "--metrics=off".
Using configs only from local files (like --config=xyz.yml) does not enable metrics.
More information: https://semgrep.dev/docs/metrics
Semgrep rule registry URL is https://semgrep.dev/registry.
Scanning 392 files with 139 python rules.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|392/392 tasks
<findings redacted by Dave>
Some files were skipped or only partially analyzed.
Scan was limited to files tracked by git.
Scan skipped: 260 files matching .semgrepignore patterns
For a full list of skipped files, run semgrep with the --verbose flag.
Ran 139 rules on 392 files: 15 findings.
Time taken: ~12 seconds
~/projects/github/ethyca/fides main ❯ git log -1 | head -n 3
commit 2c2b50c324e6420bfd54cbe462f326ca46b80865
Author: Neville Samuell <neville@ethyca.com>
Date: Mon Oct 31 05:40:11 2022 -0400
~/projects/github/ethyca/fides main ❯ snyk code test
Testing /Users/dave/projects/github/ethyca/fides ...
<findings redacted by Dave>
✔ Test completed
Organization: <redacted>
Test type: Static code analysis
Project path: /Users/dave/projects/github/ethyca/fides
Summary:
39 Code issues found
2 [High] 11 [Medium] 26 [Low]
Time taken: ~30 seconds
@daveqnet given your findings, do you recommend we choose one or do you feel our current measures are good enough?
@ThomasLaPiana Short answer: I think CodeQL is fine for now.
Longer answer: Bandit has way too many false positives (5k!), Semgrep intrigues me and I'd like to experiment with it medium-term, but in the short term Snyk is easier to introduce and manage - I'll add you and @sanders41 as watchers on the internal DevSecOps Jira ticket tracking it (OPS-168).
sounds good, thank you @daveqnet ! Closing in favor of that ticket for now
We can reopen when we decide specifically which tool to implement
Is your feature request related to a specific problem?
More security is good
Describe the solution you'd like
Adding an automated, common, open-source tool like bandit