ethyca / fides

The Privacy Engineering & Compliance Framework
https://ethyca.com/docs
Apache License 2.0
353 stars 72 forks source link

Add `bandit` to our static/CI checks for increased security #1619

Closed ThomasLaPiana closed 1 year ago

ThomasLaPiana commented 1 year ago

Is your feature request related to a specific problem?

More security is good

Describe the solution you'd like

Adding an automated, common, open-source tool like bandit

ThomasLaPiana commented 1 year ago

@daveqnet do you have strong opinions here? Have you ever heard of bandit?

likewise @sanders41 I'm wondering if you've used any "security as CI check" python-specific tools before, maybe have alternative suggestions?

daveqnet commented 1 year ago

Hi @ThomasLaPiana, I'm not familiar with bandit so don't have a strong opinion of it as a tool, but would always be in more favor of more SAST in the pipeline as long as it's (i) quick and (ii) doesn't flag too many false positives.

Generally it's easier to introduce something like SAST as a weekly scheduled job to see how it performs before applying it per-PR/-push in CI.

If you're interested, CodeQL is already in place as SAST if you want to compare it:

I don't have other python-specific suggestions, but Snyk Code and Semgrep CLI are probably the two fast SAST tools that are on my radar. There are tonnes of legacy, slow SAST tools out there (Checkmarx, Veracode, etc.).

Semgrep in particular seems to have really taken off in 2022 because it's so flexible. Here's a comparison of it and bandit (written by the Semgrep devs, so take it with a pinch of salt): https://r2c.dev/blog/2021/python-static-analysis-comparison-bandit-semgrep/

sanders41 commented 1 year ago

Bandit is the only one I know of. I looked into using it in the past and decided against it. At the time people were saying there were too many false positives for it to be useful. It has been a few years so maybe it has improved since then.

daveqnet commented 1 year ago

False positives are always the problem with SAST, typically 80-90% of findings. 🫤

daveqnet commented 1 year ago

Because I was curious on my day off, I took a quick look. bandit has about five thousand findings to investigate.

bandit

~/projects/github/ethyca/fides main ❯ git log -1 | head -n 3
commit 2c2b50c324e6420bfd54cbe462f326ca46b80865
Author: Neville Samuell <neville@ethyca.com>
Date:   Mon Oct 31 05:40:11 2022 -0400
~/projects/github/ethyca/fides main ❯ bandit -r .
[main]  INFO    profile include tests: None
[main]  INFO    profile exclude tests: None
[main]  INFO    cli include tests: None
[main]  INFO    cli exclude tests: None
[main]  INFO    running on Python 3.10.8
2107 [0.. 50.. 100.. 150.. 200.. 250.. 300.. 350.. 400.. 450.. 500.. 550.. 600.. 650.. 700.. 750.. 800.. 850.. 900.. 950.. 1000.. 1050.. 1100.. 1150.. 1200.. 1250.. 1300.. 1350.. 1400.. 1450.. 1500.. 1550.. 1600.. 1650.. 1700.. 1750.. 1800.. 1850.. 1900.. 1950.. 2000.. 2050.. 2100.. ]
Run started:2022-10-31 14:16:55.058129

Test results:

<findings redacted by Dave>

Code scanned:
    Total lines of code: 520210
    Total lines skipped (#nosec): 0

Run metrics:
    Total issues (by severity):
        Undefined: 0
        Low: 5517
        Medium: 80
        High: 25
    Total issues (by confidence):
        Undefined: 0
        Low: 24
        Medium: 91
        High: 5507
Files skipped (0):

Time taken: ~30 seconds

semgrep

~/projects/github/ethyca/fides main ❯ git log -1 | head -n 3
commit 2c2b50c324e6420bfd54cbe462f326ca46b80865
Author: Neville Samuell <neville@ethyca.com>
Date:   Mon Oct 31 05:40:11 2022 -0400
~/projects/github/ethyca/fides main ❯ semgrep --config "p/python"
METRICS: Using configs from the Registry (like --config=p/ci) reports pseudonymous rule metrics to semgrep.dev.
To disable Registry rule metrics, use "--metrics=off".
Using configs only from local files (like --config=xyz.yml) does not enable metrics.

More information: https://semgrep.dev/docs/metrics

Semgrep rule registry URL is https://semgrep.dev/registry.
Scanning 392 files with 139 python rules.
  100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|392/392 tasks

<findings redacted by Dave>

Some files were skipped or only partially analyzed.
  Scan was limited to files tracked by git.
  Scan skipped: 260 files matching .semgrepignore patterns
  For a full list of skipped files, run semgrep with the --verbose flag.

Ran 139 rules on 392 files: 15 findings.

Time taken: ~12 seconds

snyk code

~/projects/github/ethyca/fides main ❯ git log -1 | head -n 3
commit 2c2b50c324e6420bfd54cbe462f326ca46b80865
Author: Neville Samuell <neville@ethyca.com>
Date:   Mon Oct 31 05:40:11 2022 -0400
~/projects/github/ethyca/fides main ❯ snyk code test

Testing /Users/dave/projects/github/ethyca/fides ...

<findings redacted by Dave>

✔ Test completed

Organization:      <redacted>
Test type:         Static code analysis
Project path:      /Users/dave/projects/github/ethyca/fides

Summary:

  39 Code issues found
  2 [High]   11 [Medium]   26 [Low] 

Time taken: ~30 seconds

ThomasLaPiana commented 1 year ago

@daveqnet given your findings, do you recommend we choose one or do you feel our current measures are good enough?

daveqnet commented 1 year ago

@ThomasLaPiana Short answer: I think CodeQL is fine for now.

Longer answer: Bandit has way too many false positives (5k!), Semgrep intrigues me and I'd like to experiment with it medium-term, but in the short term Snyk is easier to introduce and manage - I'll add you and @sanders41 as watchers on the internal DevSecOps Jira ticket tracking it (OPS-168).

ThomasLaPiana commented 1 year ago

sounds good, thank you @daveqnet ! Closing in favor of that ticket for now

We can reopen when we decide specifically which tool to implement