[BUG] Detect secrets does not apply exclude lines filter from baseline file

cguest97 commented 1 year ago

Describe the bug Detect secrets does not seem to be applying the filter set by .exclude.lines found in the baseline file. This leads to findings appearing which should be caught by this rule. Setting the value through the CLI flag --exclude-lines behaves as expected and the findings are filtered out.

To Reproduce Steps to reproduce the behavior:

Create a new empty folder
Create a file containing the line secret: mysecret
Run detect secrets as follows: detect-secrets scan --update .secrets.baseline --exclude-lines "secret" --all-files
View the baseline file, no results should be found and the value of .exclude.lines should equal "secret:"
Run detect secrets again as follows: detect-secrets scan --update .secrets.baseline --all-files
Reload the baseline file and observe that results is no longer empty

Expected behavior After running detect-secrets with the --exclude-lines flag, running the scan again without this flag should result in the same baseline file being generated.

Screenshots

Impact This prevents us using the tool in CI as our production repo contains too many findings to audit by hand without being able to exclude specific keywords from the scan.

Additional context:

OS: Windows 11 Enterprise (Version 10.0.22621 Build 22621), Ubuntu 22.04.02 LTSR
Python Version: 3.11.2
Detect Secrets Version: 0.13.1+ibm.61.dss
Languages Scanned: .txt, .go, .yaml

edmondsw commented 1 year ago

I'm also seeing this with 0.13.1+ibm.60.dss

bigpick commented 9 months ago

Using local build of latest 0.13.1+ibm.62.dss,

mkdir issue132
echo 'secret: mysecret' > issue132/test.txt

# detect-secrets scan --update .secrets.baseline --exclude-lines "secret" --all-files issue132
PYTHONPATH=`pwd` python detect_secrets/main.py scan --update .secrets.baseline --exclude-lines "secret" --all-files issue132 

# detect-secrets audit .secrets.baseline
PYTHONPATH=`pwd` python detect_secrets/main.py audit .secrets.baseline
Nothing to audit!

Has a .secrets.baseline with (jq '.exclude' .secrets.baseline):

{
  "files": "^.secrets.baseline$",
  "lines": "secret"
}

Then running the scan again, this time ommitting the --exclude-lines argument, the .secrets.baseline still has the correct lines being excluded, however, it doesn't seem to actually be using it, since the audit now finds the secret:

PYTHONPATH=`pwd` python detect_secrets/main.py scan --update .secrets.baseline --all-files issue132

PYTHONPATH=`pwd` python detect_secrets/main.py audit .secrets.baseline

Secret:      1 of 1
Filename:    issue132/test.txt
Secret Type: Secret Keyword
----------
 Warning: ibm_db is not installed, the DB2 plugin will not run.
 To enable the optional DB2 plugin, install ibm_db with:
 pip install ibm_db
 and run detect secrets with detect-secrets scan --update .secrets.baseline --db2-scan
 Warning: ibm_db is not installed, the DB2 plugin will not run.
 To enable the optional DB2 plugin, install ibm_db with:
 pip install ibm_db
 and run detect secrets with detect-secrets scan --update .secrets.baseline --db2-scan
1:secret: mysecret
----------
A potential secret was detected in this code. If so, you should select "yes" below to mark it as an actual secret, and remediate it.
Once the secret has been removed from the file, and another scan has been run, its entry will be removed from the baseline file.
----------
Is this actually a secret? i.e. not a false-positive (y)es, (n)o, (s)kip, (q)uit:

And yet, jq '.exclude' .secrets.baseline:

{
  "files": "^.secrets.baseline$",
  "lines": "secret"
}

bigpick commented 9 months ago

This seems to be occuring because plugins are always run with "fresh" (i.e only CLI args) settings, https://github.com/IBM/detect-secrets/blob/d7c79926b0923c3d97cc1a66db6ed18b847f9d96/detect_secrets/main.py#L39-L47

e.g. copying the logic that patches the exclude_lines from a prior baseline to the cli arg right before this block, the scan now doesn't detect the secret line:

+        old_baseline = _get_existing_baseline(args.import_filename)
+        if not args.exclude_lines and old_baseline.get('exclude'):
+            args.exclude_lines = old_baseline['exclude']['lines']
+
         # Plugins are *always* rescanned with fresh settings, because 
         # we want to get the latest updates. 
         plugins = initialize.from_parser_builder( 
             args.plugins, 
             exclude_lines_regex=args.exclude_lines, 
             automaton=automaton, 
             should_verify_secrets=not args.no_verify, 
             plugin_filenames=args.plugin_filenames, 
         )

bigpick commented 9 months ago

Maybe the best way to "fix" this without breaking backwards compatibility would be to introduce a new CLI argument, something like --plugins-reuse-exclude, and in that case, the plugins would, on an existing secrets baseline, attempt to instantiate themselves with the existing .exclude.lines values?

I can throw something up for the above if sounds copacetic.

cguest97 commented 8 months ago

Maybe the best way to "fix" this without breaking backwards compatibility would be to introduce a new CLI argument, something like --plugins-reuse-exclude, and in that case, the plugins would, on an existing secrets baseline, attempt to instantiate themselves with the existing .exclude.lines values?

I can throw something up for the above if sounds copacetic.

This sounds like a reasonable suggestion to avoid backwards compatibility issues. One addendum is that we are unable to set CLI flags directly when running detect-secrets through our CI, all our configuration comes directly from the baseline. Therefore if the --plugins-reuse-exclude flag is used when generating a baseline we would need the baseline to then persist that option so future scans would continue to respect this setting even if not explicitly set through the CLI.

bigpick commented 7 months ago

all our configuration comes directly from the baseline. Therefore if the --plugins-reuse-exclude flag is used when generating a baseline we would need the baseline to then persist that option so future scans would continue to respect this setting even if not explicitly set through the CLI.

The problem with this I think then becomes: How does it ever get unset?

e.g. Once you run a scan with --plugins-reuse-excludes, it would add "plugins-reuse-excludes": true to the .secrets.baseline file. If you then want scan to inherently reuse that (even if you're not passing as an arg), you're effectively never going to have it not run with it, unless you manually remove the field from the .secrets.baseline?

bigpick commented 7 months ago

(I opened the above PR assuming the above behavior is fine)

cguest97 commented 7 months ago

all our configuration comes directly from the baseline. Therefore if the --plugins-reuse-exclude flag is used when generating a baseline we would need the baseline to then persist that option so future scans would continue to respect this setting even if not explicitly set through the CLI.

The problem with this I think then becomes: How does it ever get unset?

e.g. Once you run a scan with --plugins-reuse-excludes, it would add "plugins-reuse-excludes": true to the .secrets.baseline file. If you then want scan to inherently reuse that (even if you're not passing as an arg), you're effectively never going to have it not run with it, unless you manually remove the field from the .secrets.baseline?

Agreed, this seems to be an issue with other flags too. For example updating the baseline while using --no-box-scan correctly disables the plugin, running the scan again without this flag set still has this plugin turned off. The only ways to re-enable the plugin are to either use the --use-all-plugins flag to turn everything back on, or to regenerate the baseline.

From my POV I think it's fine to keep this set in the baseline, but maybe we should log that this option is set during the scan?

IBM / detect-secrets

[BUG] Detect secrets does not apply exclude lines filter from baseline file #132