NASA-AMMOS / slim

Software Lifecycle Improvement & Modernization
https://nasa-ammos.github.io/slim/
Apache License 2.0
24 stars 9 forks source link

[Bug]: `docs/guides/software-lifecycle/security/secrets-detection/README.md` #122

Closed nutjob4life closed 3 months ago

nutjob4life commented 8 months ago

Checked for duplicates

Yes - I've already checked

Website or Best Practice Guide?

Best Practice Guide

Describe the bug

When I tried integrating NASA-AMMOS/slim-detect-secrets into a PDS Engineering Node project, I noticed that the instructions in this guide kept treating the detect-secrets command's --exclude-files argument as a "glob" style file patterns:

… --exclude-files '.secrets.*' --exclude-files '.git*' …

However, when I run detect-secrets scan --help, the usage says:

--exclude-files EXCLUDE_FILES
                      If filenames match this regex, it will be ignored.

which says it's a regex, not a glob.

So, diving into the code:

def should_exclude_file(filename: str) -> bool:
    regexes = _get_file_exclusion_regex()
    for regex in regexes:
        if regex.search(filename):
            return True
    return False

And sure enough .search() is a function on a regex object, not a shell glob.

What did you expect?

I expected the docs to use regexes, not globs:

--exclude-files '.secrets.*' --exclude-files '.git*'

should become

--exclude-files '\.secrets\..*' --exclude-files '\.git.*'

The same should be applied to the args in the pre-commit config later in the docs.

Reproducible steps

$ detect-secrets scan --help

Environment

- `slim-detect-secrets` @ 91e097ad4559ae6ab785c883dc5ed989202c7fbe
- Python 3.11
- macOS 14.0
riverma commented 8 months ago

@nutjob4life - thank you for testing out our guide and for finding this bug! Appreciate you taking the time to report on this.

I just tested out a test scan on a repository using the three approaches below:

  1. detect-secrets scan ./ --all-files --disable-plugin AbsolutePathDetectorExperimental --exclude-files '.secrets.*' --exclude-files '.git*' > .secrets.baseline.exclude-glob
  2. detect-secrets scan ./ --all-files --disable-plugin AbsolutePathDetectorExperimental --exclude-files '\.secrets\..*' --exclude-files '\.git.*' > .secrets.baseline.exclude-regex
  3. detect-secrets scan ./ --all-files --disable-plugin AbsolutePathDetectorExperimental > .secrets.baseline.no-exclude

I then ran the following command against each of the .secrets... files above: cat .secrets.baseline | grep ".git". Here's what I saw:

So, it seems in terms of the guide, the pattern --exclude-files '.secrets.*' --exclude-files '.git*' is functionally equivalent for my repository compared to --exclude-files '\.secrets\..*' --exclude-files '\.git.*'. This is a special case most likely.

@nutjob4life - did you see have a use case where you tried to include other exclude-filesclauses and you had to use the RegEx approach to make it work? Also - if you have the time, would you be interested in proposing a PR for this fix in the guide? Would be great to see you have commit credit for your find!

@perryzjc - I think @nutjob4life's suggestion is spot on and we may need an update to the guide to reflect. Probably a clarification about glob vs regex. Thoughts?

nutjob4life commented 8 months ago

Keeping glob patterns when the --help and the implementation itself clearly uses regexes is only going to cause future confusion. I'm +1 for updating the guide.

Here's what I ended up using for Python repositories:

detect-secrets scan . \
    --all-files \
    --disable-plugin AbsolutePathDetectorExperimental \
    --exclude-files '\.secrets..*' \
    --exclude-files '\.git.*' \
    --exclude-files '\.mypy_cache' \
    --exclude-files '\.pytest_cache' \
    --exclude-files '\.tox' \
    --exclude-files '\.venv' \
    --exclude-files 'venv' \
    --exclude-files 'dist' \
    --exclude-files 'build' \
    --exclude-files '.*\.egg-info' > .secrets.baseline

And for Maven:

detect-secrets scan . \
    --all-files \
    --disable-plugin AbsolutePathDetectorExperimental \
    --exclude-files '\.secrets..*' \
    --exclude-files '\.git.*' \
    --exclude-files 'target' > .secrets.baseline

Although I'm still not happy with either of these. For example, suppose you've got a Python repository with

📁 my.package
   📁 src
      📁 my
         📁 package
            🗒️ __init__.py
            📁 accessory
            📁 btree
            📁 dist
            📁 parser

or a Maven package with

📁 stem-analyzer
   📁 src
      📁 main
         📁 java
            📁 org
               📁 forestry
                  📁 stem
                     📁 analysis
                        📁 chooser
                        📁 multiplexer
                        📁 target
                        📁 yamlparser

Then in the Python case, the module my.package.dist gets ignored and in the Maven case the module org.forestry.stem.analysis.target gets ignored.

On top of that, the command-line argument is --exclude-files which suggests it's just files that gets excluded, when it's files and directories. But that's Yelp's fault 😝

riverma commented 4 months ago

@nutjob4life - added your contribution to this PR: https://github.com/NASA-AMMOS/slim/pull/143

See this line. Also your ack.

Thanks again!

nutjob4life commented 4 months ago

Wooot! Thanks @riverma! 🥂

riverma commented 3 months ago

Resolved with https://github.com/NASA-AMMOS/slim/pull/143