IBM / detect-secrets

An enterprise friendly way of detecting and preventing secrets in code.
Apache License 2.0

[BUG] Regression in 0.13.1+ibm.62.dss resulting in results being wiped out from baseline file #148

Open mlucic opened 5 months ago

mlucic commented 5 months ago

Working in a new environment I had to do a fresh install of the detect-secrets CLI tool, which I did following the instructions from the README (i.e. running pip install --upgrade "git+https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets").

When I ran the command detect-secrets scan --update .secrets.baseline, all the results that previously existed in the .secrets.baseline file were wiped and the results key was left as an empty object.

When I switched to a different environment that already had the detect-secrets CLI tool installed with version 0.13.1+ibm.61.dss and I ran the same command, it worked as expected.

To Reproduce

Steps to reproduce the behavior:

  1. Have an established baseline file which was created using detect-secrets@0.13.1+ibm.61.dss
  2. Install detect-secrets@0.13.1+ibm.62.dss
  3. Run detect-secrets scan --update .secrets.baseline

Expected behavior

Running the command above should not result in an empty object for the results key in the baseline file.

Impact

Medium

Consistent behavior when using the detect-secrets CLI tool

bigpick commented 4 months ago

Hrm, doing a real quick test against the .secrets.baseline in this repo, I am unable to reproduce:

# from a fresh venv
pip install git+https://github.com/ibm/detect-secrets.git@0.13.1+ibm.61.dss

detect-secrets --version
0.13.1+ibm.61.dss

detect-secrets scan --update .secrets.baseline --use-all-plugins .

jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
19
# so 0.13.1+ibm.61.dss sees 19 potential secrets

then

pip install --upgrade "git+https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets"

detect-secrets --version
0.13.1+ibm.62.dss

detect-secrets scan --update .secrets.baseline

jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
19
# still sees 19

(both tests are being run from macOS Sonoma 14.4.1)
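
For environments without jq, the same count can be taken in Python. This is a minimal sketch, assuming the baseline's results key maps each file path to a list of findings, which is the shape the jq filter above walks. The function name and the sample data are made up for illustration and are not part of detect-secrets.

```python
import json
from pathlib import Path


def count_baseline_secrets(baseline: dict) -> int:
    """Count findings across all files in the baseline's "results" mapping."""
    # Assumption: "results" is {file_path: [finding, ...]}, matching what
    # jq -r '.results[] | .[] | .hashed_secret' iterates over.
    return sum(len(findings) for findings in baseline.get("results", {}).values())


def load_baseline(path: str = ".secrets.baseline") -> dict:
    """Read a baseline file from disk."""
    return json.loads(Path(path).read_text())


# Made-up sample data, only to show the two cases seen in this thread.
populated = {
    "results": {
        "a.py": [{"hashed_secret": "x"}, {"hashed_secret": "y"}],
        "b.yml": [{"hashed_secret": "z"}],
    }
}
wiped = {"results": {}}

print(count_baseline_secrets(populated))  # 3
print(count_baseline_secrets(wiped))      # 0
```

Comparing count_baseline_secrets(load_baseline()) before and after a scan --update run gives a quick check that the update did not empty the file.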

bigpick commented 4 months ago

Though, generating the 0.13.1+ibm.61.dss .secrets.baseline from macOS,

pip install git+https://github.com/ibm/detect-secrets.git@0.13.1+ibm.61.dss

detect-secrets --version
0.13.1+ibm.61.dss

detect-secrets scan --update .secrets.baseline

jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
19

Then mounting in and trying to update to 0.13.1+ibm.62.dss in a centos7 container:

docker run --rm -it --platform linux/amd64 -v $PWD:/tmp/workdir centos:centos7
# Setup python stuff in container
yum update -y
yum install -y epel-release
yum groupinstall -y "Development Tools"
yum install -y openssl-devel bzip2-devel libffi-devel jq
yum install -y wget
wget https://www.python.org/ftp/python/3.9.19/Python-3.9.19.tgz 
tar xzf Python-3.9.19.tgz 
cd Python-3.9.19  
./configure --enable-optimizations 
make install

# 
# 
# actually do stuff now
cd /tmp/workdir/

# Check that we still see the 19 results generated by 0.13.1+ibm.61.dss on macOS:
jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
19

# install latest detect-secrets, this time in centos7
python3 -m pip install --upgrade "git+https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets"

detect-secrets --version
0.13.1+ibm.62.dss

detect-secrets scan --update .secrets.baseline

jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
0

I am now able to reproduce; cat .secrets.baseline shows the results are empty:

{
  "exclude": {
    "files": "test_data/.*|tests/.*|^.secrets.baseline$",
    "lines": null
  },
  "generated_at": "2024-05-06T12:55:14Z",
  "plugins_used": [
    {
      "name": "AWSKeyDetector"
    },
    {
      "name": "ArtifactoryDetector"
    },
    {
      "name": "AzureStorageKeyDetector"
    },
    {
      "base64_limit": 4.5,
      "name": "Base64HighEntropyString"
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "name": "BoxDetector"
    },
    {
      "name": "CloudantDetector"
    },
    {
      "ghe_instance": "github.ibm.com",
      "name": "GheDetector"
    },
    {
      "name": "GitHubTokenDetector"
    },
    {
      "hex_limit": 3,
      "name": "HexHighEntropyString"
    },
    {
      "name": "IbmCloudIamDetector"
    },
    {
      "name": "IbmCosHmacDetector"
    },
    {
      "name": "JwtTokenDetector"
    },
    {
      "keyword_exclude": null,
      "name": "KeywordDetector"
    },
    {
      "name": "MailchimpDetector"
    },
    {
      "name": "NpmDetector"
    },
    {
      "name": "PrivateKeyDetector"
    },
    {
      "name": "SlackDetector"
    },
    {
      "name": "SoftlayerDetector"
    },
    {
      "name": "SquareOAuthDetector"
    },
    {
      "name": "StripeDetector"
    },
    {
      "name": "TwilioKeyDetector"
    }
  ],
  "results": {},
  "version": "0.13.1+ibm.62.dss",
  "word_list": {
    "file": null,
    "hash": null
  }
}

bigpick commented 4 months ago

FWIW, this doesn't look to be a regression, just a straight bug: it is still present even when using 0.13.1+ibm.61.dss on both sides. In the CentOS container, after reverting the baseline file to the original generated by the macOS version:

python3 -m pip install git+https://github.com/ibm/detect-secrets.git@0.13.1+ibm.61.dss

detect-secrets scan --update .secrets.baseline

jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
0

bigpick commented 4 months ago

It looks like in the CentOS container it's not actually trying to scan any files, and that's why it comes back with an empty .secrets.baseline file?

If you add a --verbose option to the detect-secrets command, you can see that

detect-secrets --verbose scan --update .secrets.baseline
Checking file: .coveragerc
Checking file: .dockerignore
Checking file: .editorconfig
...

on macOS outputs a bunch of file paths relative to . as expected, but in the CentOS container that same command lists no files checked (so it's like it's not trying to scan anything).

... and as a sanity check, manually specifying all the paths in the CentOS container results in the baseline file maintaining its contents properly:

detect-secrets --verbose scan --update .secrets.baseline ** **/** **/**/**

jq -r '.results[] | .[] | .hashed_secret' .secrets.baseline | wc -l
19

I'll try to take a look at the path inclusion logic today if I have some spare time.

bigpick commented 4 months ago

If no path is specified (i.e. detect-secrets --verbose scan --update .secrets.baseline), it looks like args.path defaults to . (the current directory).

When that path is sent to _get_git_tracked_files, it looks like the CentOS 7 run fails because an option it passes to the git binary does not exist in that git version:

# on mac
git -C . ls-files
... a bunch of git tracked files for the dir ...

git --version
git version 2.45.0

# on centOS
git -C . ls-files 
Unknown option: -C
usage: git [--version] [--help] [-c name=value]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

git --version
git version 1.8.3.1

So the problem is that the git version in the default CentOS 7 repos (1.8.3.1) is too old and doesn't support the -C flag. Installing a newer git than the one in the default CentOS repos:

yum install epel-release
yum remove git
rpm -U https://repo.ius.io/ius-release-el7.rpm
yum install git236
git --version
git version 2.36.6

With the newer git installed, the detect-secrets scan command behaves the same in the CentOS 7 container as it does on macOS.
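
To check up front whether a given environment will hit this, the git version can be parsed and compared. The sketch below is illustrative only and is not the detect-secrets implementation: the function names are hypothetical, and it assumes git -C first shipped in Git 1.8.5. It also shows one way to list tracked files without -C at all, by running git with the target directory as its working directory.

```python
import re
import subprocess
from typing import List, Tuple

# Assumption: `git -C <path>` was introduced in Git 1.8.5.
MIN_GIT_FOR_DASH_C = (1, 8, 5)


def parse_git_version(version_output: str) -> Tuple[int, ...]:
    """Extract the numeric version from `git --version` output."""
    match = re.search(r"(\d+(?:\.\d+)+)", version_output)
    if match is None:
        raise ValueError(f"unrecognised git version string: {version_output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_dash_c(version: Tuple[int, ...]) -> bool:
    """Return True if this git version is new enough to accept -C."""
    return version >= MIN_GIT_FOR_DASH_C


def list_tracked_files(path: str = ".") -> List[str]:
    """List git-tracked files under `path` without using `git -C`."""
    # cwd=path makes git run inside the directory, so -C is not needed
    # and old versions such as 1.8.3.1 can run the same command.
    completed = subprocess.run(
        ["git", "ls-files"],
        cwd=path,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.stdout.splitlines()


# Version strings taken from the outputs earlier in this thread.
print(supports_dash_c(parse_git_version("git version 1.8.3.1")))  # False
print(supports_dash_c(parse_git_version("git version 2.36.6")))   # True
```

Because check=True raises on a non-zero exit, a failing git call surfaces as an error instead of silently producing an empty file list.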