gitleaks / gitleaks

Protect and discover secrets using Gitleaks 🔑
https://gitleaks.io
MIT License

Paths and Fingerprints are platform specific and not portable #1565

Open Okeanos opened 1 month ago

Okeanos commented 1 month ago

Describe the bug

Gitleaks outputs platform-specific paths and Fingerprint identifiers, leading to reproducibility issues as well as rule / ignore duplication.

To Reproduce

Create a .gitleaksignore with the following content:

foo/bar/gitleaks-false-positive.yaml:aws-access-token:4

and a .gitleaks.toml with the following content:

title = "Gitleaks Sandbox Config"

[extend]
  useDefault = true

[allowlist]
  description = "Allowing dummy values in foo/allowlist to test things"
  paths = ['''(^|/)foo/allowlist/.*?$''']

Run gitleaks (Linux and/or macOS) on a project that has the following files with the following contents:

aws_token: "AKIALALEMEL33243OLIA"

Using the invocation: gitleaks dir --verbose --redact . everything is fine:

    ○
    │╲
    │ ○
    ○ ░
    ░    gitleaks

4:36PM INF scan completed in 4.74ms
4:36PM INF no leaks found

Run the same check on Windows (PowerShell or Git Bash doesn't matter):

    ○
    │╲
    │ ○
    ○ ░
    ░    gitleaks

Finding:     aws_token: "REDACTED
Secret:      REDACTED
RuleID:      aws-access-token
Entropy:     3.084184
File:        foo\allowlist\gitleaks-false-positive.yaml
Line:        4
Fingerprint: foo\allowlist\gitleaks-false-positive.yaml:aws-access-token:4

Finding:     aws_token: "REDACTED
Secret:      REDACTED
RuleID:      aws-access-token
Entropy:     3.084184
File:        foo\bar\gitleaks-false-positive.yaml
Line:        4
Fingerprint: foo\bar\gitleaks-false-positive.yaml:aws-access-token:4

This results in unexpected findings that should have been ignored according to the config and ignore file.

Changing the .gitleaksignore to the following

foo/bar/gitleaks-false-positive.yaml:aws-access-token:4
foo\bar\gitleaks-false-positive.yaml:aws-access-token:4

gets rid of one of the findings … as expected I guess? I assume the same is true for the config.

Expected behavior

A config/ignore file written for Unix works on Windows and vice versa.

I would expect a canonical (Unix?) style path declaration in the Gitleaks config and ignore file, which Gitleaks then internally and silently translates into the host system's preferred style for comparison. I would also expect only a single canonical Fingerprint format without platform / OS specific directory separator tokens.

I do not expect to have to duplicate all rules / ignore statements to accommodate Windows and Unix.

From my perspective as a consumer this is unexpected behaviour and a defect that prevents/defeats cross-platform collaboration.

Screenshots

I created a GitHub Workflow to reproduce the issue, its results are available here:

Basic Info (please complete the following information):

Additional context

To my surprise I couldn't find any related issue to this using any of the keywords I could think of:

Am I really the first to notice this or want this behaviour fixed? Please someone tell me I am just holding this wrong and there's in fact no problem.

cc @zricethezav

rgmz commented 1 month ago

IMO the current fingerprint has a lot of sharp edge cases.

@Okeanos do you find commit/path/rule specific entries to be helpful, or would you prefer to be able to whitelist a secret outright? I created #1566 as an experiment to simplify the approach.

Okeanos commented 1 month ago

I am personally fine with the [commit:]path:rule:line format of the .gitleaksignore file; gitleaks ignores #-prefixed lines by default, so I can add any further meta-information I want (say, a reason why I put an entry there).

Additionally, the format itself is human readable. A human can thus reason about it and possibly deduce why an entry was added without additional tools besides the git history if no other context clues were added.

The similarity to a .gitignore in this regard is welcome.
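To illustrate how readable the [commit:]path:rule:line format is, here is a minimal Go sketch that splits one ignore entry into its fields and skips #-prefixed comments. The `IgnoreEntry` type and `parseIgnoreLine` function are hypothetical names made up for this example; they are not taken from the Gitleaks code base, which may well compare fingerprints as plain strings. The sketch also assumes that neither the path nor the rule ID contains a colon.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IgnoreEntry is a hypothetical parsed form of one .gitleaksignore line.
// It is an illustration only, not a type from Gitleaks itself.
type IgnoreEntry struct {
	Commit string // empty when the entry has no commit prefix
	Path   string
	RuleID string
	Line   int
}

// parseIgnoreLine parses "[commit:]path:rule:line". It returns ok=false for
// blank lines, #-prefixed comments, and entries that do not have 3 or 4
// colon-separated fields with a numeric last field.
func parseIgnoreLine(s string) (IgnoreEntry, bool) {
	s = strings.TrimSpace(s)
	if s == "" || strings.HasPrefix(s, "#") {
		return IgnoreEntry{}, false
	}
	parts := strings.Split(s, ":")
	if len(parts) != 3 && len(parts) != 4 {
		return IgnoreEntry{}, false
	}
	line, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return IgnoreEntry{}, false
	}
	e := IgnoreEntry{
		Path:   parts[len(parts)-3],
		RuleID: parts[len(parts)-2],
		Line:   line,
	}
	if len(parts) == 4 {
		e.Commit = parts[0]
	}
	return e, true
}

func main() {
	lines := []string{
		"# dummy token used in tests",
		"foo/bar/gitleaks-false-positive.yaml:aws-access-token:4",
	}
	for _, l := range lines {
		if e, ok := parseIgnoreLine(l); ok {
			fmt.Printf("path=%s rule=%s line=%d\n", e.Path, e.RuleID, e.Line)
		}
	}
}
```

Splitting on colons keeps the sketch short, but it would misread an absolute Windows path such as one starting with a drive letter, which is one more reason to keep ignore entries relative.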

Hashing the secret and using that instead of a fingerprint sounds interesting. I would have to think about this a little. Some initial thoughts:


Please note that the path problem also exists in the .gitleaks.toml config file.

rgmz commented 1 month ago

This seems straightforward to fix, although doing so in a non-breaking way could be a little ugly.

I created a backwards-compatible POC here (fingerprint only). + @zricethezav


"could be a little ugly."

Edit: to elaborate, the fingerprint can easily swap \ for /. It isn't safe to do this for path because it's a regular expression.

Edit2: We'd also need to check compatibility for baseline.

Okeanos commented 1 month ago

Thanks for working on this so quickly 😍.

Concerning paths in e.g. allowlist config settings: wouldn't it work if the internal representation of a path within Gitleaks is always normalized to Unix path style? (Setting aside the breaking change for existing regexes, which would need to be addressed.)

That way the regex wouldn't need to be touched or "translated" but stays as the users defined it. It would still match regardless of platform. However, I can understand that there may be a benefit in having the ability to target Windows and Unix style paths specifically/individually.