godaddy / tartufo

Searches through git repositories for high entropy strings and secrets, digging deep into commit history
https://tartufo.readthedocs.io/
GNU General Public License v2.0
459 stars 72 forks source link

False Positive for 'Password in URL' regex #216

Open mrubino-godaddy opened 3 years ago

mrubino-godaddy commented 3 years ago

๐Ÿ› Bug Report

The regex for the Password in URL produces false positives when attempting to externalize the password.

To Reproduce

Have a file with the given text: (in this case a docker file or github actions file. The password is externalized into a variable and then injected into the url.

+git config --global \
+    url."https://${GITHUB_USER}:${GITHUB_TOKEN}@github.com/godaddy".insteadOf \
+    "https://github.com/godaddy"

or

git config --global \
+    url."https://${GHEC_TOKEN}:x-oauth-basic@github.com/godaddy".insteadOf \
+    "https://github.com/godaddy"
Reason: Regular Expression Match
Detail: Password in URL

Expected Behavior

The tartufo scan should pass and allow for externalize/parameterized URL auth.

Environment

$ tartufo --version
tartufo, version 2.7.1
mrubino-godaddy commented 3 years ago

I would guess something like this:

https://user:password@github.com/godaddy          # hit
https://u$er:password@github.com/godaddy          # hit
https://user:pa$$word@github.com/godaddy          # hit
https://u$er:password@github.com/godaddy          # hit
https://${user}:password@github.com/godaddy       # hit
https://user:${password}@github.com/godaddy       # hit
https://${user}:${password}@github.com/godaddy    # safe

https://token:x-oauth-basic@github.com/godaddy    # hit
https://to$ken:x-oauth-basic@github.com/godaddy   # hit
https://${token}:x-oauth-basic@github.com/godaddy # safe
mrubino-godaddy commented 3 years ago

FWIW I tried fiddling and came to something like this for a partial (though still imperfect) improvement.

(?!\$\{\w+?\})[^\/\s@]{3,20}
tarkatronic commented 3 years ago

Hi @mrubino-godaddy, thanks for reporting this! Is there any chance you would be able to take a stab at a PR for an improved regex for this purpose? The regexes are now all contained within the tartufo codebase, found here: https://github.com/godaddy/tartufo/blob/main/tartufo/data/default_regexes.json#L29

pmevzek-godaddy commented 3 years ago

The regex might need to be aligned to the standard definition of an URL, see RFC3986 ยง3.2 It says, summarizing:

authority   = [ userinfo "@" ] host [ ":" port ]
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

where it was previously defined that:

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

so from all of that, in theory, https://${user}:password@github.com/godaddy is not a valid URL because { and } should be really encoded as %7B and %7D. $ is allowed though as is (because in sub-delims).

Even better: don't use regexps to parse URLs (it is tricky). Use a library that will do it properly. From it, it will extract the components, including the userinfo and then you can find that piece of data to some code that will find out if there is a secret inside it or not. For example, as heuristics, take everything looking like http:// or https:// until like the first space (but handling cases where the URL is itself contained in quotes, or in the recommended forms of <https://...> or <URL:https://...>) and feeding that to an URL parsing library before extracting the userinfo part, or other parts too as a secret could be as well in the path, like https://www.example.com?username=bob&password=eSool9veish5aiT3o