Open mrubino-godaddy opened 3 years ago
I would guess something like this:
https://user:password@github.com/godaddy # hit
https://u$er:password@github.com/godaddy # hit
https://user:pa$$word@github.com/godaddy # hit
https://u$er:password@github.com/godaddy # hit
https://${user}:password@github.com/godaddy # hit
https://user:${password}@github.com/godaddy # hit
https://${user}:${password}@github.com/godaddy # safe
https://token:x-oauth-basic@github.com/godaddy # hit
https://to$ken:x-oauth-basic@github.com/godaddy # hit
https://${token}:x-oauth-basic@github.com/godaddy # safe
FWIW I tried fiddling and came to something like this for a partial (though still imperfect) improvement.
(?!\$\{\w+?\})[^\/\s@]{3,20}
Hi @mrubino-godaddy, thanks for reporting this! Is there any chance you would be able to take a stab at a PR for an improved regex for this purpose? The regexes are now all contained within the tartufo
codebase, found here: https://github.com/godaddy/tartufo/blob/main/tartufo/data/default_regexes.json#L29
The regex might need to be aligned to the standard definition of an URL, see RFC3986 ยง3.2 It says, summarizing:
authority = [ userinfo "@" ] host [ ":" port ]
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
where it was previously defined that:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
so from all of that, in theory, https://${user}:password@github.com/godaddy
is not a valid URL because {
and }
should be really encoded as %7B
and %7D
. $
is allowed though as is (because in sub-delims
).
Even better: don't use regexps to parse URLs (it is tricky). Use a library that will do it properly. From it, it will extract the components, including the userinfo
and then you can find that piece of data to some code that will find out if there is a secret inside it or not.
For example, as heuristics, take everything looking like http://
or https://
until like the first space (but handling cases where the URL is itself contained in quotes, or in the recommended forms of <https://...>
or <URL:https://...>
) and feeding that to an URL parsing library before extracting the userinfo
part, or other parts too as a secret could be as well in the path, like https://www.example.com?username=bob&password=eSool9veish5aiT3o
๐ Bug Report
The regex for the
Password in URL
produces false positives when attempting to externalize the password.To Reproduce
Have a file with the given text: (in this case a docker file or github actions file. The password is externalized into a variable and then injected into the url.
or
Expected Behavior
The tartufo scan should pass and allow for externalize/parameterized URL auth.
Environment