Open domanchi opened 5 years ago
I love the idea. Being able to identify the type of token more deterministically would also support #153
A couple of notes from this paper worth mentioning (for posterity):
Section III, Part E: talks about some interesting ideas on how to better filter out junk keys (e.g. XXXX, has EXAMPLE in the text)
Section V, Part D: notes that for multi-factor secrets (e.g. username and password), there is an 80% chance that both can be found within 5 lines of context, before and after a secret.
Section VII, Part D: entropy checks still catch more than regex rules alone. This is good to know, and allows users to decide how conservative they want to be (accuracy vs. recall trade-off).
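To make these three notes concrete, here is a rough sketch of how they could look in code. It is not taken from the paper or from our existing plugins: the names `looks_like_junk`, `context_window`, `is_candidate`, the `PLACEHOLDER_WORDS` tuple, and the `3.0` entropy threshold are all assumptions picked for illustration.

```python
import math
from collections import Counter

# Hypothetical marker list; the paper's actual filter rules are not reproduced here.
PLACEHOLDER_WORDS = ("EXAMPLE",)


def shannon_entropy(token: str) -> float:
    """Return the Shannon entropy of token, in bits per character."""
    if not token:
        return 0.0
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_junk(token: str) -> bool:
    """Flag obvious placeholders: a single repeated character, or a marker word."""
    if len(set(token)) <= 1:
        return True
    upper = token.upper()
    return any(word in upper for word in PLACEHOLDER_WORDS)


def context_window(lines, index, radius=5):
    """Return the lines within `radius` lines before and after lines[index]."""
    start = max(0, index - radius)
    return lines[start:index + radius + 1]


def is_candidate(token: str, min_entropy: float = 3.0) -> bool:
    """Keep tokens that are not junk and meet the (assumed) entropy threshold."""
    return not looks_like_junk(token) and shannon_entropy(token) >= min_entropy
```

Raising `min_entropy` makes the check more conservative (fewer false positives, more misses), which is the knob users would turn for the trade-off mentioned above. `context_window` is where a search for the second half of a multi-factor secret could start.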
I thought this part was another cool thing to experiment with:
Section III, Part D:
Note that each regular expression was prefixed with negative lookbehind (?<![\w]) and suffixed with negative lookahead (?![\w]) to ensure that no word characters appeared before or after the regular expression match and improve accuracy.
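If we want to experiment with this, a small wrapper is probably enough. The sketch below is only an illustration: `with_word_guards` is a hypothetical helper, and `BARE_RULE` is an example pattern chosen for the demo, not one copied from the paper's list.

```python
import re

# Example rule for illustration only: "AKIA" followed by 16 uppercase letters or digits.
BARE_RULE = r"AKIA[0-9A-Z]{16}"


def with_word_guards(pattern: str) -> re.Pattern:
    """Wrap pattern so no word character may directly precede or follow a match."""
    # The non-capturing group keeps the guards attached to the whole pattern,
    # even when the pattern contains a top-level alternation (a|b).
    return re.compile(r"(?<![\w])(?:" + pattern + r")(?![\w])")


bare = re.compile(BARE_RULE)
guarded = with_word_guards(BARE_RULE)

embedded = "fooAKIAABCDEFGHIJKLMNOPbar"    # token buried inside a longer word
standalone = "key = AKIAABCDEFGHIJKLMNOP"  # token on its own

print(bare.search(embedded) is not None)     # the unguarded rule matches
print(guarded.search(embedded) is not None)  # the guarded rule does not
print(guarded.search(standalone).group())
```

The guarded rule skips the token embedded in a longer identifier but still finds the standalone one, which is the accuracy improvement the quote describes.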
There was a recent white paper released (summary, source).
What's most interesting is that on page 15, they list a variety of explicit regexes that we may be able to incorporate into our scanning. I think we already cover like 80% (mostly with the high entropy scanner), but there are some interesting ones to extract from that. e.g.:
We should go through this list and create new plugins for the ones that we're missing.