Open tsigouris007 opened 1 week ago
Hi @tsigouris007, thank you for your PR!
Could you explain what the bug was?
Hi, You can visit the issue for the compete write-up.
Hi @tsigouris007, #857 describes the issue, I was looking for an explanation around what's causing said issue in the code
Hi @tsigouris007, #857 describes the issue, I was looking for an explanation around what's causing said issue in the code
I haven't quite had the time to find the exact location that brakes this.
The combined opening/closing square brackets []
seem to be causing the lines to not be fetched at all for checking.
For example I got AWSKeyDetector
's code into a separate standalone python file and the following secret was caught properly by the regex:
access_key_id: "<%= ENV['CLIENT_ACCESS_KEY_ID'].presence || 'AKIA123456789ABCDEF1' %>"
I am taking wild a guess here and I most probably think the problem is how the checked file lines are fetched (and that's why I thought of "escaping" the square brackets earlier).
As far as I can help atm you can go to detect_secrets/core/scan.py
method _process_line_based_plugins
and the following condition is returning True
and is quitting early:
if _is_filtered_out(
required_filter_parameters=['line'],
filename=filename,
line=line,
context=code_snippet,
):
If you remove it or add a pass
instead of continue
these secrets are caught because it hits the else
clause in _is_filtered_out
loop returning a debug message of:
Skipping secret due to `detect_secrets.filters.heuristic.is_indirect_reference`.
I can take a closer look but it will take me at least a couple of days.
Understanding what exactly is causing the bug will help us finding a good solution moving forward rather than a workaround that could potentially add another set of issues, hence why I'm asking this. Take your time and please let me know what you'll find!
Understanding what exactly is causing the bug will help us finding a good solution moving forward rather than a workaround that could potentially add another set of issues, hence why I'm asking this. Take your time and please let me know what you'll find!
It's understandable. I 'll try to get back to this asap.
Hi @lorenzodb1 ,
so I found where the problem occurs.
In file detect_secrets/filters/heuristic.py
there is _get_indirect_reference_regex
with a regex of:
return re.compile(r'([^\v=!:]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])')
So the choice is to either change this regex a bit or escape earlier the square brackets.
The latter could be done as is in my PR or we could do it also in is_indirect_reference
function inside the heuristic.py
file right before the boolean return as shown:
def is_indirect_reference(line: str) -> bool:
"""
Filters secrets that take the form of:
secret = get_secret_key()
or
secret = request.headers['apikey']
"""
# Constrain line length as the heuristic's intention is to target lines that resemble
# function calls. The constraint avoids catastrophic backtracking failures of the regex.
if len(line) > 1000:
return False
line = line.replace('[', '\[').replace(']', '\]')
return bool(_get_indirect_reference_regex().search(line))
This produces the correct results.
After playing around with the regex I came up with the following that seems to be working (tested on a huge repo with ~1.2k secrets):
def _get_indirect_reference_regex() -> Pattern:
# Regex details:
# ([^\v=!:]*) -> Something before the assignment or comparison
# \s* -> Some optional whitespaces
# (:=?|[!=]{1,3}) -> Assignment or comparison: :=, =, ==, ===, !=, !==
# \s* -> Some optional whitespaces
# (
# [\w.-]+ -> Some alphanumeric character, dot or -
# [\[\(] -> Start of indirect reference: [ or (
# [^\v]* -> Something except line breaks
# [\]\)] -> End of indirect reference: ] or )
# )
# return re.compile(r'([^\v=!:]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])') # Left the old one for reference
return re.compile(r'([^\v=!:"<%>]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])')
How do you want to go through with this? I have no strong opinion.
Please check if the PR fulfills these requirements
[x] Tests for the changes have been added
[ ] Docs have been added / updated
[ ] All CI checks are green
What kind of change does this PR introduce?
Fixes behavior on missed secrets that have square brackets before their default value. Example:
Issue: https://github.com/Yelp/detect-secrets/issues/857
Escapes the square brackets and catches secrets properly.
No breaking changes.