Open riverma opened 10 months ago
FYI @perryzjc.
FYI @perryzjc.
@riverma Thank you for bringing this to my attention. I will investigate the issue and work on addressing it.
FYI @perryzjc.
@riverma Thank you for bringing this to my attention. I will investigate the issue and work on addressing it.
Thanks for looking into this @perryzjc! Proposing in #100 to deal with the issue in the interim for users.
I find it interesting that *~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
is a valid format for a filename or folder on a Mac.
Due to this, I am considering reducing the generalization of the regex patterns used in the plugin to require that alphanumeric characters be present within each section of the path. It may not be able to catch every single possible absolute file path, such as this one, but it appears to be more usable, reducing false positives, and still able to catch most common absolute file path issues.
I find it interesting that
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
is a valid format for a filename or folder on a Mac.
Hi @perryzjc - the line *~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
was not the name of a file, but part of content within a particular file that the AbsolutePathDetectorExperimental plugin scanned. It flagged that line as a file-path even though it was not in fact a file path.
Hope that clarifies this ticket!
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
is a valid format for a filename or folder on a Mac.
Hi @riverma, my point was that since it's possible to use this format for a filename, it could also be part of a path in some extreme edge cases. For example:
This is why I mentioned that I might need to reduce some of the generalization ability. What do you think?
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
is a valid format for a filename or folder on a Mac.Hi @riverma, my point was that since it's possible to use this format for a filename, it could also be part of a path in some extreme edge cases. For example:
This is why I mentioned that I might need to reduce some of the generalization ability. What do you think?
Hi @riverma, just in case you missed the above comment. Could you please share your opinion on this matter? It would be immensely helpful for making decisions on how to address the bug. Thank you!
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
is a valid format for a filename or folder on a Mac.Hi @riverma, my point was that since it's possible to use this format for a filename, it could also be part of a path in some extreme edge cases. For example:
This is why I mentioned that I might need to reduce some of the generalization ability. What do you think?
Hi @riverma, just in case you missed the above comment. Could you please share your opinion on this matter? It would be immensely helpful for making decisions on how to address the bug. Thank you!
Hi @perryzjc - thanks for the reminder. I suggest that all flagged paths in your plugin be tested with a second method to ensure we're reducing false positives. For example, perhaps you can leverage os.path.isabs(path)
to test each flagged path to ensure it is both a valid path and it is an absolute path regardless if it actually exists on disk or not.
If the second layer still flags the strings as valid file paths, then I suggest we either:
/*...*/
, // ...
, while ensuring existing tests passWhat do you think?
If the second layer still flags the strings as valid file paths, then I suggest we either:
- Blacklist common comment blocks in languages like
/*...*/
,// ...
, while ensuring existing tests pass- Just ignore and leave this as a false positive (better to catch as a false positive than miss as a false negative!)
What do you think?
@riverma, thank you for your suggestions!
I previously tested os.path.isabs(path)
while searching for official functions in Python to handle file paths. However, it has its limitations. For instance, the documentation comments on the function highlight some of its shortcomings:
Additionally, my test cases have demonstrated that the function is heavily system-dependent. On a Mac, for example, it mistakenly identifies a Windows absolute path as not a file path:
As showcased in the screenshot above, for the first test case,
os.path.isabs(path)
also classifies comment blocks as absolute paths, since a Mac indeed able to have such file path.
Given these observations, I find os.path.isabs(path)
to be somewhat unreliable, leading to false negatives (as seen in the Windows File Path test case). In light of this, I am inclined to follow your second suggestion of blocklisting common comment blocks.
Emphasizing the importance of capturing false positives over missing false negatives is a valuable takeaway, and I plan to keep this in mind as I proceed.
Once again, thank you for your advice. I am currently updating the implementation based on these insights and am open to any additional suggestions you may have.
Great to hear @perryzjc - and thanks for your ongoing contributions! Look forward to seeing your thoughts in the code / guide. Agreed - blacklisting may be the best path forward 👍 .
Checked for duplicates
Yes - I've already checked
Website or Best Practice Guide?
Website
Describe the bug
When I ran the AbsolutePathDetectorExperimental plugin in the detect-secrets tool, I noticed it flagged lines of code comments as potential secrets, such as:
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
// fields returned by Elasticsearch (less fields = faster UI)
These are false positives as they are comments and not absolute paths.
What did you expect?
I expected the plugin to differentiate between actual code and code comments, and not flag comments as potential secrets.
Reproducible steps
Environment