[Bug]: False positive alerts in AbsolutePathDetectorExperimental plugin

riverma commented 10 months ago

Checked for duplicates

Yes - I've already checked

Website or Best Practice Guide?

Website

Describe the bug

When I ran the AbsolutePathDetectorExperimental plugin in the detect-secrets tool, I noticed it flagged lines of code comments as potential secrets, such as:

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
// fields returned by Elasticsearch (less fields = faster UI)

These are false positives as they are comments and not absolute paths.

What did you expect?

I expected the plugin to differentiate between actual code and code comments, and not flag comments as potential secrets.

Reproducible steps

1. Install and configure the detect-secrets tool.
2. Run the AbsolutePathDetectorExperimental plugin on a codebase containing the aforementioned comment lines.
3. Observe that these comments are flagged as potential secrets.

Environment

- Version of detect-secrets tool: `https://github.com/NASA-AMMOS/slim-detect-secrets.git@exp`
- Operating System: [e.g. MacOSX]

riverma commented 10 months ago

FYI @perryzjc.

perryzjc commented 10 months ago

FYI @perryzjc.

@riverma Thank you for bringing this to my attention. I will investigate the issue and work on addressing it.

riverma commented 10 months ago

FYI @perryzjc.

@riverma Thank you for bringing this to my attention. I will investigate the issue and work on addressing it.

Thanks for looking into this @perryzjc! Proposing in #100 to deal with the issue in the interim for users.

perryzjc commented 8 months ago

I find it interesting that *~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~* is a valid format for a filename or folder on a Mac. Due to this, I am considering reducing the generalization of the regex patterns used in the plugin to require that alphanumeric characters be present within each section of the path. It may not be able to catch every single possible absolute file path, such as this one, but it appears to be more usable, reducing false positives, and still able to catch most common absolute file path issues.

riverma commented 8 months ago

I find it interesting that *~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~* is a valid format for a filename or folder on a Mac.

Hi @perryzjc - the line *~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~* was not the name of a file, but part of content within a particular file that the AbsolutePathDetectorExperimental plugin scanned. It flagged that line as a file-path even though it was not in fact a file path.

Hope that clarifies this ticket!

perryzjc commented 8 months ago

*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~* is a valid format for a filename or folder on a Mac.

Hi @riverma, my point was that since it's possible to use this format for a filename, it could also be part of a path in some extreme edge cases. For example: Example This is why I mentioned that I might need to reduce some of the generalization ability. What do you think?

perryzjc commented 8 months ago

*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~* is a valid format for a filename or folder on a Mac.

Hi @riverma, my point was that since it's possible to use this format for a filename, it could also be part of a path in some extreme edge cases. For example: This is why I mentioned that I might need to reduce some of the generalization ability. What do you think?

Hi @riverma, just in case you missed the above comment. Could you please share your opinion on this matter? It would be immensely helpful for making decisions on how to address the bug. Thank you!

riverma commented 8 months ago

*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LOG STASH ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~* is a valid format for a filename or folder on a Mac.

Hi @riverma, my point was that since it's possible to use this format for a filename, it could also be part of a path in some extreme edge cases. For example: This is why I mentioned that I might need to reduce some of the generalization ability. What do you think?

Hi @riverma, just in case you missed the above comment. Could you please share your opinion on this matter? It would be immensely helpful for making decisions on how to address the bug. Thank you!

Hi @perryzjc - thanks for the reminder. I suggest that all flagged paths in your plugin be tested with a second method to ensure we're reducing false positives. For example, perhaps you can leverage os.path.isabs(path) to test each flagged path to ensure it is both a valid path and it is an absolute path regardless if it actually exists on disk or not.

riverma commented 8 months ago

If the second layer still flags the strings as valid file paths, then I suggest we either:

Blacklist common comment blocks in languages like /*...*/, // ..., while ensuring existing tests pass
Just ignore and leave this as a false positive (better to catch as a false positive than miss as a false negative!)

What do you think?

perryzjc commented 8 months ago

If the second layer still flags the strings as valid file paths, then I suggest we either:

Blacklist common comment blocks in languages like /*...*/, // ..., while ensuring existing tests pass

Just ignore and leave this as a false positive (better to catch as a false positive than miss as a false negative!)

What do you think?

@riverma, thank you for your suggestions!

I previously tested os.path.isabs(path) while searching for official functions in Python to handle file paths. However, it has its limitations. For instance, the documentation comments on the function highlight some of its shortcomings: Additionally, my test cases have demonstrated that the function is heavily system-dependent. On a Mac, for example, it mistakenly identifies a Windows absolute path as not a file path: As showcased in the screenshot above, for the first test case,os.path.isabs(path) also classifies comment blocks as absolute paths, since a Mac indeed able to have such file path.
Given these observations, I find os.path.isabs(path) to be somewhat unreliable, leading to false negatives (as seen in the Windows File Path test case). In light of this, I am inclined to follow your second suggestion of blocklisting common comment blocks.
Emphasizing the importance of capturing false positives over missing false negatives is a valuable takeaway, and I plan to keep this in mind as I proceed.

Once again, thank you for your advice. I am currently updating the implementation based on these insights and am open to any additional suggestions you may have.

riverma commented 8 months ago

Great to hear @perryzjc - and thanks for your ongoing contributions! Look forward to seeing your thoughts in the code / guide. Agreed - blacklisting may be the best path forward 👍 .

NASA-AMMOS / slim