Using Template for Secrets Detection Guide

riverma commented 4 months ago

Purpose

Simplified guide significantly based on user feedback
Utilizing new SLIM best practice guide template format
Proposed Changes
[CHANGE] README contents
[FIX] GitHub Action workflow was displaying code output instead of executing code during scan
Issues
101

Testing
Locally tested with sample repository: full scan, pre-commit hook
GitHub.com Action not quite working with test-repo. Considering to marking feature as experimental.
See rendered guide here: https://github.com/NASA-AMMOS/slim/tree/detect-secrets-rewrite/docs/guides/software-lifecycle/security/secrets-detection

riverma commented 3 months ago

@NASA-AMMOS/slim-committers - any thoughts about this PR? Please review if you can so we can get this merged. CC @perryzjc.

perryzjc commented 3 months ago

@NASA-AMMOS/slim-committers - any thoughts about this PR? Please review if you can so we can get this merged. CC @perryzjc.

@riverma: I will do it within the next two days. I just noticed this; I had been busy with school for a while. Thank you for the reminder.

riverma commented 3 months ago

@NASA-AMMOS/slim-committers - any thoughts about this PR? Please review if you can so we can get this merged. CC @perryzjc.

@riverma: I will do it within the next two days. I just noticed this; I had been busy with school for a while. Thank you for the reminder.

No prob @perryzjc - thank you!

riverma commented 3 months ago

Hi Folks! @NASA-AMMOS/slim-committers and @perryzjc. I'd like to demo this reformatted guide tomorrow, so if you have any final comments please share. I plan to merge March 19th. Otherwise, just remember, you can always propose a PR to alter this guide if you have an issue with it!

riverma commented 3 months ago

Thanks for reviewing @ingyhere.

perryzjc commented 3 months ago

Hi @riverma,

I just had the chance to review the PR and I have some comments on it. Overall, it looks good, but I noticed a few minor things.

I observed the updates made by @ingyhere, which added more excluded files to potentially help reduce false positives. This initially seems helpful, but the same optimization could be applied to other parts of the document that also involve --exclude-files, such as one code at Quick Start, another code at Layer 1, and another code at Layer 2.
However, including so many lines makes the command inconvenient to use and introduces repetitiveness for the document and maintenance difficulties. A potential solution to this could be to refactor the command about exclude files, making it read the content from .gitignore. Another potential way is to define a YAML config file to be used for the three layers along with more comprehensive excluded file list. This would not only keep the command short but also make it easier to modify to exclude more files and be more compatible with SDLC. What do you think? I am not sure about the feasibility, but I can do some research and experimentation on it.

Finally, a few notes for reference:

Regarding History Scanning, the current detect-secret doesn't support this, but someone had tried to make a PR about this. The PR hasn't been finalized for 4 years, so it might still take a long time to see this feature. However, trufflehog is really powerful for scanning different kinds of resources for secrets, such as a history of commits, secrets from a GitHub org, an AWS S3, and more.
A few imperfections about trufflehog include:
- It doesn't use a secret management baseline approach, which is not very convenient for local and GitHub side automated scans, especially when handling false positives. That's where detect-secret is really helpful.
- When using trufflehog, its plugin system is partially different from that of detect-secret. So, some secrets detected by detect-secret might not be detected by trufflehog, and vice versa.
- Although not perfect, trufflehog still seems worth mentioning, as it brings value for large-scale scans over different resources, which seems valuable for the JPL community.
- Other potential solutions to this might be making PRs/customized version to migrate regex of plugins at detect-secret to trufflehog, such as IP address detection.
I love the section about "How are commits containing secrets removed permanently from Git history?". It's really concise and introduces two ways of removing history. My question is, do you think it's worth having another document to elaborate more on this part (a potential future work)? A few references about this are some of my previous experimental uses of git filter-branch.
For our customized plugins, some of them look good for Yelp/detect-secret reviewers and are prepared to be merged, such as IPV4 Detection. Some of others, are being suggested for some updates for better use, such as the Email Address Detection. So there are also some potential work to do for me.

I hope this comment is helpful. It outlines some potential work that could take time to complete. I can assist with it based on needs and priorities. My school is about to have Spring Break, so hopefully, I can get some relief from the busy coursework for a while...

ingyhere commented 3 months ago

I observed the updates made by @ingyhere, which added more excluded files to potentially help reduce false positives. This initially seems helpful, but the same optimization could be applied to other parts of the document that also involve --exclude-files, such as one code at Quick Start, another code at Layer 1, and another code at Layer 2.

Good catch. Those should be uniform across the invocations (local, actions, etc.). There appears to be a minor error on line 212 (two dots), but I have to look up whether it is using glob or regex. ...

However, including so many lines makes the command inconvenient to use and introduces repetitiveness for the document and maintenance difficulties. A potential solution to this could be to refactor the command about exclude files, making it read the content from .gitignore. Another potential way is to define a YAML config file to be used for the three layers along with more comprehensive excluded file list. This would not only keep the command short but also make it easier to modify to exclude more files and be more compatible with SDLC. What do you think? I am not sure about the feasibility, but I can do some research and experimentation on it.

The idea of using the ignore file is great but it may also contain CI files, or something similar, that shouldn't be routinely overwritten, but also legitimately contain errant secrets. I don't think I would use the .gitignore for that reason.

OTOH, that configuration perhaps could be added to the .secrets.baseline probably without consequence. Adding an extra YAML is an option, but then there's yet another new configuration file to track, and it would be just as well for users to keep the number of configs limited. Or, just keep it as-is -- It always should be user modifiable, maybe mention that.

I love the section about "How are commits containing secrets removed permanently from Git history?". It's really concise and introduces two ways of removing history. My question is, do you think it's worth having another document to elaborate more on this part (a potential future work)?

Thanks. The history removal requires a lot of specialized expertise and is frankly dangerous. I don't think a guide can cover all edge cases which would make it riskier to apply. This is a worrisome topic.

riverma commented 3 months ago

Thanks @perryzjc for the comment and @ingyhere for the thoughts!

Hi @riverma,

I just had the chance to review the PR and I have some comments on it. Overall, it looks good, but I noticed a few minor things.

I observed the updates made by @ingyhere, which added more excluded files to potentially help reduce false positives. This initially seems helpful, but the same optimization could be applied to other parts of the document that also involve --exclude-files, such as one code at Quick Start, another code at Layer 1, and another code at Layer 2.

However, including so many lines makes the command inconvenient to use and introduces repetitiveness for the document and maintenance difficulties. A potential solution to this could be to refactor the command about exclude files, making it read the content from .gitignore. Another potential way is to define a YAML config file to be used for the three layers along with more comprehensive excluded file list. This would not only keep the command short but also make it easier to modify to exclude more files and be more compatible with SDLC. What do you think? I am not sure about the feasibility, but I can do some research and experimentation on it.

I agree with @ingyhere - reading from the .gitignore could introduce unexpected behavior. It's always best to minimize reading from files that are intended for specific tools. Also - a separate config file may be a bit annoying to maintain. I'd imagine that a user might even think: why don't I just copy my long command into a script to automate it?

Personally - I feel the long --exclude clauses give transparency and would likely just be a copy / paste operation for a user whenever they run the tool. It seems though that the current list is Pythonic, and may need further modification for other languages.

Would it be more straightforward to just have a sub-section in the guide for recommended exclusions, and maintain that list over time? A copy / paste could do the trick for a reader.

Finally, a few notes for reference:

Regarding History Scanning, the current detect-secret doesn't support this, but someone had tried to make a PR about this. The PR hasn't been finalized for 4 years, so it might still take a long time to see this feature. However, trufflehog is really powerful for scanning different kinds of resources for secrets, such as a history of commits, secrets from a GitHub org, an AWS S3, and more.

A few imperfections about trufflehog include:

It doesn't use a secret management baseline approach, which is not very convenient for local and GitHub side automated scans, especially when handling false positives. That's where detect-secret is really helpful.

When using trufflehog, its plugin system is partially different from that of detect-secret. So, some secrets detected by detect-secret might not be detected by trufflehog, and vice versa.

Although not perfect, trufflehog still seems worth mentioning, as it brings value for large-scale scans over different resources, which seems valuable for the JPL community.

Other potential solutions to this might be making PRs/customized version to migrate regex of plugins at detect-secret to trufflehog, such as IP address detection.

If you've experimented with Trufflehog and feel confident suggesting it, maybe we could mention it as an answer in the FAQ question about this as a start? I think the detect-secrets tool approach you've recommended is pretty great for now and unless things really evolve on the trufflehog side to support secrets management, we could continue to recommend detect-secrets. I like that you're thinking of other tools though - the guide should evolve as the tool-landscape changes!

I love the section about "How are commits containing secrets removed permanently from Git history?". It's really concise and introduces two ways of removing history. My question is, do you think it's worth having another document to elaborate more on this part (a potential future work)? A few references about this are some of my previous experimental uses of git filter-branch.

Rather than us writing another document, perhaps we can link to a document on the web that outlines the steps clearly? I have to imagine this is a common topic!

For our customized plugins, some of them look good for Yelp/detect-secret reviewers and are prepared to be merged, such as IPV4 Detection. Some of others, are being suggested for some updates for better use, such as the Email Address Detection. So there are also some potential work to do for me.

💯

I hope this comment is helpful. It outlines some potential work that could take time to complete. I can assist with it based on needs and priorities. My school is about to have Spring Break, so hopefully, I can get some relief from the busy coursework for a while...

Of course! We're just grateful to have your continued interest in the project @perryzjc and hope its useful in your other efforts! Best of luck with all the school.

ingyhere commented 3 months ago

There's another way to see inside all the commits that uses Git log, something like git log --all --raw --patch --no-decorate to start. Identifying a precise location where a secret is located would be scripting around it, but a smart script based on that would at least let the user know if there are secrets at all.

NASA-AMMOS / slim

Using Template for Secrets Detection Guide #143

Purpose

Proposed Changes

Issues

101

Testing