NASA-AMMOS / slim

Software Lifecycle Improvement & Modernization
https://nasa-ammos.github.io/slim/
Apache License 2.0
24 stars 9 forks source link

[Improve Existing Best Practice Guide]: Automated checking for general sensitive information within Git #89

Closed riverma closed 1 year ago

riverma commented 1 year ago

Checked for duplicates

Yes - I've already checked

Best Practice Guide

Continuous Integration

Best Practice Guide Sections

Starter Kits

Describe the improvement

We have some existing recommendations for checking sensitive AWS credential information via using git-secrets described here. However, we've received feedback that this could be improved via the following:

To support these two needs, we should evaluate if git-secrets is the right tool, or if it should be augmented or replaced with a better solution.

jordanpadams commented 1 year ago

One other idea:

Per this:

A GitHub-side automation that checks repositories even if folks have committed and pushed sensitive information

The guidelines should also include a link to documentation about how to deep clean this from your commit history. Additionally, this automated check should also include GitHub Issues, which can include sensitive information.

riverma commented 1 year ago

Great suggestions @jordanpadams - we will plan to include these in our scope.

riverma commented 1 year ago

FYI @perryzjc see folks who are also interested in this scope of work here.

sneely333 commented 1 year ago

We should search for anything that includes information regarding our infrastructure including sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.

riverma commented 1 year ago

We should search for anything that includes information regarding our infrastructure including sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.

That's a very comprehensive list of tips - thank you very much @sneely333. We will look these over.

perryzjc commented 1 year ago

I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.

The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of.

The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.

Tool Name File Content File Name Commit Message Pre-commit-hook Check history GitHub-side Automation Customized Pattern Entropy Analysis
git-secrets Yes No Yes Yes Yes Yes (doable) Yes No
gitleaks Yes No No Yes Yes Yes Yes Yes
trufflehog Yes No No Yes Yes Yes Yes Yes
talisman Yes Yes No Yes Yes Yes (doable) Yes Yes
riverma commented 1 year ago

@perryzjc - great work here. Will scope this and provide feedback. One tip is you'll want to also include GitHub's own secrets scanning tool in your trade-study. What is missing from GitHub's tool that these other tools support?

Example screenshot available in project settings on GitHub: Screenshot 2023-02-27 at 11 09 07 AM

riverma commented 1 year ago

I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.

  • They all support customized regular expression, which provides a possibility to solve the needs in this ticket and other potential needs.

The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of.

The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.

Solid analysis here @perryzjc - thanks! The entropy analysis feature is interesting. That could help in identifying sensitive information, though it might flag memory addresses in code as well.

Based on the tools listed, which do you recommend proceeding with and why? One or more tools? It'd be great to get an architecture / flow diagram of where the tool(s) solution proposed fit in with the following scenarios:

  1. New code commits (locally) -> code pushes (to remote) -> code CI (on GitHub.com)
  2. Full codebase scans (locally)
  3. Full codebase history, including previous commits

Additionally - how can we make use of these tool solutions plug-and-play? The GitHub Action route is has obvious appeal, but how about client side? You might want to look at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks.

perryzjc commented 1 year ago

I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.

  • They all support customized regular expression, which provides a possibility to solve the needs in this ticket and other potential needs.

The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of. The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.

Solid analysis here @perryzjc - thanks! The entropy analysis feature is interesting. That could help in identifying sensitive information, though it might flag memory addresses in code as well.

Based on the tools listed, which do you recommend proceeding with and why? One or more tools? It'd be great to get an architecture / flow diagram of where the tool(s) solution proposed fit in with the following scenarios:

  1. New code commits (locally) -> code pushes (to remote) -> code CI (on GitHub.com)
  2. Full codebase scans (locally)
  3. Full codebase history, including previous commits

Additionally - how can we make use of these tool solutions plug-and-play? The GitHub Action route is has obvious appeal, but how about client side? You might want to look at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks.

Hi @riverma, thank you for the guidance! I will do one more trade study on GiHub’s own secret scanner and then make a architecture graph answering those questions.

perryzjc commented 1 year ago

As an comprehensive Architecture Diagram is taking longer than I expected (will post soon within next 2 days), I would like to provide an update on my trade study of GitHub Action's Secret scanning firstly.

GitHub Action's Secret scanning looks like a convenient and user-friendly product, particularly for GitHub-side automation. It offers additional features compared to other tools, but it also has its limitations, which is not friendly for public repositories.

Updated Trade Study table, compared to the old one

Tool Name File Content File Name Commit Message Pre-commit-hook Check history GitHub-side automation Customized Pattern Entropy Analysis
git-secrets Yes No Yes Yes Yes Yes (doable) Yes No
gitleaks Yes No No Yes Yes Yes Yes Yes
trufflehog Yes No No Yes Yes Yes Yes Yes
talisman Yes Yes No Yes Yes Yes (doable) Yes Yes
Secret scanning (GitHub's) Yes No No Similar as push protection Yes Yes, and more convenient as a built-in product Yes No

Here are the unique features of GitHub's Secret scanning

Pros

Cons

My thought

Because of the Cons, I feel GitHub's secret scanning is not useful for public repository, it's more like a product helping companies discover whether the tokens they provide to users have been abused

riverma commented 1 year ago

Hey @perryzjc - thanks for the deep dive analysis of GH Secrets Scanning. Appreciate the opinions and evidence you've brought forth. Great work here!

The one unique factor GH Secrets Scanning is checking issue tickets for secrets, though I think it'd be far more useful if custom patterns were supported. Often sensitive file paths, internal URLs appear in issue tickets that we don't want there. I'm curious if a GitHub action can be written (using one of the tools you've suggested) to scan not only code, but issue tickets as well without much additional work.

Look forward to your architecture / recommendation for this ticket!

perryzjc commented 1 year ago

Hey @perryzjc - thanks for the deep dive analysis of GH Secrets Scanning. Appreciate the opinions and evidence you've brought forth. Great work here!

The one unique factor GH Secrets Scanning is checking issue tickets for secrets, though I think it'd be far more useful if custom patterns were supported. Often sensitive file paths, internal URLs appear in issue tickets that we don't want there. I'm curious if a GitHub action can be written (using one of the tools you've suggested) to scan not only code, but issue tickets as well without much additional work.

Look forward to your architecture / recommendation for this ticket!

Hi @riverma - yes, I think it's doable to scan the issue tickets. Here is the relevant screenshot:

image

I have been doing a lot of research lately, consulting with other software engineers, and testing out multiple tools (in addition to the ones I mentioned previously). From what I've found so far, it seems that there isn't a single tool or combination of tools in the open source world that can fully meet all of our needs. For instance, it's challenging to find a tool that can scan file content, file names, commit information, history, issue tickets, support pre-commit-hooks, support regular expressions, and have entropy analysis capabilities all at once. This means that some customization will likely be necessary.

However, I recently stumbled upon a tool called "detect-secrets" that was recommended by Microsoft. It supports entropy analysis, has some commonly used regular expressions built-in, and scans quickly. It's also written in Python and designed to be scalable, which is a big plus.

While it doesn't currently support detecting file names and commit information, I've found that it's entirely feasible and relatively straightforward to modify the Python code and create a commit-msg hook.

The tool works well on GitHub Actions, but I did encounter one issue: the free version of GitHub doesn't support pre-receive hook. This means that although GitHub Action can detect the presence of sensitive information, the file has already been uploaded to the branch. This same issue also applies to issue tickets.

As of now, I don't have a solution for these two problems, but I think that the current approach is the most optimal solution compared to other options out there. It covers a wide range of needs and can potentially solve those two problems.

In addition, when it comes to scanning history, trufflehog is particularly powerful. If we could incorporate our customized detect-secrets tool into that type of historical scan, I believe the results would be excellent.

I'll be providing an architecture diagram shortly.

perryzjc commented 1 year ago

Here is the Scope of Work my solution able to provide:

Screen Shot 2023-03-10 at 10 06 53 PM

Note

The priority of each implementation is based on the needs from the community

Scope of Work:

riverma commented 1 year ago

Hi @perryzjc - thanks for the write up here!

My thoughts:

perryzjc commented 1 year ago

Here is my diagram about how each tool relates to each need.

graph TD
subgraph solution
  subgraph Development-Tools
    subgraph open-source-tools
      detect_secrets[Detect Secrets]
      trufflehog[Trufflehog]
      style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      style trufflehog fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
    subgraph automation-tools
      subgraph local-git-hooks
        pre-commit[pre-commit hook]
        commit-msg[commit-msg hook]
        style pre-commit fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
        style commit-msg fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      end
      subgraph remote-GitHub-Action
        pre-receive[pre-receive hook]
        workflows[workflows on push]
        webhook[webhook]
        style pre-receive fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
        style workflows fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
        style webhook fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      end
    end
    style Development-Tools fill:#5DADE2,stroke:#333,stroke-width:2px
  end

  subgraph Needs
    Detecting-diverse-types-of-secrets{{Detecting-diverse-types-of-secrets}}
    Local-commit-protection[[fa:fa-ban Local-commit-protection]]
    Alarm-detected-secrets>fa:fa-camera-retro Alarm-detected-secrets]
    Push-protection[[fa:fa-ban Push-protection]]
    Detect-secrets-in-full-code-history[(Detect-secrets-in-full-code-history)]
    Detection-of-secrets-in-different-media{{Detection-of-secrets-in-different-media}}
  end

  detect_secrets == solves ==> Detecting-diverse-types-of-secrets
  local-git-hooks == solves ==> Local-commit-protection
  remote-GitHub-Action == solves ==> Alarm-detected-secrets
  pre-receive == solves if GitHub enterprise version ==> Push-protection
  remote-GitHub-Action == solves ==> Detection-of-secrets-in-different-media
  local-git-hooks == solves ==> Detection-of-secrets-in-different-media
  trufflehog == solves ==> Detect-secrets-in-full-code-history

  subgraph Other-notes
    subgraph development
      subgraph Server-may-needed
      end
      subgraph Web-crawling-may-needed
      end
    end
    subgraph needs
      subgraph webhook's-post-request
      end
      subgraph Secrets-in-previous-issues
      end
    end
    PRI([Becasue of the scalability of detect secrets, <br> all feature proposed here are implementable. <br> But priority is based on the needs of the community.])
    style PRI fill:#AF505C,stroke:#333,stroke-width:2px
  end

  webhook's-post-request -. need to be handled by .-> Server-may-needed
  Secrets-in-previous-issues -. can be scaned by .-> Web-crawling-may-needed

  webhook -- sends --> webhook's-post-request

  style solution fill:#EC7063,stroke:#666,stroke-width:4px
  style Detecting-diverse-types-of-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
  style Local-commit-protection fill:#48C9B0,stroke:#333,stroke-width:2px
  style Push-protection fill:#48C9B0,stroke:#333,stroke-width:2px
  style Alarm-detected-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
  style Detection-of-secrets-in-different-media fill:#48C9B0,stroke:#333,stroke-width:2px
  style Detect-secrets-in-full-code-history fill:#48C9B0,stroke:#333,stroke-width:2px
  style Server-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
  style Web-crawling-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
  style webhook's-post-request fill:#58D68D,stroke:#333,stroke-width:2px
  style Secrets-in-previous-issues fill:#58D68D,stroke:#333,stroke-width:2px
end

Development Tools:

Usage

detect secrets

local - git hook

remote - GitHub Action

local - git hook (AND) remote - GitHub Action

trufflehog

Other notes

perryzjc commented 1 year ago

Hi @perryzjc - thanks for the write up here!

My thoughts:

  • Preference on the client-side scanning over GitHub if you're having trouble handling the latter. GitHub should serve as a backup layer to prevent sensitive info (more alerting than stopping), but understandably it won't have all the safety features of a git pre-hook. I think if someone writes code on GitHub itself and pushes to a branch, we could have the automation point to docs about purging the repo history. We don't want to require GitHub Enterprise features btw.
  • In terms of features to support, I think prioritize the features specifically mentioned in this ticket over others that might be nice but may get us bogged down
  • Some use cases to make things more tangible for your architecture diagram / approach: (1) client-side full scan of existing code base, (2) client-side scan of updated code upon Git commit, (3) server-side push to GitHub.com from client, or writing code on GitHub.com itself and being warned about sensitive info at earliest possible stage and pointers on how to purge / fix

Hi @riverma, thanks for the suggestions!

I'll prioritize the needs of the community for the actual implementation. The first diagram was just to show the potential features of my solution.

I've added another diagram to show how each tool relates to each need.

Next up, I'll work on the diagram you mentioned.

With these three diagrams, hope it can provide people a better understanding of what my solution can do and how to use it.

perryzjc commented 1 year ago

@riverma Here are my other diagrams of the solution. It includes the most essential parts of my solution.

Solution Structure Diagram

graph TD
subgraph SolutionStructure
  subgraph SecretsDetectionApproach
    subgraph Layer1["Layer 1: Push to GitHub.com (server-side)"]
      style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
    subgraph Layer2["Layer 2: Scan of updated code upon Git commit (client-side)"]
      style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
    subgraph Layer3["Layer 3: Full scan of the existing code base (client-side)"]
      style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
  end

  subgraph Tools
    subgraph CoreTool["Core Tool: Detect Secrets"]
      style CoreTool fill:#5DADE2,stroke:#333,stroke-width:2px
      detect_secrets{{detect-secrets}}
      style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end

    subgraph OtherTools["Other Tools"]
      pre_commit_ci[pre-commit.ci]
      github_action[GitHub Action]
      pre_commit_manager[pre-commit manager]
      style github_action fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      style pre_commit_ci fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      style pre_commit_manager fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
  end

  subgraph LayerDetails
    subgraph Layer1Details[Layer 1 Details]
      Compatible_with_all_local_machines{{Compatible with all local machines}}
      Protection_for_main_branch{{Protection for the main branch}}
      Error_notifications_via_GitHub_email{{Error notifications via GitHub and email}}
      Implemented_with_detect_secrets_workflow[[Implemented with detect-secrets workflow]]
    end
    subgraph Layer2Details[Layer 2 Details]
      Optional_due_to_compatibility_issues{{Optional due to compatibility issues}}
      Early_stage_secrets_detection{{Early-stage secrets detection}}
      Commit_prevention_and_error_messages{{Commit prevention and error messages}}
      Implemented_with_pre_commit_manager[[Implemented with pre-commit manager]]
    end
    subgraph Layer3Details[Layer 3 Details]
      Direct_use_of_detect_secrets{{Direct use of detect-secrets}}
      Error_messages_for_detected_secrets{{Error messages for detected secrets}}
    end
  end

  Layer1 -->|uses| pre_commit_ci
  Layer1 -->|uses| github_action
  Layer2 -->|uses| pre_commit_manager
  Layer1 -->|uses| detect_secrets
  Layer2 -->|uses| detect_secrets
  Layer3 -->|uses| detect_secrets

  style SolutionStructure fill:#EC7063,stroke:#666,stroke-width:4px
  style SecretsDetectionApproach fill:#AF7AC5,stroke:#333,stroke-width:2px
  style Tools fill:#48C9B0,stroke:#333,stroke-width:2px
  style LayerDetails fill:#F1948A,stroke:#333,stroke-width:2px
  style Compatible_with_all_local_machines fill:#58D68D,stroke:#333,stroke-width:2px
  style Protection_for_main_branch fill:#58D68D,stroke:#333,stroke-width:2px
  style Error_notifications_via_GitHub_email fill:#58D68D,stroke:#333,stroke-width:2px
  style Implemented_with_detect_secrets_workflow fill:#58D68D,stroke:#333,stroke-width:2px
  style Optional_due_to_compatibility_issues fill:#58D68D,stroke:#333,stroke-width:2px
  style Early_stage_secrets_detection fill:#58D68D,stroke:#333,stroke-width:2px
  style Commit_prevention_and_error_messages fill:#58D68D,stroke:#333,stroke-width:2px
  style Implemented_with_pre_commit_manager fill:#58D68D,stroke:#333,stroke-width:2px
  style Direct_use_of_detect_secrets fill:#58D68D,stroke:#333,stroke-width:2px
  style Error_messages_for_detected_secrets fill:#58D68D,stroke:#333,stroke-width:2px
end

User Workflow Diagram

flowchart TB
  User([fa:fa-user User])

  subgraph UserWorkflow["User Workflow to Secure Secrets"]
    Layer1["1. Layer 1: GitHub.com (server-side)"]
    Layer2["2. Layer 2: Git commit scan (client-side)"]
    Layer3["3. Layer 3: Full scan (client-side)"]

    Layer1 -->|If Secrets Detected| Clean1[Purge or Fix the commit manually]
    Layer2 -->|If Secrets Detected| Clean2[Clean local file directly. <br> Don't need to worry about cleaning commit history]
    Layer3 -->|If Secrets Detected| Clean3[Clean local file directly.]

    Secure["Only Main branch is in safe. <br> Secrets are leaked on other branch before cleaning"]
    Clean1 --> Secure

    SaveTime["It saves your time. And secrets are safe from GitHub"]
    Clean2 --> SaveTime
    Clean3 --> SaveTime
  end

  User -->|At least use| Layer1
  User -->|Helpful to use| Layer2
  User -->|Optional to use| Layer3

  style User fill:#F6F5F3,stroke:#333,stroke-width:1px
  style UserWorkflow fill:#AF7AC5,stroke:#333,stroke-width:2px
  style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
  style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
  style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
  style Clean1 fill:#5A88ED,stroke:#333,stroke-width:2px
  style Clean2 fill:#5A88ED,stroke:#333,stroke-width:2px
  style Clean3 fill:#5A88ED,stroke:#333,stroke-width:2px
  style SaveTime fill:#5ABF9B,stroke:#333,stroke-width:2px
  style Secure fill:#AF3034,stroke:#333,stroke-width:2px

Documentation

Solution Structure Diagram

  1. Secrets Detection Approach

    • Layer 1: Push to GitHub.com from the client (server-side)
    • Layer 2: Scan of updated code upon Git commit (client-side)
    • Layer 3: Full scan of the existing code base (client-side)
  2. Tools:

  3. Layer Details

    • Layer 1: Server-side push to GitHub.com
      • Compatible with all local machines
      • Protection for the main branch
      • If secrets detected,
        • Error notifications (including guideline of fix/ purge) via GitHub and email
      • Implemented with .github/workflows/detect-secrets.yml, pre-commit.ci, and detect-secrets
    • Layer 2: Client-side scan upon Git commit
      • May have compatibility issues
      • Early-stage secrets detection
      • If secrets detected,
        • Commit prevention and error messages
      • Implemented with pre-commit manager and .pre-commit-config.yaml using detect-secrets
    • Layer 3: Full scan of existing code base

User Workflow Diagram

  1. User Interaction with Layers:

    • Layer 1: GitHub.com (server-side)
      • The user should at least use this layer for securing secrets.
    • Layer 2: Git commit scan (client-side)
      • Using this layer is helpful for the user to detect secrets early on.
    • Layer 3: Full scan (client-side)
      • This layer is optional for the user to use for additional security.
  2. Actions to be taken if secrets are detected:

    • Layer 1: Purge or fix the commit manually
      • If secrets are detected, the user must purge or fix the commit manually to ensure the main branch remains secure.
    • Layer 2: Clean local file directly
      • If secrets are detected, the user can clean the local file directly, without worrying about cleaning the commit history.
    • Layer 3: Clean local file directly
      • If secrets are detected, the user can clean the local file directly.
  3. Effects of using different layers:

    • Using Layer 1 ensures that only the main branch is safe, and secrets are leaked on other branches before cleaning.
    • Using Layer 2 and Layer 3 saves the user's time and keeps secrets safe from GitHub.
riverma commented 1 year ago

Hi @perryzjc -

Excellent work here with the research and brining this all together. I support your plan here, but I have a couple questions and suggestions.

Questions:

Suggestions:

perryzjc commented 1 year ago

Hi Rishi, here is my response to the Questions:

  1. Developers will be notified by email sent from GitHub Action.

  2. detect-secrets has a plug-in folder We can add features (other patterns) by putting or modifying the Python scripts in that folder. It's scalable and convenient.

  3. The newest release of detect-secrets was on October 5, 2022. It's much more active than git-secrets, whose latest update was three years ago.

  4. Also, detect-secrets is an enterprise-friendly way of detecting and preventing secrets in code. Currently, I've found that IBM and Yelp are using it. I've also found that it is recommended by Microsoft

Also, thank you for the Suggestions! When it comes to the implementation, I will try to complete those features and make them as convenient and secure as possible.

perryzjc commented 1 year ago

Here are three sequence diagrams to help people better understand the three layers.

Layer1 - Server-side push to GitHub.com

sequenceDiagram
    participant User as Developer
    participant GH as GitHub
    participant Config as .pre-commit-config.yaml
    participant CI as Pre-commit CI
    participant DS as Detect-Secrets

    Note over User,GH: Developer creates pull request or pushes to branch
    User->>+GH: Creates pull request / pushes to branch
    GH->>+Config: Fetches pre-commit config
    Config->>CI: Returns config with Detect-Secrets setup
    CI->>DS: Requests secret scan
    DS->>DS: Scans pull request / branch for secrets with custom plugins
    alt Secrets Detected
        DS-->>CI: Returns detected secrets
        CI-->>GH: Reports status check as failed
        GH-->>User: Prevents merge / push & reports status check
    else No Secrets Detected
        DS-->>CI: Returns clean result
        CI-->>GH: Reports status check as passed
        GH-->>User: Allows merge / push
    end

Layer2 - Git commit scan (client-side)

sequenceDiagram
    participant User as Developer
    participant Local as Local Environment
    participant Config as .pre-commit-config.yaml
    participant PCH as Pre-commit Hook
    participant DS as Detect-Secrets
    participant File as Baseline File

    Note over User,Local: Developer attempts to commit
    User->>+Local: Request commit
    Local->>+Config: Fetches pre-commit config
    Config->>PCH: Returns config with Detect-Secrets setup
    PCH->>DS: Request secret scan with existing baseline
    DS->>File: Fetches baseline file
    File->>DS: Returns baseline file
    DS->>DS: Scans changes for secrets with custom plugins
    alt New Secrets Detected
        DS-->>PCH: Returns detected secrets
        PCH-->>Local: Prevents commit & reports detected secrets
        Local-->>User: Prevents commit & reports detected secrets
    else No New Secrets Detected
        DS-->>PCH: Returns clean result
        PCH-->>Local: Allows commit
        Local-->>User: Commits changes
    end

Layer3 - Full scan and audit (client-side)

sequenceDiagram
    participant Dev as Developer
    participant Env as Local Environment
    participant DS as Detect-Secrets
    participant File as Baseline File
    participant Audit as Audit Tool

    Note over Dev,Env: Developer initiates a direct scan for secrets
    Dev->>+Env: Triggers direct scan
    Env->>+DS: Requests scan on the codebase
    DS->>DS: Performs secret scanning
    DS->>File: Generates new baseline file
    File->>DS: Acknowledges file creation
    DS-->>-Env: Returns scan results and new baseline file
    Env-->>Dev: Presents scan results and new baseline file
    Note over Dev,File: Developer may audit the new baseline file
    Dev->>Audit: Initiates audit on the new baseline file
    Audit->>File: Fetches details from the baseline file
    File->>Audit: Returns secret details
    Audit-->>Dev: Presents detailed information of detected secrets
riverma commented 1 year ago

@perryzjc has proposed his plugins as PR's to Yelp's Detect Secrets core codebase here. Once those are accepted, we may no longer need to host a separate fork of detect secrets in the future.

riverma commented 1 year ago

Resolved.