blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0

New Module: GitHub workflow logs #1335

Closed domwhewell-sage closed 2 months ago

domwhewell-sage commented 2 months ago

This PR adds a new module to download workflow logs from a repository as mentioned in https://github.com/blacklanternsecurity/bbot/issues/1305.

It always tries every workflow in the repository. By default it downloads 1 successful log per workflow; you can set num_logs to download more, up to a maximum of 100 logs per workflow.
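As a rough illustration of the flow described above (list workflows, pick successful runs, fetch each run's log archive), here is a minimal sketch that only builds the GitHub REST API URLs involved. The helper names and the clamping function are hypothetical and not taken from the module; the endpoint paths are the public GitHub Actions REST endpoints, where per_page is capped at 100.

```python
# Sketch only: helper names are hypothetical, not the module's actual code.
API = "https://api.github.com"
MAX_LOGS = 100  # GitHub caps per_page at 100, which matches the num_logs ceiling


def clamp_num_logs(num_logs):
    """Keep num_logs within 1..100 (assumed behaviour for out-of-range values)."""
    return max(1, min(int(num_logs), MAX_LOGS))


def workflows_url(owner, repo):
    """URL listing every workflow in the repository."""
    return f"{API}/repos/{owner}/{repo}/actions/workflows"


def runs_url(owner, repo, workflow_id, num_logs=1):
    """URL listing the most recent successful runs of one workflow."""
    per_page = clamp_num_logs(num_logs)
    return (
        f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs"
        f"?status=success&per_page={per_page}"
    )


def logs_url(owner, repo, run_id):
    """URL that returns the zipped logs for a single run."""
    return f"{API}/repos/{owner}/{repo}/actions/runs/{run_id}/logs"
```

Calling runs_url("blacklanternsecurity", "bbot", 42, num_logs=500) would request at most 100 runs, since the value is clamped before it reaches the query string.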

It raises FILESYSTEM events for the downloaded workflow logs archive.

The plan is to run trufflehog against these archives, but first I want to double-check that trufflehog runs against them without producing loads of duplicates. (Unzipping the archive manually, there's one large logfile plus smaller logfile "chunks" that seem to duplicate the content of the large log.)

bbot -t blacklanternsecurity.com -m github_org,github_workflows --config modules.github_org.api_key=<api_token>

domwhewell-sage commented 2 months ago

Marking this for review now. Unfortunately trufflehog is duplicating the discovered keys, as technically they are on 2 different "lines" of the zip file. This is probably something to think about in our trufflehog module: we could keep an internal set() to de-dupe discovered secrets.
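A minimal sketch of the set()-based de-dupe idea, assuming each finding is a dict with "detector" and "secret" keys. Those key names are hypothetical and would need to be mapped onto whatever the trufflehog module actually parses.

```python
# Sketch only: the "detector" and "secret" keys are assumed, not trufflehog's real schema.
def dedupe_findings(findings):
    """Return findings with repeated (detector, secret) pairs removed, keeping first-seen order."""
    seen = set()
    unique = []
    for finding in findings:
        key = (finding["detector"], finding["secret"])
        if key in seen:
            continue  # same secret already reported from another file or line
        seen.add(key)
        unique.append(finding)
    return unique
```

Keying on the secret itself rather than on its location means the same key found in two files of the archive is reported once, at the cost of dropping the second location.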

Also, the repo I used to test this on had an aws_access_key, aws_secret_access_key and aws_session_token in the workflow log and trufflehog wasn't picking them up, so that's a bug I'll have to pick up with the developers of that tool.

Finally, the "location" where a discovered secret is found would be a run_XXXXXXX.zip file, which obviously will mean nothing to the user of bbot, so we would need some way of linking this back to the original CODE_REPOSITORY event (https://github.com/blacklanternsecurity/bbot/issues/1319 ?). Theoretically: CODE_REPOSITORY -> FILESYSTEM -> FINDING.

Nothing to change in this module, but these are all things to think about for the trufflehog module changes required to make this module yield secrets.

TheTechromancer commented 2 months ago

For now can we add a description to the FILESYSTEM event that says something like, "these are logs from the GitHub workflow <workflow> on <repo> at <time>"?

TheTechromancer commented 2 months ago

Nice work on this! I made a small tweak to the error handling, let me know if it looks good.

domwhewell-sage commented 2 months ago

I've made a modification to prevent the duplication. The downloaded zip archive has a structure like:

allsteps.txt
folder/
  - step1.txt
  - step2.txt

Therefore a secret could appear in both allsteps.txt and step2.txt, which would make trufflehog raise the finding twice for the same secret.
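One possible way to avoid the double count, sketched here as an assumption rather than as the change actually made in this PR, is to keep only the top-level .txt logs and ignore the per-step files inside folders. The function name is hypothetical.

```python
import io
import zipfile


# Sketch only: an assumed approach, not necessarily the modification in this PR.
def top_level_logs(archive_bytes):
    """List the .txt files at the root of a workflow log archive, skipping per-step chunks."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            if "/" not in name and name.endswith(".txt")  # "/" marks a file inside a folder
        ]
```

With the structure above, only allsteps.txt would be kept, so each secret is scanned once.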