blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0
4.59k stars, 415 forks

Content Search Module #1367

Closed: SpamFaux closed this issue 3 months ago

SpamFaux commented 4 months ago

Description: Which feature would you like to see added to BBOT? What are its use cases?

A Content Search Module would identify specific strings of data within scanned websites. Ideally, the user would pass the module a regex describing the content to look for.

The ideal output would be either a specified event type or a tag with a specified value.

There should also be a way for the module to accept multiple regexes, each paired with the preferred output for its matches.
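A minimal sketch of the requested behavior, written in Python for illustration: each regex is paired with the tag its matches should carry, and every pair is run against a page's text. The `content_search` function and `Match` class are hypothetical names, not part of BBOT, and reporting the first capture group (or the whole match when there is none) is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    tag: str      # user-chosen output label for this regex (hypothetical)
    pattern: str  # the regex that produced the match
    value: str    # first capture group if the regex has one, else the whole match


def content_search(text: str, rules: list[tuple[str, str]]) -> list[Match]:
    """Run every (tag, regex) pair against text and collect all matches in order."""
    results = []
    for tag, pattern in rules:
        compiled = re.compile(pattern)
        for m in compiled.finditer(text):
            value = m.group(1) if compiled.groups else m.group(0)
            results.append(Match(tag, pattern, value))
    return results


rules = [
    ("custom_tag", r"pwd:\s(.*)"),
    ("another_custom_tag", r"password:\s(.*)"),
]
page = "debug info\npwd: hunter2\npassword: letmein\n"
for match in content_search(page, rules):
    print(match.tag, match.value)
# custom_tag hunter2
# another_custom_tag letmein
```

Keeping the rules as an ordered list of pairs (rather than one combined regex) means each match can be attributed to exactly one user-chosen tag.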

domwhewell-sage commented 4 months ago

I think a FINDING event with a custom tag would be more feasible in the current framework.

The config could look something like this:

modules:
  content_search:
    http_responses: True
    file_contents: True
    regex:
      - custom_tag: 'pwd:\s(.*)'
      - another_custom_tag: 'password:\s(.*)'

And the output could look something like this:

[FINDING]               {"description": "A match was found using the custom regex ['pwd:\s(.*)']", "host": "blah.test.com", "url": "http://blah.test.com/"} httpx->content_search   (in-scope, custom_tag)
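
To show how the proposed config and output fit together, here is a hypothetical Python sketch that takes the config above as an already-parsed dict and builds one FINDING-style dict per matching rule. The `make_findings` function and the dict layout are illustrative assumptions that only mimic the output line above; they are not BBOT's event API, and scope tags such as `in-scope` are left to the framework.

```python
import re
from urllib.parse import urlparse

# The proposed YAML config, as it might look once parsed (hypothetical layout).
config = {
    "http_responses": True,
    "file_contents": True,
    "regex": [
        {"custom_tag": r"pwd:\s(.*)"},
        {"another_custom_tag": r"password:\s(.*)"},
    ],
}


def make_findings(url: str, body: str, config: dict) -> list[dict]:
    """Build one FINDING-style dict (hypothetical shape) per rule that matches body."""
    host = urlparse(url).hostname
    findings = []
    for entry in config["regex"]:
        # Each list entry is a single-key mapping of {tag: pattern}.
        for tag, pattern in entry.items():
            if re.search(pattern, body):
                findings.append({
                    "type": "FINDING",
                    "data": {
                        "description": f"A match was found using the custom regex ['{pattern}']",
                        "host": host,
                        "url": url,
                    },
                    "tags": [tag],
                })
    return findings


for finding in make_findings("http://blah.test.com/", "pwd: hunter2", config):
    print(finding)
```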

TheTechromancer commented 4 months ago

@liquidsec is already hard at work on this in bbot-2.0. Excavate is getting a complete rework using Yara, which will let us run these kinds of regex searches at a much bigger scale, including over text extracted by @domwhewell-sage's unstructured module.

A side effect of this new excavate rewrite will hopefully be the ability to load custom Yara rules, which will fulfill the need for a content search module.
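
If custom Yara rules do land, a regex list like the one proposed earlier in this thread could be carried over by rendering each tag/regex pair as its own rule. The Python sketch below does only that text conversion; `regex_rules_to_yara` is a hypothetical helper, not part of BBOT, and how bbot-2.0 will actually load custom rules is not specified here.

```python
import re


def regex_rules_to_yara(rules: dict[str, str]) -> str:
    """Render {tag: regex} pairs as YARA rule source text, one rule per tag."""
    blocks = []
    for tag, pattern in rules.items():
        # YARA identifiers allow only ASCII letters, digits and underscores,
        # and cannot start with a digit.
        name = re.sub(r"[^A-Za-z0-9_]", "_", tag)
        if not name or name[0].isdigit():
            name = "_" + name
        # YARA regexes are delimited by slashes, so escape literal slashes.
        # Assumes the input pattern does not already escape its slashes.
        body = pattern.replace("/", r"\/")
        blocks.append(
            f"rule {name}\n"
            "{\n"
            "    strings:\n"
            f"        $re = /{body}/\n"
            "    condition:\n"
            "        $re\n"
            "}"
        )
    return "\n\n".join(blocks)


print(regex_rules_to_yara({
    "custom_tag": r"pwd:\s(.*)",
    "another_custom_tag": r"password:\s(.*)",
}))
```

Tags that collide with reserved YARA keywords (for example `rule`) are not handled by this sketch.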

TheTechromancer commented 3 months ago

Closing as duplicate of https://github.com/blacklanternsecurity/bbot/issues/1252.