InQuest / ThreatIngestor

Extract and aggregate threat intelligence.
https://inquest.readthedocs.io/projects/threatingestor/
GNU General Public License v2.0

Parse images with OCR for further IOC extraction. #69

Closed by pedramamini 1 year ago

pedramamini commented 5 years ago

Consider the following Tweets:

Which contain the following image URLs:

Retrieve each image, run it through a cloud OCR service (Google, Facebook, AWS), then parse the recognized text with IOCExtract for inclusion in the IOC stream.
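The flow described above could be sketched roughly as follows. This is only an illustration, not ThreatIngestor code: the `ocr` callable is a hypothetical placeholder for whichever OCR backend is chosen, and the small regex set is a simplified stand-in for IOCExtract, which handles far more defang styles and indicator types.

```python
import re
from typing import Callable, Dict, List

# Simplified stand-ins for what IOCExtract does (assumption: only a few
# common defang styles and indicator types are covered here).
_DEFANG_PAIRS = [("hxxp", "http"), ("[.]", "."), ("(.)", "."), ("[:]", ":")]
_URL_RE = re.compile(r"https?://[^\s\"'<>]+")
_IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def refang(text: str) -> str:
    """Undo a few common defanging conventions (hxxp, [.], and so on)."""
    for defanged, plain in _DEFANG_PAIRS:
        text = text.replace(defanged, plain)
    return text


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Pull URLs, IPv4 addresses and SHA-256 hashes out of OCR output."""
    clean = refang(text)
    return {
        "urls": sorted(set(_URL_RE.findall(clean))),
        "ipv4": sorted(set(_IPV4_RE.findall(clean))),
        "sha256": sorted(set(_SHA256_RE.findall(clean))),
    }


def image_to_iocs(image_bytes: bytes, ocr: Callable[[bytes], str]) -> Dict[str, List[str]]:
    """Run the (hypothetical) OCR backend, then extract IOCs from its text."""
    return extract_iocs(ocr(image_bytes))
```

Passing the OCR step in as a callable keeps the sketch independent of any one cloud provider; a real worker would swap in an actual OCR client and IOCExtract itself.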

rshipp commented 5 years ago

This would be a perfect candidate for a new queue worker. You could probably do it with only a few lines changed from the paste processor.

pedramamini commented 1 year ago

Here's a more modern example of why this is valuable:

https://www.sentinelone.com/blog/top-10-macos-malware-discoveries-in-2022/

battleoverflow commented 1 year ago

In the next version of ThreatIngestor, this can be accomplished with a new source dedicated to image extraction. It does need to write some temporary data under /tmp, because of how CV handles the binary data from images, but it should work for both local and external images.

config.yml

sources:
  # Local image file
  - name: image-scrape
    module: image
    img: local.jpg

  # External image, referenced by URL
  - name: image-scrape
    module: image
    img: https://user-images.githubusercontent.com/1253573/210873147-a8fdbc59-2bbf-4c56-af6d-d01503aabb93.png

Command

threatingestor config.yml
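To illustrate the /tmp requirement mentioned above, here is a minimal sketch of how an `img` value might be resolved to a local file before it is handed to an image library. It is an assumption about the general approach, not the actual ThreatIngestor implementation; `is_remote` and `fetch_to_temp` are hypothetical helper names.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse


def is_remote(img: str) -> bool:
    """Treat only http(s) values as external images; anything else is a local path."""
    return urlparse(img).scheme in ("http", "https")


def fetch_to_temp(img: str, opener=urllib.request.urlopen) -> str:
    """Return a local path for img, downloading it to a temp file if it is a URL.

    The caller is responsible for deleting the temporary file afterwards.
    """
    if not is_remote(img):
        return img
    # Keep the original extension so image libraries can infer the format.
    suffix = os.path.splitext(urlparse(img).path)[1]
    with opener(img) as response, tempfile.NamedTemporaryFile(
        suffix=suffix, delete=False
    ) as tmp:
        shutil.copyfileobj(response, tmp)
        return tmp.name
```

Under this sketch, `fetch_to_temp("local.jpg")` returns the path unchanged, while an `https://` URL is copied into the system temp directory first.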