InQuest / ThreatIngestor

Extract and aggregate threat intelligence.
https://inquest.readthedocs.io/projects/threatingestor/
GNU General Public License v2.0

Troubleshoot missing extracted indicators #162

Open dspruell-i01 opened 4 weeks ago

dspruell-i01 commented 4 weeks ago

This issue is marked as blocking because automated OSINT collection is partially broken, possibly to a significant degree. The problem is long-standing: it was first encountered roughly two years ago, possibly earlier.

Some indicators collected from OSINT sources are not showing up in our intelligence data stores. We have observed this historically as well: on numerous occasions, indicators we know are present in the RSS or Sitemap feeds we collect did not appear in our collections. A previous Engineering resource, Trevor, started diagnosing this and reported seeing indicators being extracted but never reaching outputs. In the current case, the indicators do not appear to be extracted at all, so they never reach the outputs, although they should.

  1. Source RSS feeds are validated as functioning and added to ThreatIngestor's configuration.
  2. ThreatIngestor runs and extracts indicators from feed sources, as validated in ThreatIngestor logs.
  3. Expected indicators do not appear in threat intel data stores, as verified using lookups against our API services to query C2 Feed, IOCDB, REPDB, and TIDB.

Example

Source blog post:

2024-10-22 https://blog.talosintelligence.com/gophish-powerrat-dcrat/

Feed is configured in ThreatIngestor:

- name: rss-talos
  module: rss
  url: http://feeds.feedburner.com/feedburner/Talos
  feed_type: messy

Feed is verified to be functional, and the target post is found in the feed content:

[Screenshot: image.png]
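A check like the one in the screenshot can also be scripted. The following is a minimal sketch, assuming the feed is RSS 2.0 with `item`, `link`, and `description` elements; the helper names and the `SAMPLE` document are hypothetical and are not the real Talos feed content. It separates two questions: is the post listed in the feed, and does the feed entry itself carry the indicator text.

```python
import xml.etree.ElementTree as ET

POST_URL = "https://blog.talosintelligence.com/gophish-powerrat-dcrat/"

# Hypothetical stand-in for a downloaded feed; NOT the real feed content.
SAMPLE = """<rss version="2.0"><channel>
<item>
<title>Example post</title>
<link>https://blog.talosintelligence.com/gophish-powerrat-dcrat/</link>
<description>Short summary without indicators.</description>
</item>
</channel></rss>"""


def find_item(feed_xml, post_url):
    """Return the first <item> whose <link> equals post_url, or None."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        if (item.findtext("link") or "").strip() == post_url:
            return item
    return None


def item_mentions(item, needle):
    """Return True if needle occurs anywhere in the item's text nodes."""
    return needle in " ".join(item.itertext())


item = find_item(SAMPLE, POST_URL)
print("post listed in feed:", item is not None)
print("indicator in feed entry:", item_mentions(item, "94[.]103[.]85[.]47"))
```

If the post is listed but the entry text does not contain the indicator, the indicator exists only on the linked page, which is a different situation from an extraction failure.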

This sample indicator is listed in the post:

94[.]103[.]85[.]47 (94.103.85.47)

ThreatIngestor should extract this indicator, refang it (94[.]103[.]85[.]47 becomes 94.103.85.47), and send it to the configured output(s).
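To illustrate the expected handling of the sample indicator, here is a minimal standard-library sketch. It is not ThreatIngestor's own extraction code; the function names and the small set of defang patterns are assumptions chosen for illustration.

```python
import re

# Hypothetical rules covering a few common defang styles only.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"hxxp", re.IGNORECASE), "http"),
]

# Dotted-quad IPv4 with each octet limited to 0-255.
_IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def refang(text):
    """Undo the defang patterns listed in _REFANG_RULES."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


def extract_ipv4(text):
    """Return unique IPv4 addresses found after refanging, in order of appearance."""
    found = []
    for match in _IPV4.findall(refang(text)):
        if match not in found:
            found.append(match)
    return found


print(extract_ipv4("94[.]103[.]85[.]47"))  # ['94.103.85.47']
```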

We have verified that extraction from the configured feed is historically functional:

$ grep "'rss-talos'" /opt/research/logs/threatingestor_rss.log* |grep Processing |grep -v '0 artifacts'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.145 | DEBUG    | threatingestor.sources:process_element:56 - Processing in source 'rss-talos'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.229 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'threatkb-yara'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.229 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'threatkb-c2'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.229 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'threatkb-task'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.229 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'url-processor'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.229 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'csv'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.230 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'sqlite'
/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.266 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'aurora'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.294 | DEBUG    | threatingestor.sources:process_element:56 - Processing in source 'rss-talos'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.315 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'threatkb-yara'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.315 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'threatkb-c2'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.315 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'threatkb-task'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.315 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'url-processor'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.316 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'csv'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.316 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'sqlite'
/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.329 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'aurora'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.489 | DEBUG    | threatingestor.sources:process_element:56 - Processing in source 'rss-talos'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.509 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'threatkb-yara'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.510 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'threatkb-c2'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.510 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'threatkb-task'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.510 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'url-processor'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.510 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'csv'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.510 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'sqlite'
/opt/research/logs/threatingestor_rss.log.5:2024-10-10 10:53:47.524 | DEBUG    | threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' with operator 'aurora'

However, the logs above show that extraction from this source was last performed on 2024-10-09 and 2024-10-10. The post was published on 2024-10-22. No extraction has been performed for this configured source since publication, that is, on 2024-10-22 or on 2024-10-23, the day this issue is being reported.

Looking at the configured outputs, the rss-talos source is configured to output to ThreatKB (C2 Feed):
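The date gap can be checked programmatically rather than by reading grep output. The sketch below assumes the log line layout shown in this issue (timestamp, level, `threatingestor:run_once`, then the "Processing N artifacts" message); the function name and regular expression are hypothetical helpers, not part of ThreatIngestor.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2024-10-10 19:39:15.229 | DEBUG    | threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' with operator 'csv'
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\.\d+ \| \w+\s*\| "
    r"threatingestor:run_once:\d+ - Processing (?P<count>\d+) artifacts "
    r"from source '(?P<source>[^']+)' with operator '(?P<operator>[^']+)'"
)


def extraction_dates(lines):
    """Map each source name to the sorted dates on which it yielded artifacts."""
    dates = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group("count")) > 0:
            dates[match.group("source")].add(match.group("date"))
    return {source: sorted(found) for source, found in dates.items()}


# Two lines taken from the grep output in this issue.
sample = [
    "/opt/research/logs/threatingestor_rss.log.4:2024-10-10 19:39:15.229 | DEBUG    | "
    "threatingestor:run_once:139 - Processing 12 artifacts from source 'rss-talos' "
    "with operator 'csv'",
    "/opt/research/logs/threatingestor_rss.log.5:2024-10-09 17:08:34.316 | DEBUG    | "
    "threatingestor:run_once:139 - Processing 1 artifacts from source 'rss-talos' "
    "with operator 'csv'",
]
print(extraction_dates(sample))  # {'rss-talos': ['2024-10-09', '2024-10-10']}
```

Feeding every rotated log file through the same function would show, per source, the last date on which any artifact was processed.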

- name: threatkb-c2
  module: threatkb
  credentials: threatkb-auth
  artifact_types: [Domain, IPAddress]
  # yamllint disable-line
  allowed_sources: [twitter-list-inquest-ioc-feed, rss-paloaltonetworks, rss-talos, rss-securelist, rss-fireeye]
  use_https: true
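As a sanity check on the routing, the operator settings above can be evaluated against the source name and artifact type in question. This is a sketch only: it assumes `allowed_sources` entries are matched as regular expressions against the source name and that `artifact_types` is a simple allow-list, which may not match ThreatIngestor's exact filtering logic. The dict and function name are hypothetical.

```python
import re

# Operator settings copied from the config above into a plain dict.
threatkb_c2 = {
    "name": "threatkb-c2",
    "artifact_types": ["Domain", "IPAddress"],
    "allowed_sources": [
        "twitter-list-inquest-ioc-feed",
        "rss-paloaltonetworks",
        "rss-talos",
        "rss-securelist",
        "rss-fireeye",
    ],
}


def operator_accepts(operator, source_name, artifact_type):
    """Return True if the operator's filters would let this artifact through.

    Assumption: a missing or empty filter list means "accept everything".
    """
    types = operator.get("artifact_types")
    sources = operator.get("allowed_sources")
    type_ok = not types or artifact_type in types
    source_ok = not sources or any(re.search(p, source_name) for p in sources)
    return type_ok and source_ok


print(operator_accepts(threatkb_c2, "rss-talos", "IPAddress"))  # True
```

Under these assumptions the configuration would route an IP address from rss-talos to threatkb-c2, so the configuration itself does not explain the missing indicator.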

However, when we checked ThreatKB for the target indicator (94.103.85.47), it was not found, so we added it manually.

We can also confirm that the indicator was not ingested from this source and routed to a different indicator store. When queried on 2024-10-23, the indicator had been ingested only from another source (Recorded Future) and routed to TIDB. It was not collected from the Talos feed.

dspruell-i01 commented 3 weeks ago

Clarifying notes from external discussion:

The ThreatIngestor logs suggest that no extraction was performed for that source at all when or after the post was published. In other words, the logs point to extraction never happening, not to indicators being extracted and then failing to be ingested.

@pedramamini asked this:

The IOCs are not IN the feed correct? threatingestor never went out to the site to fetch the content. this isn't a bug, but rather a feature need.

Response:

@pedramamini asked this:

How are you loading this site? https://feeds.feedburner.com/feedburner/Talos it has a bunk SSL cert.

Response: