OpenCTI-Platform / client-python

OpenCTI Python Client
https://www.opencti.io
Apache License 2.0

Self-referencing reports cause a recursion error #267

Closed rlynch-ironnet closed 2 years ago

rlynch-ironnet commented 2 years ago

Description

When ingesting from the Mandiant API v4 STIX/TAXII report endpoint, a RecursionError is raised. At a glance, without digging too deep into the bundle splitting, it appears to be caused by the self-referential reports the endpoint returns: the report's own ID is listed in its object_refs.

Environment

  1. OS (where OpenCTI server runs): local laptop, mac
  2. OpenCTI version: pycti 5.3.7
  3. Other environment details: n/a

Reproducible Steps

>>> objects
[{'type': 'report', 'spec_version': '2.1', 'id': 'report--3fa375fc-085b-5296-a6bb-7901d831d5e2', 'created': '2022-08-12T18:54:35.505Z', 'modified': '2022-08-12T18:54:35.505Z', 'object_marking_refs': ['marking-definition--f88d31f6-486f-44da-b317-01333bde0b82'], 'name': 'Hackers Behind Cuba Ransomware Attacks Using New RAT Malware', 'report_types': ['News Analysis'], 'published': '2022-08-12T18:54:35.505Z', 'object_refs': ['report--3fa375fc-085b-5296-a6bb-7901d831d5e2'], 'extensions': {'extension-definition--56495771-1c8b-43ad-a731-c6b6cc1b8f6d': {'extension_type': 'toplevel-property-extension'}}, 'created_by_ref': 'identity--0a225431-f1d7-5e77-99fc-6f5d392b92d9', 'x_opencti_files': [{'name': '22-00019054.pdf', 'data': 'abc123', 'mime_type': 'application/pdf'}]}]

>>> conn._helper.send_stix2_bundle(bundle=stix2.Bundle(objects=objects, allow_custom=True).serialize(), update=True, work_id="abc123")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/raymond.lynch/.virtualenvs/mandiant-zruo/lib/python3.10/site-packages/pycti/connector/opencti_connector_helper.py", line 778, in send_stix2_bundle
    bundles = stix2_splitter.split_bundle(bundle, True, event_version)
  File "/Users/raymond.lynch/.virtualenvs/mandiant-zruo/lib/python3.10/site-packages/pycti/utils/opencti_stix2_splitter.py", line 67, in split_bundle
    self.enlist_element(item["id"], raw_data)
  File "/Users/raymond.lynch/.virtualenvs/mandiant-zruo/lib/python3.10/site-packages/pycti/utils/opencti_stix2_splitter.py", line 22, in enlist_element
    nb_deps += self.enlist_element(element_ref, raw_data)
  File "/Users/raymond.lynch/.virtualenvs/mandiant-zruo/lib/python3.10/site-packages/pycti/utils/opencti_stix2_splitter.py", line 22, in enlist_element
    nb_deps += self.enlist_element(element_ref, raw_data)
  File "/Users/raymond.lynch/.virtualenvs/mandiant-zruo/lib/python3.10/site-packages/pycti/utils/opencti_stix2_splitter.py", line 22, in enlist_element
    nb_deps += self.enlist_element(element_ref, raw_data)
  [Previous line repeated 2986 more times]
  File "/Users/raymond.lynch/.virtualenvs/mandiant-zruo/lib/python3.10/site-packages/pycti/utils/opencti_stix2_splitter.py", line 14, in enlist_element
    existing_item = self.cache_index.get(item_id)
RecursionError: maximum recursion depth exceeded while calling a Python object

Steps to create the smallest reproducible scenario:

Expected Output

The bundle is split and sent without error, even when a report references itself.

Actual Output

Recursion error

Additional information

rlynch-ironnet commented 2 years ago

I suppose I'll have to do something similar to how the old sekoia issues fixed things.

rlynch-ironnet commented 2 years ago

reproducible snippet:

import stix2
from pycti.utils.opencti_stix2_splitter import OpenCTIStix2Splitter

# A single report whose object_refs contains its own ID
bundle_objects = [{
    'type': 'report',
    'spec_version': '2.1',
    'id': 'report--3fa375fc-085b-5296-a6bb-7901d831d5e2',
    'created': '2022-08-12T18:54:35.505Z',
    'modified': '2022-08-12T18:54:35.505Z',
    'object_marking_refs': ['marking-definition--f88d31f6-486f-44da-b317-01333bde0b82'],
    'name': 'abc123',
    'report_types': ['News Analysis'],
    'published': '2022-08-12T18:54:35.505Z',
    'object_refs': ['report--3fa375fc-085b-5296-a6bb-7901d831d5e2']
}]

bundle = stix2.Bundle(objects=bundle_objects, allow_custom=True).serialize()
splitter = OpenCTIStix2Splitter()
splitter.split_bundle(bundle)  # raises RecursionError on pycti 5.3.7

richard-julien commented 2 years ago

Looks like we forgot to check for this kind of cyclic dependency. We need to check for it correctly, and I think we will continue to reject this kind of data. We could maybe clean up the cyclic ref, but it will be difficult to make the correct decision in every situation.
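
A check along these lines could be sketched as follows. This is only an illustration, not pycti's code: the function name `find_reference_cycle` is hypothetical, and it assumes the bundle objects are plain dicts whose references live in fields ending in `_ref` or `_refs`. References to objects outside the bundle (marking definitions, for example) are ignored.

```python
from typing import Dict, List, Optional


def find_reference_cycle(objects: List[dict]) -> Optional[List[str]]:
    """Return one cycle of object IDs (hypothetical helper), or None if there is none."""
    # Build an adjacency map: object ID -> IDs it references.
    graph: Dict[str, List[str]] = {}
    for obj in objects:
        refs: List[str] = []
        for key, value in obj.items():
            if key.endswith("_refs") and isinstance(value, list):
                refs.extend(value)
            elif key.endswith("_ref") and isinstance(value, str):
                refs.append(value)
        graph[obj["id"]] = refs

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for ref in graph[node]:
            if ref not in graph:
                continue  # reference points outside the bundle
            if state[ref] == GREY:
                # ref is already on the current path, so we closed a loop
                return path[path.index(ref):] + [ref]
            if state[ref] == WHITE:
                cycle = visit(ref)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For the report in this issue, the returned cycle would be the report ID followed by itself. The depth-first search is itself recursive, so a very long reference chain would need an iterative version.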

daemitus commented 2 years ago

What if we wrap the enlist_element method inside another one that just keeps a set of processed entity IDs? The recursion can then happen in the inner method.

Something like...

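def enlist_element(...):
    tracked = set()  # IDs already processed in this traversal

    def inner_enlist(id, ...):
        if id in tracked:
            return 0  # already seen; callers do nb_deps += <result>
        tracked.add(id)
        ...stuff...
        inner_enlist(recursive args)

    return inner_enlist(...)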
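
A runnable version of this idea might look like the sketch below. The `SketchSplitter` class and its `by_id` and `ordered` attributes are hypothetical stand-ins, not pycti's actual splitter, and only `object_refs` is followed for simplicity.

```python
class SketchSplitter:
    """Hypothetical stand-in for the splitter; not pycti's implementation."""

    def __init__(self, objects):
        self.by_id = {obj["id"]: obj for obj in objects}
        self.ordered = []  # objects, each listed after its dependencies

    def enlist_element(self, item_id):
        tracked = set()  # IDs already seen during this traversal

        def inner_enlist(current_id):
            # Stop on IDs we have already visited (this breaks cycles)
            # and on references to objects that are not in the bundle.
            if current_id in tracked or current_id not in self.by_id:
                return 0
            tracked.add(current_id)
            nb_deps = 1
            for ref in self.by_id[current_id].get("object_refs", []):
                nb_deps += inner_enlist(ref)
            self.ordered.append(self.by_id[current_id])
            return nb_deps

        return inner_enlist(item_id)
```

With the self-referencing report from this issue, `enlist_element` returns after a single visit instead of recursing until the interpreter's limit is hit.
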
richard-julien commented 2 years ago

In the end, we decided to rewrite the bundle to remove this kind of cyclic problem. So in your sample bundle, the report will no longer have itself in its object_refs.

Don't hesitate to ask your data provider to clean up the bundle on their side, to improve data quality overall :)
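
Until such a fix is available, a connector could apply a similar cleanup itself before sending the bundle. The sketch below is an assumption about what that cleanup could look like, not the code that was merged into pycti; `strip_self_references` is a hypothetical helper that only handles direct self-references in `*_refs` lists.

```python
import copy


def strip_self_references(objects):
    """Return copies of the objects with their own ID removed from *_refs lists."""
    cleaned = []
    for obj in objects:
        obj = copy.deepcopy(obj)  # leave the caller's data untouched
        own_id = obj.get("id")
        for key, value in list(obj.items()):
            if key.endswith("_refs") and isinstance(value, list):
                obj[key] = [ref for ref in value if ref != own_id]
        cleaned.append(obj)
    return cleaned
```

Note that for the sample report this leaves `object_refs` empty, which a strict STIX validator may reject, so it is worth checking how the target platform treats such reports.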