CYBEX-P / cybexp-cs

proc.analytics.filters.filt_misp: KeyError: 'pop from an empty set' and pymongo.errors.AutoReconnect: connection closed #5

Closed qclassified closed 5 years ago

qclassified commented 5 years ago
ERROR:root:proc.analytics.filters.filt_misp.filt_misp.1: MISP Event id 61
Traceback (most recent call last):
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 1038, in _get_socket_no_auth
    sock_info = self.sockets.pop()
KeyError: 'pop from an empty set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "filt_misp.py", line 145, in filt_misp
    j = Misp(raw, backend)
  File "filt_misp.py", line 197, in __init__
    obj = [self.create_att_obj(copy.deepcopy(type_list), v)]
  File "filt_misp.py", line 226, in create_att_obj
    previous = MispAttribute(type_list.pop(0), data)
  File "filt_misp.py", line 161, in __init__
    super().__init__(alias = _ALIAS.get(args[0],[]), *args, **kwargs)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/tahoe-0.0.1-py3.5.egg/tahoe/instance.py", line 142, in __init__
    super().__init__(**kwargs)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/tahoe-0.0.1-py3.5.egg/tahoe/instance.py", line 21, in __init__
    dup = self.duplicate()
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/tahoe-0.0.1-py3.5.egg/tahoe/instance.py", line 158, in duplicate
    return self.backend.find_one(q)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/tahoe-0.0.1-py3.5.egg/tahoe/backend.py", line 24, in find_one
    def find_one(self, query, projection={"_id" : 0}): return self.coll.find_one(query, projection)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/collection.py", line 1262, in find_one
    for result in cursor.limit(-1):
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/cursor.py", line 1189, in next
    if len(self.__data) or self._refresh():
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/cursor.py", line 1104, in _refresh
    self.__send_message(q)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/cursor.py", line 931, in __send_message
    operation, exhaust=self.__exhaust, address=self.__address)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/mongo_client.py", line 1145, in _send_message_with_response
    exhaust)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/mongo_client.py", line 1156, in _reset_on_error
    return func(*args, **kwargs)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/server.py", line 85, in send_message_with_response
    with self.get_socket(all_credentials, exhaust) as sock_info:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 1004, in get_socket
    sock_info = self._get_socket_no_auth()
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 1041, in _get_socket_no_auth
    sock_info = self.connect()
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 976, in connect
    sock_info.ismaster(self.opts.metadata, None)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 486, in ismaster
    ismaster = IsMaster(self.command('admin', cmd, publish_events=False))
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 584, in command
    self._raise_connection_failure(error)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 745, in _raise_connection_failure
    raise error
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/pool.py", line 579, in command
    unacknowledged=unacknowledged)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/network.py", line 141, in command
    reply = receive_message(sock, request_id)
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/network.py", line 173, in receive_message
    _receive_data_on_socket(sock, 16))
  File "/home/fsadique/.virtualenvs/cybexp/lib/python3.5/site-packages/pymongo/network.py", line 238, in _receive_data_on_socket
    raise AutoReconnect("connection closed")
pymongo.errors.AutoReconnect: connection closed
qclassified commented 5 years ago

Problem:

For large raw data (e.g. a MISP event with ~2000 attributes), processing fails with KeyError: 'pop from an empty set' followed by pymongo.errors.AutoReconnect: connection closed.

Reason: Each tahoe Instance creates its own backend (a MongoBackend). Because of the from tahoe import * line at the top of proc.analytics.filters.filt_misp, the following class-level code in tahoe.instance.Instance does not yield a shared MongoBackend:

class Instance():
    backend = get_backend() if os.getenv("_MONGO_URL") else NoBackend()

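This is because the _MONGO_URL environment variable is set at the bottom of proc.analytics.filters.filt_misp, after the import has already evaluated the class body. Instead, each Instance falls back to the first line of tahoe.instance.Instance.__init__: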

class Instance():
    backend = get_backend() if os.getenv("_MONGO_URL") else NoBackend()

    def __init__(self, **kwargs):
        if type(self.backend) == NoBackend and os.getenv("_MONGO_URL"): self.backend = get_backend()
        ...

However, this is meant to be a fallback for testing only. It creates a separate MongoBackend for every attribute, and since there are ~2000 attributes alone (not counting objects, events, or sessions), it soon overwhelms the MongoDB server by exceeding the maximum number of connections allowed in a pool (the maximum number of connections created from the same machine).
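The mechanism can be reproduced without a database. The sketch below is only an illustration: NoBackend, FakeMongoBackend and get_backend are hypothetical stand-ins (not tahoe's real classes), and the URL is a placeholder. It shows that when _MONGO_URL is set only after the class body has run, the class attribute stays NoBackend and every instance builds its own backend.

```python
import os


class NoBackend:
    """Stand-in for tahoe's NoBackend (hypothetical simplification)."""


class FakeMongoBackend:
    """Stand-in for a MongoDB-backed backend; counts how many are created."""
    created = 0

    def __init__(self):
        FakeMongoBackend.created += 1


def get_backend():
    # Assumption: in the real code this would open a MongoDB connection.
    return FakeMongoBackend()


# Simulate running a filter directly: the class body executes at import
# time, before the script has set _MONGO_URL.
os.environ.pop("_MONGO_URL", None)


class Instance:
    # Evaluated once, when the class is defined, so this is NoBackend here.
    backend = get_backend() if os.getenv("_MONGO_URL") else NoBackend()

    def __init__(self):
        # Per-instance fallback quoted in the issue: it assigns an instance
        # attribute, so the class attribute is never replaced.
        if type(self.backend) == NoBackend and os.getenv("_MONGO_URL"):
            self.backend = get_backend()


# The script sets the variable only after the "import" above.
os.environ["_MONGO_URL"] = "mongodb://localhost:27017"

instances = [Instance() for _ in range(2000)]
backends_created = FakeMongoBackend.created
distinct_backends = len({id(i.backend) for i in instances})
print(backends_created, distinct_backends)  # 2000 2000
```

With a real MongoBackend, each of those 2000 objects would presumably hold its own connection, which matches the connection exhaustion seen in the traceback.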

Temporary solution: put the following code block at the beginning of the script:

import os

if __name__ == "__main__":
    # Set these before `from tahoe import *` so the class-level backend sees _MONGO_URL.
    # Alternative settings (disabled):
    ##  "mongo_url" : "mongodb://cybexp_user:CybExP_777@134.197.21.231:27017/?authSource=admin",
    ##  "mongo_url" : "mongodb://134.197.21.231:27017/",
    ##  "analytics_db" : "tahoe_db",
    config = {
        "mongo_url" : "mongodb://localhost:27017",
        "analytics_db" : "tahoe_demo",
        "analytics_coll" : "instances",
    }
    os.environ["_MONGO_URL"] = config.pop("mongo_url")
    os.environ["_ANALYTICS_DB"] = config.pop("analytics_db", "tahoe_db")
    os.environ["_ANALYTICS_COLL"] = config.pop("analytics_coll", "instances")

and the following code block at the end of the script:

if __name__ == "__main__":
    filt_misp()
qclassified commented 5 years ago

Permanent Preferred Solution:

Do not run proc.analytics.filters.filt_misp or any other filt_* module directly; use ../analytics.py as the entry point. All tahoe Instances will then share a common backend, so processing will be much faster for smaller events and will not overwhelm MongoDB for large events.
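As a rough illustration of why an entry point helps, here is a minimal sketch of the "configure first, import second" ordering. It does not reproduce the actual contents of ../analytics.py: main, load_instance_class, FakeMongoBackend and the config dict are hypothetical names, and load_instance_class merely simulates the import of tahoe by running the class body on demand.

```python
import os


class NoBackend:
    """Stand-in for tahoe's NoBackend (hypothetical simplification)."""


class FakeMongoBackend:
    """Stand-in for a MongoDB-backed backend; counts how many are created."""
    created = 0

    def __init__(self, url):
        self.url = url
        FakeMongoBackend.created += 1


def get_backend():
    return FakeMongoBackend(os.environ["_MONGO_URL"])


def load_instance_class():
    """Simulates `from tahoe import *`: the class body runs when called."""
    class Instance:
        backend = get_backend() if os.getenv("_MONGO_URL") else NoBackend()

        def __init__(self):
            # Same fallback as in the issue; it is skipped here because the
            # class attribute is already a real backend.
            if type(self.backend) == NoBackend and os.getenv("_MONGO_URL"):
                self.backend = get_backend()

    return Instance


def main(config, n_attributes):
    # Entry point: set the environment first, import second.
    os.environ["_MONGO_URL"] = config["mongo_url"]
    Instance = load_instance_class()
    instances = [Instance() for _ in range(n_attributes)]
    # Number of distinct backend objects held by the instances.
    return len({id(i.backend) for i in instances})


shared = main({"mongo_url": "mongodb://localhost:27017"}, 2000)
print(shared, FakeMongoBackend.created)  # 1 1
```

Because the variable is already set when the class body runs, all 2000 instances reuse the single class-level backend instead of creating one each.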