certtools / intelmq

IntelMQ is a solution for IT security teams for collecting and processing security feeds using a message queuing protocol.
https://docs.intelmq.org/latest/
GNU Affero General Public License v3.0

Introduce batching to MISP feed output bot #2473

Open arvchristos opened 7 months ago

arvchristos commented 7 months ago

Description

Implements a batch mode for the MISP feed output bot to resolve slow performance when the queue holds many events. The existing code is prone to performance issues.

The following code runs for every event in the queue, so the bot becomes extremely slow as events arrive and the feed grows:

feed_output = self.current_event.to_feed(with_meta=False)

with self.current_file.open('w') as f:  # File is opened for every event
    json.dump(feed_output, f)

feed_meta_generator(self.output_dir)  # Metadata is updated on every event

Motivation

We are trying to create feeds from AlienVault OTX pulses, which include thousands of IOCs per day. This is not feasible with the current performance of the MISP feed output bot.

Fix

This MR introduces batched feed creation. The user can configure the batch size with the batch_size parameter. Batching is based on the internal queue used by the bot.
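
To illustrate the idea, here is a minimal sketch of batched writing. It is not the code from this MR: the class BatchedFeedWriter, its attributes, and the demo function are hypothetical names chosen for the example, and plain dicts stand in for MISP events. It only shows how buffering events reduces the number of file rewrites from one per event to one per batch.

```python
import json
import tempfile
from pathlib import Path


class BatchedFeedWriter:
    """Hypothetical helper: buffers events and rewrites the feed once per batch."""

    def __init__(self, output_file: Path, batch_size: int = 100):
        self.output_file = output_file
        self.batch_size = batch_size
        self.pending = []  # events received since the last flush
        self.events = []   # everything written to the feed so far
        self.writes = 0    # how many times the feed file was rewritten

    def add(self, event: dict) -> None:
        """Buffer one event and flush when the batch is full."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all buffered events to disk in a single file operation."""
        if not self.pending:
            return
        self.events.extend(self.pending)
        self.pending.clear()
        with self.output_file.open("w") as f:
            json.dump(self.events, f)
        self.writes += 1
        # Assumption: a real bot would also regenerate the feed
        # metadata here, once per batch instead of once per event.


def demo(n_events: int = 250, batch_size: int = 100) -> int:
    """Feed n_events through the writer and return the number of file writes."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = BatchedFeedWriter(Path(tmp) / "feed.json", batch_size)
        for i in range(n_events):
            writer.add({"id": i})
        writer.flush()  # write the final, partial batch
        return writer.writes
```

With 250 events and a batch size of 100, the sketch rewrites the file three times (two full batches and one partial batch) instead of 250 times.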

Benchmark

On an average server, a feed of 8k events took several hours to create before this improvement and now takes less than 5 minutes, depending on the available resources.

sebix commented 5 months ago

Store events in a separate Redis queue. This moves the responsibility for keeping data away from the bot.

It also prevents data loss.
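
One way to picture this suggestion is the rough sketch below. It is not the implementation in this MR: PendingEvents, the key name misp-feed-pending, and InMemoryListStore are hypothetical. The store is an in-memory stand-in that mimics the rpush, lrange, and delete list calls of a redis-py client so the example runs without a server. A real client could presumably be passed in its place.

```python
import json


class InMemoryListStore:
    """Stand-in for a Redis client; mimics only rpush, lrange and delete."""

    def __init__(self):
        self.data = {}

    def rpush(self, name, *values):
        self.data.setdefault(name, []).extend(values)
        return len(self.data[name])  # Redis returns the new list length

    def lrange(self, name, start, end):
        items = self.data.get(name, [])
        stop = len(items) if end == -1 else end + 1  # Redis end index is inclusive
        return items[start:stop]

    def delete(self, *names):
        return sum(1 for n in names if self.data.pop(n, None) is not None)


class PendingEvents:
    """Hypothetical helper: keeps not-yet-written events outside the bot process."""

    def __init__(self, client, key="misp-feed-pending", batch_size=100):
        self.client = client
        self.key = key
        self.batch_size = batch_size

    def add(self, event: dict) -> bool:
        """Store one event; return True once a full batch is waiting."""
        return self.client.rpush(self.key, json.dumps(event)) >= self.batch_size

    def drain(self) -> list:
        """Return all pending events without removing them."""
        return [json.loads(item) for item in self.client.lrange(self.key, 0, -1)]

    def clear(self) -> None:
        """Remove pending events; call only after the feed was written."""
        self.client.delete(self.key)
```

Because the events live in the store and not in the bot's memory, a restarted bot that points at the same key still sees the pending batch:

```python
store = InMemoryListStore()
bot = PendingEvents(store, batch_size=2)
bot.add({"ioc": "a"})
bot.add({"ioc": "b"})

restarted = PendingEvents(store, batch_size=2)  # simulates a bot restart
print(restarted.drain())
```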