certtools / intelmq

IntelMQ is a solution for IT security teams for collecting and processing security feeds using a message queuing protocol.
https://docs.intelmq.org/latest/
GNU Affero General Public License v3.0

Grouping events within IntelMQ #751

Open dmth opened 7 years ago

dmth commented 7 years ago

To discuss:

Currently, aggregation cannot be achieved within IntelMQ; it is done in additional components that depend on the EventDB.

This aggregation is used to create notification e-mails for customers. Yesterday and today we started collecting ideas on how this aggregation could be achieved within IntelMQ.

To visualize these ideas, I attached some sketches. The last two are here just for archival purposes; the first two are open for discussion.

In general, we introduce two new components: A, an AggregatorBot, which creates Aggregates (basically these are Reports of a kind, but for the sake of clarity I will stick with the term Aggregate), and T, a Ticket-Number-Bot.

The AggregatorBot is capable of generating Aggregates from Events. All Events matching a certain set of criteria (the colored tags next to each event in the sketches) are collected into one Aggregate. These Aggregates are stored somewhere (for instance, in memory) and are forwarded to the next bot (in this case the Ticket-Number-Bot) when a certain condition is met (an aggregation condition, a time limit, or another trigger).
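Below is a minimal sketch of the aggregation step just described. It is written in plain Python rather than against the IntelMQ bot API. The class names `Aggregator` and `Aggregate`, the two release conditions (`max_events` as a count condition, `max_age` as a time condition), and the injectable clock are all assumptions made for illustration. The grouping criteria in the test reuse the IntelMQ field name `source.abuse_contact` only as an example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, str]
Key = Tuple[str, ...]


@dataclass
class Aggregate:
    """Hypothetical aggregate: the shared criteria values plus the collected events."""
    key: Key
    created: float
    events: List[Event] = field(default_factory=list)


class Aggregator:
    """Hypothetical AggregatorBot core (not IntelMQ code).

    Events with equal values for all `criteria` fields end up in the same
    Aggregate. An Aggregate is released once it holds `max_events` events
    (count condition) or once it is `max_age` seconds old (time condition).
    """

    def __init__(self, criteria: List[str], max_events: int, max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.criteria = criteria
        self.max_events = max_events
        self.max_age = max_age
        self.clock = clock
        self.pending: Dict[Key, Aggregate] = {}  # held in memory, as suggested above

    def add(self, event: Event) -> Optional[Aggregate]:
        """Store an event; return its Aggregate if the count condition is now met."""
        key = tuple(event.get(name, "") for name in self.criteria)
        aggregate = self.pending.get(key)
        if aggregate is None:
            aggregate = self.pending[key] = Aggregate(key, self.clock())
        aggregate.events.append(event)
        if len(aggregate.events) >= self.max_events:
            return self.pending.pop(key)
        return None

    def flush_expired(self) -> List[Aggregate]:
        """Release every Aggregate whose time condition is met."""
        now = self.clock()
        expired = [k for k, a in self.pending.items() if now - a.created >= self.max_age]
        return [self.pending.pop(k) for k in expired]
```

Whatever `add` or `flush_expired` returns would then be forwarded to the next bot (the Ticket-Number-Bot T). A real bot would presumably need to call `flush_expired` periodically so that the time condition fires even when no new events arrive.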

Approach One

[Sketch: 2auto-planung-20161019_tafel02]
The Aggregator has two output queues: one for Events and one for Aggregates. Events are immediately forwarded to the Event output. Aggregates are created and stored until the aggregation condition is met. To create a mapping between the events within Aggregates and the Events in the EventDB, the Aggregator bot needs to create a unique ID for each event.
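The following sketch shows Approach One under a few assumptions. Plain lists stand in for the two output queues. The field name `event_id` and the class name `TwoOutputAggregator` are hypothetical. A random UUID serves as the unique ID, so the ID exists before the event ever reaches the EventDB.

```python
import uuid
from typing import Dict, List

Event = Dict[str, str]


class TwoOutputAggregator:
    """Hypothetical sketch of Approach One; plain lists stand in for the two output queues."""

    def __init__(self, key_field: str, max_events: int) -> None:
        self.key_field = key_field
        self.max_events = max_events
        self.pending: Dict[str, List[Event]] = {}
        self.event_output: List[Event] = []
        self.aggregate_output: List[Dict[str, object]] = []

    def process(self, event: Event) -> None:
        tagged = dict(event)
        # Hypothetical field: a random UUID identifies the event, so that events
        # inside Aggregates can later be matched with the rows in the EventDB.
        tagged["event_id"] = str(uuid.uuid4())
        # The event goes to the Event output immediately ...
        self.event_output.append(tagged)
        # ... while the same event also waits in its Aggregate until the condition is met.
        key = tagged.get(self.key_field, "")
        events = self.pending.setdefault(key, [])
        events.append(tagged)
        if len(events) >= self.max_events:
            del self.pending[key]
            self.aggregate_output.append({"key": key, "events": events})
```

Because both outputs carry the same `event_id`, a consumer of the Aggregate output can look up each event in the EventDB after the Event output has written it there.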

Approach Two

[Sketch: 2auto-planung-20161019_tafel03]
Every Event is stored in the EventDB before it is forwarded to the Aggregator. This requires altering the OutputBot so that it has an output queue. It would then be possible to add the database ID to the Event before forwarding it to the Aggregator. Note the difference after the TicketBot: because OutputBots can now have an output queue, a "Mailoutput" bot could add its "send-at" timestamp to the Aggregate and write it to another database.
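Approach Two can be sketched as "store first, then forward with the database ID". In this sketch, SQLite stands in for both the EventDB and the "other database" so that the code runs standalone. The table layouts, the `event_id` field, and both function names are hypothetical.

```python
import json
import sqlite3
from typing import Dict, List

Event = Dict[str, str]

# In-memory SQLite stands in for the EventDB and for the database holding send-at timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
conn.execute("CREATE TABLE sent (ticket TEXT, event_id INTEGER, send_at TEXT)")


def store_and_forward(db: sqlite3.Connection, event: Event, output_queue: List[Event]) -> int:
    """Hypothetical OutputBot with an output queue: store the event, then forward it with its DB ID."""
    cursor = db.execute("INSERT INTO events (data) VALUES (?)", (json.dumps(event),))
    db.commit()
    event_id = cursor.lastrowid  # ID assigned by the database
    forwarded = dict(event)
    forwarded["event_id"] = str(event_id)  # hypothetical field name
    output_queue.append(forwarded)  # e.g. the queue feeding the Aggregator
    return event_id


def record_send_at(db: sqlite3.Connection, ticket: str, event_ids: List[int], send_at: str) -> None:
    """Hypothetical mail output step: record when the aggregate behind `ticket` was sent."""
    db.executemany(
        "INSERT INTO sent (ticket, event_id, send_at) VALUES (?, ?, ?)",
        [(ticket, event_id, send_at) for event_id in event_ids],
    )
    db.commit()
```

Compared with Approach One, the ID here comes from the database rather than from the Aggregator. The cost is that every event must be written to the EventDB before it can be aggregated.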

Necessary Steps


Just for the archive: [Sketches: 2auto-planung-20161018_tafel02, 2auto-planung-20161019_tafel01]

aaronkaplan commented 7 years ago

On 19 Oct 2016, at 17:08, Dustin Demuth notifications@github.com wrote:

To discuss:

Currently aggregation cannot be achieved within IntelMQ, it's done in additional components, which depend on the EventDB.

Yes, and that's intentional.

This aggregation is used in order to create notification E-Mails for customers. Yesterday and today we started to collect ideas how this aggregation could be achieved within IntelMQ.

I think that is a different tool. Part of controlling complexity is to not try to do everything with one tool.

Necessary Steps

• The IntelMQ message format needs to be extended by an "Aggregation" object capable of storing a list of Events, information about the aggregation, and a directive for the aggregation.
• The Event needs to be extended in order to store an ID.
• The OutputBot might need an alteration to support an output queue.
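The first two steps listed above could look roughly like the following sketch. It is a hypothetical data model, not IntelMQ's actual message classes: the field names `info` and `directive`, the `__type` tag, and the JSON layout are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical event: its fields plus the proposed ID."""
    data: Dict[str, str]
    id: Optional[str] = None


@dataclass
class Aggregation:
    """Hypothetical "Aggregation" message object."""
    events: List[Event]
    info: Dict[str, str] = field(default_factory=dict)       # information about the aggregation
    directive: Dict[str, str] = field(default_factory=dict)  # what downstream bots should do with it

    def serialize(self) -> str:
        payload = asdict(self)  # also converts the nested Event objects to dicts
        payload["__type"] = "Aggregation"  # tag so a consumer can tell message kinds apart
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def unserialize(cls, raw: str) -> "Aggregation":
        payload = json.loads(raw)
        if payload.pop("__type", None) != "Aggregation":
            raise ValueError("not an Aggregation message")
        return cls(
            events=[Event(**e) for e in payload["events"]],
            info=payload["info"],
            directive=payload["directive"],
        )
```

The serialized form carries the full events, which matches the replication assumed in both approaches. Storing only the event IDs would be the lighter alternative.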

bernhardreiter commented 7 years ago

There are a number of important observations that led to this design proposal:

  1. We expect that most IntelMQ users need email output. Good email output needs aggregation, because common abuse-handling email formats are aggregated (CSV lists, X-ARF bulk), which makes sense (only one header, a lot of similar information).
  2. We expect that most IntelMQ users need one ticket number per email they send. In addition, it is conceivable that more decisions will have to be taken on aggregations.
  3. IntelMQ's bots and queuing system already provide the technical means that email decisions (and aggregations) would need. Technically, the events for one email recipient have to be aggregated until they are to be sent out.
  4. Aggregation and tickets may be interesting for other output formats in the future (e.g. wait for three more observations before sending).
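Points 1 and 2 can be made concrete with a sketch: one aggregate becomes one email with one ticket number, and its body is a CSV list with a single header line followed by one row per event. The ticket format `T-000001`, the email dictionary, and all function names are hypothetical, and X-ARF is not covered here.

```python
import csv
import io
import itertools
from typing import Dict, Iterator, List

Event = Dict[str, str]


def ticket_numbers(prefix: str) -> Iterator[str]:
    """Hypothetical Ticket-Number-Bot core: an endless sequence of ticket numbers."""
    for n in itertools.count(1):
        yield f"{prefix}-{n:06d}"


def render_csv(events: List[Event], columns: List[str]) -> str:
    """One header line followed by one row per event; keys not in `columns` are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


def build_email(ticket: str, recipient: str, events: List[Event], columns: List[str]) -> Dict[str, str]:
    """Hypothetical email for one aggregate: exactly one ticket number per email."""
    return {
        "to": recipient,
        "subject": f"[{ticket}] {len(events)} security events",
        "body": render_csv(events, columns),
    }
```

Because the ticket number is drawn once per aggregate rather than once per event, the "one ticket per email" rule follows directly from aggregating first.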

Overall, I believe that email output and the aggregation it technically requires are already part of a typical IntelMQ experience and should be closely integrated with it. Since there is already technology in place that can solve the problem (bots, connections, queuing), why not use it for aggregation as well?

dmth commented 7 years ago

In the first approach I wrote:

To Create a mapping between events within Aggregates and Events in the EventDB, the Aggregator bot needs to create a UniqueID for each event.

This ID can be used in order to navigate between Aggregates and Events more easily. Building this without an ID would also be possible, but that would make navigation more complex.

The approaches mentioned here assume that Events are replicated into the Aggregates, which duplicates data. The duplication could be avoided by storing only the Events' IDs in the Aggregate and querying a persistent event store (e.g. the EventDB) for the corresponding events whenever their information is needed. Technologies to do so seem to exist: http://json-ld.org/
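Here is a sketch of the ID-only variant. The aggregate keeps a list of event IDs, and the events are fetched from the store only when needed. SQLite stands in for the EventDB, and the aggregate layout and function names are hypothetical. In JSON-LD, such references would typically be written as objects carrying an `@id` IRI; plain integer IDs keep this sketch short.

```python
import json
import sqlite3
from typing import Dict, List

Event = Dict[str, str]


def make_demo_eventdb(events: List[Event]) -> sqlite3.Connection:
    """In-memory stand-in for the EventDB; row IDs start at 1."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
    db.executemany("INSERT INTO events (data) VALUES (?)", [(json.dumps(e),) for e in events])
    db.commit()
    return db


def resolve(db: sqlite3.Connection, event_ids: List[int]) -> List[Event]:
    """Fetch the referenced events, keeping the aggregate's order and skipping unknown IDs."""
    if not event_ids:
        return []
    placeholders = ",".join("?" for _ in event_ids)
    rows = db.execute(f"SELECT id, data FROM events WHERE id IN ({placeholders})", event_ids).fetchall()
    by_id = {row_id: json.loads(data) for row_id, data in rows}
    return [by_id[i] for i in event_ids if i in by_id]


# The aggregate only references events instead of copying them (hypothetical layout).
aggregate = {"info": {"source.abuse_contact": "abuse@example.org"}, "event_ids": [2, 1]}
```

The trade-off is an extra query whenever an aggregate is rendered, in exchange for storing each event only once.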

ghost commented 3 years ago

This will be solved as part of IEP04.