certtools / intelmq

IntelMQ is a solution for IT security teams for collecting and processing security feeds using a message queuing protocol.
https://docs.intelmq.org/latest/
GNU Affero General Public License v3.0

Output bot to MISP using MISP API #1483

Closed bernhardreiter closed 4 years ago

bernhardreiter commented 4 years ago

Split out from #834. The goal here is to have a basic output bot that allows IntelMQ events to go into MISP via the MISP API.

The currently existing https://github.com/certtools/intelmq/tree/develop/intelmq/bots/outputs/misp goes via the feed input of MISP.

quoting @wagner-certat:

If the MISP Feed Output is not sufficient for your use-case, the question is which additional steps you need to accomplish your workflow. In any case, you need to create MISP objects (intelmq events) with MISP attributes (intelmq fields) like in the MISP feed output. The MISP objects need to be children of a MISP event (again the question is what you need here; for the MISP feed output we decided that a time-based iteration is a good solution, but it could also be a fixed event or something based on a search query) and can have a template, like the IntelMQ Template for MISP Objects. Then add the object with pymisp.ExpandedPyMISP(...).add_object(event, object, ...)

bernhardreiter commented 4 years ago

@pettai this is the issue to document the extended use cases, which cannot be solved by the existing MISPFeedOutputBot.

My current understanding is that the API approach will offer more flexibility and additional possibilities to model a workflow. I'll start with a basic output bot using the API via pymisp under this assumption. Clarifications and details for the workflow are welcome anytime. ;)

Rafiot commented 4 years ago

I really want to stress that a module adding IntelMQ events directly into MISP needs to have serious filter configuration in place, or it will add way too many entries in MISP: It is not realistic to add 100k events a day in a single MISP event.

bernhardreiter commented 4 years ago

@Rafiot what would be a useful, sensible upper limit?

ghost commented 4 years ago

@Rafiot what would be a useful, sensible upper limit?

You could also disable correlation for the data you insert

pettai commented 4 years ago

So, the way we use MISP today and push in data via the API is that we create new events per push (we do not open a per-day event and add attributes/objects in there for every update). We also do a pre-check that the new data isn't in MISP already, to avoid inserting multiple events with the same data. I understand that the feed works differently, creating a per-day event and appending objects/attributes over the day each time it's updated. Except for minimizing the number of events, I don't see advantages with this approach. But I guess it's a matter of how much data you process... And so far I haven't seen a problem with our (former) approach yet...

That's also how I imagined the API Output should work: creating new events every time it's getting updates (and possibly (pre-)checking whether another event with the same data already exists).

Rafiot commented 4 years ago

If you do not have correlation and a small amount of data, it is not a problem. An event with more than 10,000 attributes is generally not useful, but there are plenty of exceptions.

The advantage is that it will not clutter MySQL, which can be a problem if you have 100K events a day.

bernhardreiter commented 4 years ago

Implementation is picking up progress after the end-of-the-year transition.

Code will land in https://github.com/Intevation/intelmq/tree/dev-output-bot-misp-api, based on the certtools:develop branch, aiming for the next intelmq release.

Small development considerations

todos

bernhardreiter commented 4 years ago

Use case idea(s)

DNS defense

One use case is creating DNS blocking for certain domains (#1466): several feeds from IntelMQ check with the MISP instance whether the domain is already reported. If not, a new event is created; otherwise an attribute with the time and the feed where it was last reported is (optionally) added or updated.

Example events

There are a number of interesting feeds that could provide examples in intelmq's list of https://github.com/certtools/intelmq/blob/develop/docs/Feeds.md

To keep development simple, we first look at the default example feeds coming with a fresh intelmq installation; currently this is the directory https://github.com/certtools/intelmq/tree/76769e2e8c3b8bd159c867497d7201f74835cb83/intelmq/etc. Testing shows that only https://malc0de.com/bl/BOOT produces events that have enough information to be used directly for DNS blocking, as they have source.fqdn values. Events coming from http://www.malwaredomainlist.com/updatescsv.php seem to be too old to be useful.

More examples could be added by other feeds or by adding a reverse DNS export bot and using the resulting source.reverse_dns value.

Implementation notes/ideas

First, events shall be filtered to have at least one of source.fqdn or source.reverse_dns before going into a MISP API output bot. The bot shall then connect to the MISP instance and search for existing entries for these domains.
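The filtering step could be sketched as follows. This is only an illustration under assumptions: events are modelled as plain dicts keyed by IntelMQ field names, and `has_domain` and `domains_of` are hypothetical helpers, not part of IntelMQ.

```python
# Hypothetical helpers, not part of IntelMQ: decide whether an event carries
# a domain name that could later be used for DNS blocking.
DOMAIN_FIELDS = ("source.fqdn", "source.reverse_dns")


def has_domain(event: dict) -> bool:
    """Return True if at least one domain field is present and non-empty."""
    return any(event.get(field) for field in DOMAIN_FIELDS)


def domains_of(event: dict) -> list:
    """Collect the domain values to look up in MISP, in field order."""
    return [event[field] for field in DOMAIN_FIELDS if event.get(field)]


# Made-up example events; only the first and the last carry a domain.
events = [
    {"source.fqdn": "bad.example.org", "source.ip": "192.0.2.1"},
    {"source.ip": "192.0.2.2"},
    {"source.reverse_dns": "host.example.net"},
]
candidates = [event for event in events if has_domain(event)]
```

In a real botnet this filtering would more likely be done by an expert bot in front of the output bot; the sketch only shows the condition itself.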

bernhardreiter commented 4 years ago

The development branch https://github.com/Intevation/intelmq/tree/dev-output-bot-misp-api now has a first simple bot that will insert all intelmq events unconditionally into a MISP instance, using the intelmq_event MISP object template (which comes with modern versions of MISP).

https://github.com/Intevation/intelmq/blob/893a5b7de04efd391338c7e3c04910add5e50e34/intelmq/bots/outputs/misp/output_api.py

It can be tested on a current default development install of intelmq with the default configuration:

Result: intelmqctl log misp-api-output will show one line per inserted misp event (when INFO level logging is set).

bernhardreiter commented 4 years ago

https://github.com/Intevation/intelmq/commit/00a970e980cffcbc93d78e0aa410b50e249850ab is an improved bot that uses the concept of significant fields, when used with

"significant_fields": ["source.fqdn", "source.reverse_dns"]

it will be close to the use case for #1466. But it is still quite general.

If the values of the significant fields are all found in MISP attributes created with the same tag, the event is not inserted again. Actions are logged as INFO.
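A minimal sketch of that decision, under assumptions: `should_insert` is a hypothetical name, and `value_known` is a callback standing in for a MISP attribute search restricted to the bot's tag. How the real bot treats events without any significant field is not stated here, so the sketch simply treats them as new.

```python
from typing import Callable, Iterable


def should_insert(event: dict,
                  significant_fields: Iterable[str],
                  value_known: Callable[[str], bool]) -> bool:
    """Return False only if every significant value of the event is already known.

    `value_known` is a hypothetical stand-in for searching MISP attributes
    that were created with the bot's tag.
    """
    values = [event[field] for field in significant_fields if field in event]
    if not values:
        # Assumption of this sketch: nothing to compare means "new".
        return True
    return not all(value_known(value) for value in values)


# Stand-in for the values already present in MISP.
already_in_misp = {"bad.example.org"}
fields = ["source.fqdn", "source.reverse_dns"]

known = should_insert({"source.fqdn": "bad.example.org"}, fields,
                      already_in_misp.__contains__)
fresh = should_insert({"source.fqdn": "new.example.org"}, fields,
                      already_in_misp.__contains__)
```

Here `known` is False (the event would be skipped) and `fresh` is True (the event would be inserted).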

bernhardreiter commented 4 years ago

Testing the development version

When installing from scratch, you can checkout

git clone https://github.com/Intevation/intelmq.git
cd intelmq
git checkout dev-output-bot-misp-api

and then follow standard instructions. Make sure you also install pymisp.

Then configure the bot, here is a complete runtime.conf section example, geared towards using domain names later in MISP:

    ,
    "misp-api-output": {
        "bot_id": "misp-api-output",
        "description": "Testing the inserting of events into MISP",
        "enabled": true,
        "group": "Output",
        "groupname": "outputs",
        "module": "intelmq.bots.outputs.misp.output_api",
        "name": "MISP API Output",
        "parameters": {
           "misp_url": "https://your-misp-instance.example.org:1234/",
           "misp_key": "YOUR MISP API KEY",
           "misp_tag_for_bot": "Inserted-by:IntelMQ",
           "significant_fields": ["source.fqdn", "source.reverse_dns"],
           "http_verify_cert": false
        },
        "run_mode": "continuous"
    }

then wire the bot into the IntelMQ botnet (using the IntelMQ Manager or by editing pipeline.conf). Make sure it does not get too many events, as MISP (on average and by its design goals) is not designed to handle as many events as fast as intelmq does. When using the example bots, one way is to disable the spamhaus and feodo-tracker collector bots (see above).

bernhardreiter commented 4 years ago

Testing of the current alpha version of the bot was done. Here are the plans for improvement:

Additional MISP tags.

We introduce a new configuration parameter for tags that will be set on new MISP events, and set the default or example configuration to OSINT. The tag needs to be configurable because intelmq can also handle feeds other than public (aka open source) ones, and there may be other interesting MISP tags to set depending on what you pipe into the bot.

misp_additional_tags: list of tags to set in addition to misp_tag_for_bot which will not be searched for when looking for duplicates

Explicit handling of to_ids flags

The to_ids attribute will be set to False for all attributes by default and only enabled if the field is listed in the new configuration parameter:

misp_to_ids_fields: list of fields for which the to_ids flag will be set on the MISP attributes.

Rationale: We expect it to be a deliberate decision which fields are flagged for automation by the MISP operators, so it should not be set automatically.

Additional correlation attributes

misp_additional_correlation_fields: list of fields for which the correlation flag will be enabled (in addition to those which are already enabled because they are significant_fields).

It is likely that there are attributes which should not be checked for duplicates on insertion, but should be correlated, e.g. asn or network.
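How the planned parameters might translate into per-attribute flags can be sketched like this. It is an illustration only: `attribute_flags` is a hypothetical helper, and it assumes the flags are expressed through the MISP attribute properties `to_ids` and `disable_correlation`.

```python
def attribute_flags(field, significant_fields, to_ids_fields,
                    additional_correlation_fields):
    """Compute the flags for the MISP attribute created from one IntelMQ field.

    Hypothetical helper: correlation is enabled for significant fields and
    for the additionally configured ones; to_ids stays off unless the field
    is listed explicitly.
    """
    correlate = (field in significant_fields
                 or field in additional_correlation_fields)
    return {
        "to_ids": field in to_ids_fields,
        "disable_correlation": not correlate,
    }


significant = ["source.fqdn", "source.reverse_dns"]
to_ids = ["source.fqdn", "source.reverse_dns"]
extra_correlation = ["source.ip", "source.asn", "source.network"]

fqdn_flags = attribute_flags("source.fqdn", significant, to_ids, extra_correlation)
asn_flags = attribute_flags("source.asn", significant, to_ids, extra_correlation)
name_flags = attribute_flags("feed.name", significant, to_ids, extra_correlation)
```

With these example values, source.fqdn is flagged for IDS use and correlated, source.asn is only correlated, and feed.name gets neither.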

Option to add feed.provider as tag

The feed can be interesting, so we add an option:

add_feed_provider_as_tag: Boolean, Default True

bernhardreiter commented 4 years ago

Any idea what tag to use for adding the feed provider? https://github.com/MISP/misp-taxonomies does not seem to suggest any namespace or predicate. What about IntelMQ:feed.provider="XXX" where XXX gets replaced with intelmq_event["feed.provider"]?
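The proposed tag could be built like this. This is a sketch of the suggestion only, not a settled format; `feed_provider_tag` is a hypothetical helper, and it assumes the provider is stored under the IntelMQ field feed.provider.

```python
def feed_provider_tag(event):
    """Build the proposed tag, or return None if the event names no provider.

    Hypothetical helper illustrating the suggested tag format.
    """
    provider = event.get("feed.provider")
    if not provider:
        return None
    return 'IntelMQ:feed.provider="{}"'.format(provider)


tag = feed_provider_tag({"feed.provider": "Example Provider"})
no_tag = feed_provider_tag({"source.ip": "192.0.2.1"})
```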

bernhardreiter commented 4 years ago

New example part of runtime.conf with the new parameters:

    ,
    "misp-api-output": {
        "bot_id": "misp-api-output",
        "description": "Testing the inserting of events into MISP",
        "enabled": true,
        "group": "Output",
        "groupname": "outputs",
        "module": "intelmq.bots.outputs.misp.output_api",
        "name": "MISP API Output",
        "parameters": {
           "add_feed_provider_as_tag": true,
           "misp_additional_correlation_fields": ["source.ip", "source.asn", "source.network"],
           "misp_additional_tags": ["OSINT", "osint:certainty==\"90\""],
           "misp_url": "https://your-misp-instance.example.org:1234/",
           "misp_key": "YOUR MISP API KEY",
           "misp_tag_for_bot": "Inserted-by:IntelMQ",
           "misp_to_ids_fields": ["source.fqdn", "source.reverse_dns"],
           "significant_fields": ["source.fqdn", "source.reverse_dns"],
           "http_verify_cert": false
        },
        "run_mode": "continuous"
    }

bernhardreiter commented 4 years ago

Two smaller improvements with db05994455e34d214179c668b72c73b81d40b22c and 93539b8b32c859ad4a045222503bcb57884833c6. Here is a new example config:

    ,
    "misp-api-output": {
        "bot_id": "misp-api-output",
        "description": "Testing the inserting of events into MISP",
        "enabled": true,
        "group": "Output",
        "groupname": "outputs",
        "module": "intelmq.bots.outputs.misp.output_api",
        "name": "MISP API Output",
        "parameters": {
           "add_feed_provider_as_tag": true,
           "misp_additional_correlation_fields": ["source.ip", "source.asn", "source.network"],
           "misp_additional_tags": ["OSINT", "osint:certainty==\"90\""],
           "misp_url": "https://your-misp-instance.example.org:1234/",
           "misp_key": "YOUR MISP API KEY",
           "misp_publish": false,
           "misp_tag_for_bot": "Inserted-by:IntelMQ",
           "misp_to_ids_fields": ["source.fqdn", "source.reverse_dns"],
           "significant_fields": ["source.fqdn", "source.reverse_dns"],
           "http_verify_cert": false
        },
        "run_mode": "continuous"
    }

bernhardreiter commented 4 years ago

Ramping up for a pull request

With 3ecf7464af0a64c90fd27698fd601582e91d35c9 and fff4db6d0a34338cb88ab8d8a8a3aaf536618779 we are ready.

bernhardreiter commented 4 years ago

The bot is submitted for inclusion in the upcoming IntelMQ 2.2.0 release.

Here is a brief description of how you can install it on 2.1.x with the native packages (tested on Debian GNU/Linux Buster with intelmq 2.1.2-1):

pip3 install pymisp

# get the new bot or bot version. In this case the whole subdir `outputs/misp`  is missing
# so we add it to the place where the .deb package has placed the other files.
cp -r misp/ /usr/lib/python3/dist-packages/intelmq/bots/outputs/

# create the necessary setuptools script entry, by copying one of the existing ones
sed -e 's/sql\.output/misp.output_api/' /usr/bin/intelmq.bots.outputs.sql.output >/usr/bin/intelmq.bots.outputs.misp.output_api
chmod a+x /usr/bin/intelmq.bots.outputs.misp.output_api

# make the entry point known to python's setuptools
echo intelmq.bots.outputs.misp.output_api = intelmq.bots.outputs.misp.output_api:BOT.run >>/usr/lib/python3/dist-packages/intelmq-2.1.2.egg-info/entry_points.txt

# cater for missing new features in 2.1.2
cat <<EOF | patch /usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py
--- misp/output_api.py  2020-02-21 10:02:51.824693139 +0100
+++ /usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py     2020-02-21 10:43:01.988666756 +0100
@@ -58,2 +58,2 @@
-from intelmq.lib.bot import OutputBot
-from intelmq.lib.exceptions import MissingDependencyError
+from intelmq.lib.bot import Bot
+#from intelmq.lib.exceptions import MissingDependencyError

@@ -67 +67 @@
-class MISPAPIOutputBot(OutputBot):
+class MISPAPIOutputBot(Bot):
@@ -72 +72 @@
-            raise MissingDependencyError('pymisp', version='>=2.4.120')
+            raise RuntimeError('Needs pymisp in version>=2.4.120.')
EOF

Also add an entry to BOTS if you want the new bot to show up in the IntelMQ Manager.

bernhardreiter commented 4 years ago

One defect: There are some events where the bot fails with something like

2020-02-22 12:43:09,049 - MISP-API-Output - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/intelmq/lib/bot.py", line 267, in start
    self.process()
  File "/usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py", line 104, in process
    self._insert_misp_event(intelmq_event)
  File "/usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py", line 131, in _insert_misp_event
    obj = new_misp_event.add_object(name='intelmq_event')
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 1363, in add_object
    **kwargs)
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 612, in __init__
    self._set_template(kwargs.get('misp_objects_path_custom'))
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 697, in _set_template
    self._known_template = self._load_template_path(self.misp_objects_path / self.name / 'definition.json')
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 652, in _load_template_path
    self._definition: Union[dict, None] = self._load_json(template_path)
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/abstract.py", line 50, in _load_json
    data = load(f)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1680: ordinal not in range(128)

Saw this once when an HTML collector bot was misconfigured. Need a way to reproduce and get an example event. Expectation: the bot should handle problematic intelmq events, if they can occur.

A similar error message has been reported in https://github.com/MISP/PyMISP/issues/504 @Rafiot Any ideas?

It seems to depend on the content that is given to pymisp, but not directly.

bernhardreiter commented 4 years ago

Char 1680 on ./data/misp-objects/objects/intelmq_event/definition.json is a non-ascii double quote: “type explosion” in the description of classification.type. So maybe if "classification.type" is to be added, the defect in the template or template loading comes up?

ghost commented 4 years ago

Char 1680 on ./data/misp-objects/objects/intelmq_event/definition.json is a non-ascii double quote: “type explosion” in the description of classification.type. So maybe if "classification.type" is to be added, the defect in the template or template loading comes up?

Comes from here: https://github.com/certtools/intelmq/blob/ac1e46bb6946ac07ddc929f3691cb9ab1a1de49f/intelmq/etc/harmonization.conf#L13

Rafiot commented 4 years ago

I think I remember how we fixed that problem last time: https://github.com/docker-library/python/issues/13

Is the locale broken on the machine you're using?

bernhardreiter commented 4 years ago

@Rafiot thanks for the hint! Still trying to understand the full story.

System 1

An Ubuntu VM where I saw the problem only temporarily, while I had a faulty configuration of the http collector bot - the URL was missing from the configuration made via the IntelMQ Manager. Once I had fixed the configuration, I flushed redis and restarted intelmq and all went fine after this. Today I've inquired and the locale is LANG=C.UTF-8 for all three relevant users. As I don't remember doing something special about locales, it should have been this way all the time, but of course it is possible that I accidentally used LANG=C intelmqctl. I consider it unlikely that it was just a broken locale. The case is not explained - yet.

Idea, after chatting with more engineers during lunch at Intevation: starting intelmq via the IntelMQ Manager via Apache may lead to a different environment (also check via ssh). This needs to be checked next.

System 2

This is a report from a system from somebody else. The locale for the intelmq user is also LANG=C.UTF-8. (But here also the IntelMQ manager is in use.)

Conclusion: use of sudo suspected

To complete the story for later reference: the IntelMQ Manager uses sudo, which probably loses the LANG environment variable. This is consistent with both observations. On the other hand, a JSON import should use UTF-8 in any case. (This is how it was done later as part of https://github.com/MISP/PyMISP/issues/504.) An additional improvement could be to make sure that the call to "sudo intelmqctl" still uses a good locale to avoid surprises from Python.
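The difference between relying on the locale and naming the encoding explicitly can be reproduced with a small sketch. It is an illustration only: the file content is a made-up stand-in for the template, and opening with encoding="ascii" mimics what Python 3.6 does by default under LANG=C.

```python
import json
import os
import tempfile

# Stand-in for a template file containing typographic double quotes,
# similar to the description text in the intelmq_event definition.json.
content = {"description": "avoid \u201ctype explosion\u201d"}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "definition.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(content, f, ensure_ascii=False)

    # Reading with the ASCII codec fails on the first non-ASCII byte (0xe2).
    try:
        with open(path, encoding="ascii") as f:
            json.load(f)
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Naming the encoding makes the result independent of LANG.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```

The first read raises UnicodeDecodeError, as in the traceback above, while the second one returns the original content regardless of the environment the bot was started in.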