Closed bernhardreiter closed 4 years ago
@pettai this is the issue to document the extended use cases, which cannot be solved by the existing MISPFeedOutputBot.
My current understanding is that the API approach will offer more flexibility and additional possibilities to model a workflow. I'll start with a basic output bot using the API via pymisp under this assumption. Clarifications and details for the workflow are welcome anytime. ;)
I really want to stress that a module adding IntelMQ events directly into MISP needs to have serious filter configuration in place, or it will add way too many entries in MISP: it is not realistic to add 100k events a day in a single MISP event.
@Rafiot what would be a useful, sensible upper limit?
You could also disable correlation for the data you insert
So, the way we use MISP today and push in data via the API is that we create new events per push (we do not open a per-day event and add attributes/objects in there for every update). We also do a pre-check that the new data isn't in MISP already, to avoid inserting multiple events with the same data. I understand that the feed works differently, creating a per-day event and appending objects/attributes over the day each time it's updated. Except for minimizing the number of events, I don't see advantages with this approach. But I guess it's a matter of how much data you process... And so far I haven't seen a problem with our (former) approach yet...
That's also how I imagined the API Output should work: creating new events every time it gets updates (and possibly (pre-)checking whether another event with the same data already exists).
If you do not have correlation and a small amount of data, it is not a problem. An event with more than 10,000 attributes is generally not useful, but there are plenty of exceptions.
The advantage is that it will not clutter MySQL, which can be a problem if you have 100K events a day.
Implementation is picking up progress after the end-of-the-year transition.
Code will land in https://github.com/Intevation/intelmq/tree/dev-output-bot-misp-api, based on the certtools:develop branch, aiming for the next intelmq release.
One use case: for the purpose of creating DNS blocking for certain domains (#1466), several feeds from IntelMQ will check with the MISP instance whether the domain is already reported. If not, a new event is created; otherwise an attribute with the time and the feed where it was last reported is (optionally) added or updated.
There are a number of interesting feeds that could provide examples in intelmq's list of https://github.com/certtools/intelmq/blob/develop/docs/Feeds.md
To keep development simple, we first look at the default example feeds coming with a fresh intelmq installation; currently this is the directory https://github.com/certtools/intelmq/tree/76769e2e8c3b8bd159c867497d7201f74835cb83/intelmq/etc Testing shows that only https://malc0de.com/bl/BOOT produces events that have enough information to be used directly for DNS blocking, as they have source.fqdn values. Events coming from http://www.malwaredomainlist.com/updatescsv.php seem to be too old to be useful.
More examples could be added by other feeds or by adding a reverse DNS export bot and using the resulting source.reverse_dns value.
First, events shall be filtered to have at least one of source.fqdn or source.reverse_dns before going into a MISP API output bot. The bot shall connect to the MISP instance and search for existing entries for these domains.
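A minimal sketch of the filter step described above, assuming IntelMQ events can be treated as plain dicts keyed by field name. The helper name `has_domain` is hypothetical and not part of IntelMQ; in a real setup an existing filter/sieve expert would probably do this job.

```python
# Hypothetical helper (not an IntelMQ API): keep only events that carry
# at least one domain-like field usable for DNS blocking.
DOMAIN_FIELDS = ("source.fqdn", "source.reverse_dns")


def has_domain(event: dict, fields=DOMAIN_FIELDS) -> bool:
    """Return True if at least one of the given fields holds a non-empty value."""
    return any(event.get(field) for field in fields)


# Plain dicts stand in for IntelMQ events here.
events = [
    {"source.ip": "192.0.2.1"},
    {"source.ip": "192.0.2.2", "source.fqdn": "bad.example.org"},
    {"source.reverse_dns": "host.example.net"},
]
usable = [event for event in events if has_domain(event)]
print(len(usable), "of", len(events), "events would reach the MISP output bot")
```

Only the second and third example events pass, because the first one has neither field.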
The development branch https://github.com/Intevation/intelmq/tree/dev-output-bot-misp-api now has a first simple bot that will insert all intelmq events unconditionally into a MISP instance, using the intelmq_event MISP object template (coming with modern versions of MISP).
It can be tested on a current default development install of intelmq with the default configuration:
copy the new BOTS file over
Add your parameter to runtime.conf
,
"misp-api-output": {
    "bot_id": "misp-api-output",
    "description": "Testing the inserting of events into MISP",
    "enabled": true,
    "group": "Output",
    "groupname": "outputs",
    "module": "intelmq.bots.outputs.misp.output_api",
    "name": "MISP API Output",
    "parameters": {
        "misp_url": "https://adslfjasdfkj.example.org:1234/",
        "misp_key": "YOUR KEY",
        "misp_tag_for_bot": "Inserted-by:IntelMQ",
        "http_verify_cert": false
    },
    "run_mode": "continuous"
}
connect the bot as additional output
diff -u /opt/dev_intelmq/intelmq/etc/pipeline.conf /opt/intelmq/etc/pipeline.conf
--- /opt/dev_intelmq/intelmq/etc/pipeline.conf 2020-01-10 12:20:27.282153443 +0100
+++ /opt/intelmq/etc/pipeline.conf 2020-01-23 12:08:41.825428041 +0100
@@ -1,7 +1,8 @@
 {
     "cymru-whois-expert": {
         "destination-queues": [
-            "file-output-queue"
+            "file-output-queue",
+            "misp-api-output-queue"
         ],
         "source-queue": "cymru-whois-expert-queue"
     },
@@ -25,6 +26,9 @@
     "file-output": {
         "source-queue": "file-output-queue"
     },
+    "misp-api-output": {
+        "source-queue": "misp-api-output-queue"
+    },
     "gethostbyname-1-expert": {
         "destination-queues": [
             "cymru-whois-expert-queue"
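For illustration, the same wiring can be expressed as a small function on the parsed pipeline.conf. This is only a sketch: the function name `connect_output` is made up, and the `<bot-id>-queue` naming simply follows the convention visible in the diff.

```python
import copy


def connect_output(pipeline: dict, upstream_bot: str, output_bot: str) -> dict:
    """Return a copy of the pipeline with output_bot attached behind upstream_bot.

    Hypothetical helper; the queue name '<bot-id>-queue' follows the
    convention seen in the default pipeline.conf.
    """
    result = copy.deepcopy(pipeline)
    queue = output_bot + "-queue"
    destinations = result[upstream_bot].setdefault("destination-queues", [])
    if queue not in destinations:
        destinations.append(queue)
    result[output_bot] = {"source-queue": queue}
    return result


# Reduced stand-in for the default pipeline.conf content.
pipeline = {
    "cymru-whois-expert": {
        "source-queue": "cymru-whois-expert-queue",
        "destination-queues": ["file-output-queue"],
    },
    "file-output": {"source-queue": "file-output-queue"},
}
wired = connect_output(pipeline, "cymru-whois-expert", "misp-api-output")
print(wired["cymru-whois-expert"]["destination-queues"])
```

Editing pipeline.conf by hand or through the IntelMQ manager achieves the same result; the function only makes the two changes of the diff explicit.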
disable all but a collector that only inserts a few events
intelmqctl disable spamhaus-drop-collector
intelmqctl disable feodo-tracker-browse-collector
intelmqctl disable malware-domain-list-collector
Hint for testing: when using redis, you can flush all queues and caches with redis-cli FLUSHDB ; redis-cli FLUSHALL
Copy the bot into the place where you have installed intelmq and update the installation (see https://github.com/certtools/intelmq/blob/develop/docs/Developers-Guide.md#update)
First check with intelmqctl check,
then restart the network, e.g. intelmqctl start
Result: intelmqctl log misp-api-output will show one line per inserted MISP event (when INFO level logging is set).
https://github.com/Intevation/intelmq/commit/00a970e980cffcbc93d78e0aa410b50e249850ab is an improved bot that uses the concept of significant fields. When used with
"significant_fields": ["source.fqdn", "source.reverse_dns"]
it will be close to the use case for #1466, but it is still quite general.
If the values of the significant fields are all found in MISP attributes created with the same tag, the event is not inserted again. Actions are logged as INFO.
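The duplicate check can be sketched roughly as follows. This is not the bot's actual code: the lookup against MISP is replaced by a callable (`exists_in_misp`, a made-up name) so the logic can be run without a MISP instance, and skipping events that lack all significant fields is an assumption of this sketch.

```python
from typing import Callable, Iterable


def should_insert(event: dict,
                  significant_fields: Iterable[str],
                  exists_in_misp: Callable[[str], bool]) -> bool:
    """Decide whether an IntelMQ event should become a new MISP event.

    The event is skipped if all values of its significant fields are
    already known (in the real bot: found in attributes carrying the
    bot's tag).
    """
    values = [event[field] for field in significant_fields if event.get(field)]
    if not values:
        # Assumption: nothing to compare against, so the event is skipped.
        return False
    return not all(exists_in_misp(value) for value in values)


FIELDS = ["source.fqdn", "source.reverse_dns"]
known_values = {"bad.example.org"}  # in-memory stand-in for a MISP search
lookup = known_values.__contains__

print(should_insert({"source.fqdn": "bad.example.org"}, FIELDS, lookup))
print(should_insert({"source.fqdn": "new.example.org"}, FIELDS, lookup))
```

With pymisp, the lookup would presumably be a search restricted to the value and to the tag configured in misp_tag_for_bot.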
When installing from scratch, you can check out the branch with
git clone https://github.com/Intevation/intelmq.git
cd intelmq
git checkout dev-output-bot-misp-api
and then follow the standard instructions. Make sure you also install pymisp.
Then configure the bot, here is a complete runtime.conf section example, geared towards using domain names later in MISP:
,
"misp-api-output": {
    "bot_id": "misp-api-output",
    "description": "Testing the inserting of events into MISP",
    "enabled": true,
    "group": "Output",
    "groupname": "outputs",
    "module": "intelmq.bots.outputs.misp.output_api",
    "name": "MISP API Output",
    "parameters": {
        "misp_url": "https://your-misp-instance.example.org:1234/",
        "misp_key": "YOUR MISP API KEY",
        "misp_tag_for_bot": "Inserted-by:IntelMQ",
        "significant_fields": ["source.fqdn", "source.reverse_dns"],
        "http_verify_cert": false
    },
    "run_mode": "continuous"
}
Then wire the bot into the intelmq bot network (using the IntelMQ manager or by editing pipeline.conf). Make sure it does not get too many events, as MISP (on average and by its design goals) is not designed to handle as many events as fast as intelmq does. When using the example bots, one way is to disable the spamhaus and feodo-tracker collector bots (see above).
The current alpha version of the bot has been tested. Here are the plans for improvement:
We introduce a new configuration parameter for tags that will be set in new MISP events, and set the default or example configuration to OSINT. The reason is that intelmq can also handle feeds other than public (aka open source) ones, and there may be other interesting MISP tags to set depending on what you pipe into the bot.
misp_additional_tags: list of tags to set in addition to misp_tag_for_bot, which will not be searched for when looking for duplicates
to_ids flags
The to_ids attribute will be set to False for all attributes by default and only enabled if the field is listed in the new configuration parameter:
misp_to_ids_fields: list of fields for which the to_ids flag will be set on the MISP attributes.
Rationale: We expect it to be a deliberate decision which fields are flagged for automation by the MISP operators, so it should not be set automatically.
misp_additional_correlation_fields: list of fields for which the correlation flags will be enabled (in addition to those which are already enabled because they are significant_fields).
It is likely that there are attributes which should not be checked for duplicates on insertion, but should be correlated, e.g. asn or network.
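To make the planned flag handling concrete, here is a small sketch of how the two per-attribute flags could be derived from the configuration. The function name `attribute_flags` is made up for illustration; `to_ids` and `disable_correlation` are the usual MISP attribute properties, but how the bot sets them in detail is not shown here.

```python
def attribute_flags(field: str,
                    significant_fields: list,
                    to_ids_fields: list,
                    additional_correlation_fields: list) -> dict:
    """Compute MISP attribute flags for one IntelMQ field (illustrative only)."""
    # to_ids is off by default and only enabled for explicitly listed fields.
    to_ids = field in to_ids_fields
    # Correlation is enabled for significant fields and for the additional ones.
    correlate = field in significant_fields or field in additional_correlation_fields
    return {"to_ids": to_ids, "disable_correlation": not correlate}


significant = ["source.fqdn", "source.reverse_dns"]
to_ids = ["source.fqdn", "source.reverse_dns"]
correlation = ["source.ip", "source.asn", "source.network"]

for name in ("source.fqdn", "source.asn", "feed.name"):
    print(name, attribute_flags(name, significant, to_ids, correlation))
```

A field such as source.asn would then be correlated without being flagged for IDS export, which is the case the rationale describes.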
The feed can be interesting, so we add an option:
add_feed_provider_as_tag: Boolean, default True
Any idea what tag to use for adding the feed provider? https://github.com/MISP/misp-taxonomies does not seem to suggest any namespace or predicate. What about IntelMQ:feed.provider="XXX", where XXX gets replaced with intelmq_event["feed.provider"]?
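As a sketch of that proposal (the tag format is only the suggestion from this comment, not an agreed taxonomy, and `feed_provider_tag` is a made-up helper name):

```python
from typing import Optional


def feed_provider_tag(event: dict) -> Optional[str]:
    """Build the proposed tag from the event's feed.provider field, if present."""
    provider = event.get("feed.provider")
    if not provider:
        return None
    return 'IntelMQ:feed.provider="{}"'.format(provider)


print(feed_provider_tag({"feed.provider": "Malc0de", "source.fqdn": "bad.example.org"}))
print(feed_provider_tag({"source.fqdn": "bad.example.org"}))
```

Events without a feed provider simply get no such tag in this sketch.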
new example part of runtime.conf with the new parameters:
,
"misp-api-output": {
    "bot_id": "misp-api-output",
    "description": "Testing the inserting of events into MISP",
    "enabled": true,
    "group": "Output",
    "groupname": "outputs",
    "module": "intelmq.bots.outputs.misp.output_api",
    "name": "MISP API Output",
    "parameters": {
        "add_feed_provider_as_tag": true,
        "misp_additional_correlation_fields": ["source.ip", "source.asn", "source.network"],
        "misp_additional_tags": ["OSINT", "osint:certainty==\"90\""],
        "misp_url": "https://your-misp-instance.example.org:1234/",
        "misp_key": "YOUR MISP API KEY",
        "misp_tag_for_bot": "Inserted-by:IntelMQ",
        "misp_to_ids_fields": ["source.fqdn", "source.reverse_dns"],
        "significant_fields": ["source.fqdn", "source.reverse_dns"],
        "http_verify_cert": false
    },
    "run_mode": "continuous"
}
Two smaller improvements with db05994455e34d214179c668b72c73b81d40b22c and 93539b8b32c859ad4a045222503bcb57884833c6 . Here is a new example config:
,
"misp-api-output": {
    "bot_id": "misp-api-output",
    "description": "Testing the inserting of events into MISP",
    "enabled": true,
    "group": "Output",
    "groupname": "outputs",
    "module": "intelmq.bots.outputs.misp.output_api",
    "name": "MISP API Output",
    "parameters": {
        "add_feed_provider_as_tag": true,
        "misp_additional_correlation_fields": ["source.ip", "source.asn", "source.network"],
        "misp_additional_tags": ["OSINT", "osint:certainty==\"90\""],
        "misp_url": "https://your-misp-instance.example.org:1234/",
        "misp_key": "YOUR MISP API KEY",
        "misp_publish": false,
        "misp_tag_for_bot": "Inserted-by:IntelMQ",
        "misp_to_ids_fields": ["source.fqdn", "source.reverse_dns"],
        "significant_fields": ["source.fqdn", "source.reverse_dns"],
        "http_verify_cert": false
    },
    "run_mode": "continuous"
}
With 3ecf7464af0a64c90fd27698fd601582e91d35c9 and fff4db6d0a34338cb88ab8d8a8a3aaf536618779 we are ready.
The bot is submitted for inclusion in the upcoming IntelMQ 2.2.0 release.
Here is a brief version of how you can install it with 2.1.x using the native packages (tested on Debian GNU/Linux Buster with intelmq 2.1.2-1):
pip3 install pymisp
# get the new bot or bot version. In this case the whole subdir `outputs/misp` is missing
# so we add it to the place where the .deb package has placed the other files.
cp -r misp/ /usr/lib/python3/dist-packages/intelmq/bots/outputs/
# create the necessary setuptools script entry, by copying one of the existing ones
sed -e 's/sql\.output/misp.output_api/' /usr/bin/intelmq.bots.outputs.sql.output >/usr/bin/intelmq.bots.outputs.misp.output_api
chmod a+x /usr/bin/intelmq.bots.outputs.misp.output_api
# make the entry point known to python's setuptools
echo intelmq.bots.outputs.misp.output_api = intelmq.bots.outputs.misp.output_api:BOT.run >>/usr/lib/python3/dist-packages/intelmq-2.1.2.egg-info/entry_points.txt
# cater for missing new features in 2.1.2
cat <<EOF | patch /usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py
--- misp/output_api.py 2020-02-21 10:02:51.824693139 +0100
+++ /usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py 2020-02-21 10:43:01.988666756 +0100
@@ -58,2 +58,2 @@
-from intelmq.lib.bot import OutputBot
-from intelmq.lib.exceptions import MissingDependencyError
+from intelmq.lib.bot import Bot
+#from intelmq.lib.exceptions import MissingDependencyError
@@ -67 +67 @@
-class MISPAPIOutputBot(OutputBot):
+class MISPAPIOutputBot(Bot):
@@ -72 +72 @@
- raise MissingDependencyError('pymisp', version='>=2.4.120')
+ raise RuntimeError('Needs pymisp in version>=2.4.120.')
EOF
Also add an entry to BOTS if you want the new bot to show up in the IntelMQ Manager.
One defect: There are some events where the bot fails with something like
2020-02-22 12:43:09,049 - MISP-API-Output - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/intelmq/lib/bot.py", line 267, in start
    self.process()
  File "/usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py", line 104, in process
    self._insert_misp_event(intelmq_event)
  File "/usr/lib/python3/dist-packages/intelmq/bots/outputs/misp/output_api.py", line 131, in _insert_misp_event
    obj = new_misp_event.add_object(name='intelmq_event')
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 1363, in add_object
    **kwargs)
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 612, in __init__
    self._set_template(kwargs.get('misp_objects_path_custom'))
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 697, in _set_template
    self._known_template = self._load_template_path(self.misp_objects_path / self.name / 'definition.json')
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/mispevent.py", line 652, in _load_template_path
    self._definition: Union[dict, None] = self._load_json(template_path)
  File "/var/lib/intelmq/.local/lib/python3.6/site-packages/pymisp/abstract.py", line 50, in _load_json
    data = load(f)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1680: ordinal not in range(128)
Saw this once when an HTML collector bot was misconfigured. Need a way to reproduce and get an example event. Expectation: the bot should handle problematic intelmq events, if they can occur.
A similar error message has been reported in https://github.com/MISP/PyMISP/issues/504 @Rafiot Any ideas?
It seems to depend on the content that is given to pymisp, but not directly.
Char 1680 on ./data/misp-objects/objects/intelmq_event/definition.json is a non-ascii double quote: “type explosion” in the description of classification.type. So maybe if "classification.type" is to be added, the defect in the template or template loading comes up?
Comes from here: https://github.com/certtools/intelmq/blob/ac1e46bb6946ac07ddc929f3691cb9ab1a1de49f/intelmq/etc/harmonization.conf#L13
I think I remember how we fixed that problem last time: https://github.com/docker-library/python/issues/13
Is the locale broken on the machine you're using?
@Rafiot thanks for the hint! Still trying to understand the full story.
This was on an Ubuntu VM where I only saw the problem temporarily, and where I had a faulty configuration of the HTTP collector bot: it was missing the URL (set up via the IntelMQ manager). Once I had fixed the configuration, I flushed redis and restarted intelmq, and all went fine after this. Today I've inquired and the locale is LANG=C.UTF-8 for all three relevant users. As I don't remember doing something special about locales, it should have been this way all the time, but of course it is possible that I accidentally used LANG=C intelmqctl. I consider it unlikely that it was just a broken locale. The case is not explained yet.
Idea, after chatting with more engineers during lunch at Intevation: starting intelmq via the IntelMQ manager via apache may lead to a different environment (compared to checking via ssh). This needs to be checked next.
This is a report from somebody else's system. The locale for the intelmq user is also LANG=C.UTF-8. (But here also the IntelMQ manager is in use.)
To complete the story for later reference: the IntelMQ Manager uses sudo, which probably loses the LANG environment. This is consistent with both observations. On the other hand, a JSON import should use UTF-8 in any case. (This is how it was done later as part of https://github.com/MISP/PyMISP/issues/504 .) An additional improvement could be to make sure that the call to "sudo intelmqctl" still uses a good locale to avoid surprises from Python.
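The effect can be reproduced without MISP or sudo. The sketch below writes a small JSON file containing the same kind of typographic quote and reads it back twice. Passing encoding="ascii" explicitly is only a stand-in for what open() without an encoding argument did under a C/POSIX locale on the Python 3.6 seen in the traceback; the file content is made up and is not the real definition.json.

```python
import json
import os
import tempfile

# Made-up content with typographic quotes, similar to the template description.
definition = {"description": "avoid \u201ctype explosion\u201d"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "definition.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(definition, handle, ensure_ascii=False)

    # Simulates the locale-dependent default under LANG=C: decoding as ASCII fails.
    try:
        with open(path, encoding="ascii") as handle:
            json.load(handle)
        ascii_failed = False
    except UnicodeDecodeError as error:
        ascii_failed = True
        print("ascii read failed:", error.reason)

    # Robust variant: state the encoding explicitly, independent of the locale.
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)

print(loaded == definition)
```

The left typographic quote is encoded in UTF-8 as the bytes e2 80 9c, which matches the byte 0xe2 reported in the error message.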
Split out from #834, the goal here is to have a basic output bot that allows IntelMQ events to go into MISP via the MISP API.
The currently existing https://github.com/certtools/intelmq/tree/develop/intelmq/bots/outputs/misp goes via the feed input of MISP.
quoting @wagner-certat: