certtools / intelmq

IntelMQ is a solution for IT security teams for collecting and processing security feeds using a message queuing protocol.
https://docs.intelmq.org/latest/
GNU Affero General Public License v3.0

intelmq.lib.exceptions.KeyExists #2462

Closed vandaref closed 4 months ago

vandaref commented 4 months ago

I'm trying to create a custom parser bot for data collected from CrowdStrike.

I'm facing the following issue: intelmq.lib.exceptions.KeyExists: key 'malware.hash.sha256' already exists. It could also happen with fields other than malware.hash.sha256.

I suppose there are duplicate hashes (or other duplicate fields) in the feed I'm collecting. This is my debug output:

crowdstrike-parser: Start loop
crowdstrike-parser: 91c03c1acf3d1dfee8aa458d08cc51b5ec7c8708
crowdstrike-parser: End loop
crowdstrike-parser: Start loop
crowdstrike-parser: End loop
crowdstrike-parser: Start loop
crowdstrike-parser: 7c3f83d9ebc4adcc2e76813de65450deffc225633806a071036cb36bb3afaa60
crowdstrike-parser: End loop
crowdstrike-parser: Start loop
crowdstrike-parser: c840db85a0f4a469f471d494de86828f24a1e841fed3719b52e1f6d13e5f4616
crowdstrike-parser: Failed to parse line. 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/intelmq/lib/bot.py", line 1225, in process
    events: list[libmessage.Event] = list(filter(bool, value))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/intelmq/bots/parsers/crowdstrike/parser.py", line 36, in parse_line
    event.add("malware.hash.sha256", ioc["indicator"], raise_failure=False)
  File "/usr/lib/python3/dist-packages/intelmq/lib/message.py", line 237, in add
    raise exceptions.KeyExists(key)
intelmq.lib.exceptions.KeyExists: key 'malware.hash.sha256' already exists

sebix commented 4 months ago

You cannot set a key that already exists without explicitly overwriting it. Use event.add(key, value, overwrite=True) instead. https://intelmq.readthedocs.io/en/develop/source/intelmq.lib.html#intelmq.lib.message.Message.add
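
To make the behaviour described above concrete, here is a minimal sketch. It does not use IntelMQ itself: `ToyEvent` and the local `KeyExists` class are hypothetical stand-ins that only model the two cases mentioned in this thread (adding an existing key without `overwrite`, and adding it with `overwrite=True`). The real `Message.add` has more parameters and does validation that is not reproduced here.

```python
class KeyExists(Exception):
    """Local stand-in for intelmq.lib.exceptions.KeyExists (illustration only)."""


class ToyEvent(dict):
    """Hypothetical, simplified model of an event: one value per key."""

    def add(self, key, value, overwrite=None):
        # Simplification: only overwrite=True replaces an existing value;
        # every other case with an existing key raises KeyExists.
        if key in self and overwrite is not True:
            raise KeyExists(f"key {key!r} already exists")
        self[key] = value


event = ToyEvent()
event.add("malware.hash.sha256", "first-value")

# A second add on the same event object fails without overwrite=True.
try:
    event.add("malware.hash.sha256", "second-value")
    second_add_failed = False
except KeyExists:
    second_add_failed = True

# With overwrite=True the old value is replaced, so only the last one remains.
event.add("malware.hash.sha256", "second-value", overwrite=True)
final_value = event["malware.hash.sha256"]
```

In this sketch the error only appears because the same event object receives the key twice, which matches the traceback above.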

vandaref commented 4 months ago

Will each new value erase the previous one, so that at the end I only have the last one?

sebix commented 4 months ago

Yes, each key can only have one value. It's a dictionary. https://docs.intelmq.org/latest/user/event/#fields-reference lists all fields and their type.

If you need to write lists, you can use a custom field in the extra. namespace. They can have any type.
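
As a rough illustration of the list suggestion, the sketch below uses a plain `dict` as a stand-in for an event. The field name `extra.related_hashes` is a hypothetical example, not a field defined by IntelMQ, and the two hash values are taken from the debug output earlier in this thread. The list is built first and stored once, so the key is only ever set a single time.

```python
# Plain dict standing in for an event (assumption: no IntelMQ objects involved).
event = {}

# Values copied from the debug output in this thread.
indicators = [
    "7c3f83d9ebc4adcc2e76813de65450deffc225633806a071036cb36bb3afaa60",
    "c840db85a0f4a469f471d494de86828f24a1e841fed3719b52e1f6d13e5f4616",
]

# Collect everything into one list before touching the event.
related_hashes = []
for value in indicators:
    if value not in related_hashes:  # drop exact duplicates from the feed
        related_hashes.append(value)

# "extra.related_hashes" is a hypothetical custom field name.
event["extra.related_hashes"] = related_hashes
```

With a real event object, the equivalent would presumably be a single `event.add(...)` call with the finished list, rather than one call per hash.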

vandaref commented 4 months ago

But each event is different. I don't have this issue with other bots. The hash of each document will be different, so the key must already exist with another value.
Please see this example:

{
  "id": "hash_md5_58a5bdcf325429d36194202544359f22",
  "indicator": "58a5bdcf325429d36194202544359f22",
  "type": "hash_md5",
  "deleted": false,
  "published_date": 1364395570,
  "last_updated": 1707145213,
  ...,
  ],
  "vulnerabilities": []
},
{
  "id": "hash_md5_ad7eacf53192afdce79b951ba860d3d3",
  "indicator": "ad7eacf53192afdce79b951ba860d3d3",
  "type": "hash_md5",
  "deleted": false,
  "published_date": 1378907777,
  "last_updated": 1707145213,
  ..,
  "vulnerabilities": []
}

I map the indicator value to the malware.hash.XXX key. I'm not sure I understand, because other bots use the same schema, and there we have different values for one key.

sebix commented 4 months ago

> But each event is different. I don't have this issue with other bots. The hash of each document will be different.

And each "document" will become its own event? If every document has its own hash, there will be no conflicts.

> I map the indicator value to the malware.hash.XXX key. I'm not sure I understand, because other bots use the same schema, and there we have different values for one key.

We don't know your bot code.
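
Since the bot code was not posted, the cause remains unconfirmed. One plausible explanation, consistent with the traceback, is that a single event object is reused across loop iterations instead of a new one being created per document. The sketch below illustrates the "one event per document" pattern with plain dicts. The function name `documents_to_events` is hypothetical, and in `TYPE_TO_FIELD` only the `hash_md5` type appears in this thread; the other type names are assumptions.

```python
import json

# Assumed mapping from feed "type" values to IntelMQ field names.
# Only "hash_md5" is shown in the thread; the other type names are guesses.
TYPE_TO_FIELD = {
    "hash_md5": "malware.hash.md5",
    "hash_sha1": "malware.hash.sha1",
    "hash_sha256": "malware.hash.sha256",
}


def documents_to_events(documents):
    """Yield one fresh dict per document (a stand-in for one event each)."""
    for doc in documents:
        field = TYPE_TO_FIELD.get(doc["type"])
        if field is None:
            continue  # skip indicator types this sketch does not handle
        event = {}  # created inside the loop, so no key carries over
        event[field] = doc["indicator"]
        yield event


# Trimmed version of the two documents from the example above.
raw = json.dumps([
    {"indicator": "58a5bdcf325429d36194202544359f22", "type": "hash_md5"},
    {"indicator": "ad7eacf53192afdce79b951ba860d3d3", "type": "hash_md5"},
])
events = list(documents_to_events(json.loads(raw)))
```

In an actual parser bot the fresh object would presumably come from `self.new_event(report)` called inside the loop. If the event were created once before the loop, the second document with the same indicator type would hit the already-set key and raise `KeyExists`.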