Whitelist collectors - GitHub issues

robcza commented 8 years ago

I recently encountered a problem with false positives from various source feeds. I would like to have some whitelists/exceptions in place; however, I can see several approaches to doing that.

The thing is, I'm not entirely sure how to do this in a way that fits the IntelMQ architecture:

- Should this be an expert bot modifying (or even dropping) incoming events?
  - The bot's whitelist cache could be filled with external data through collectors + parsers.
- Or should this rather be done via the regular pipeline: collecting and parsing the whitelists and classifying those events as "whitelist"?
- The last possibility is to keep the whole whitelisting out of IntelMQ and do it elsewhere.

Any opinions? Have you previously discussed whitelisting?

aaronkaplan commented 8 years ago

Hi!

I have not thought about it yet, but IMHO this is an ideal job for a filter: drop what you don't see as a malicious event (and maybe log why you dropped it). My 2 cents.

Any other opinions?

(In reply to Robert Šefr's post of 28 Dec 2015, 15:02.)

swannysec commented 8 years ago

Being able to take the Alexa Top N sites and filter them out would be excellent!

aaronkaplan commented 8 years ago

On Tue, Mar 15, 2016 at 12:50:54PM -0700, John D. Swanson wrote:

The capacity to take the Alexa Top N and filter those out would be excellent!

Shouldn't be too hard :)
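A minimal sketch of such a filter, assuming the list is a CSV of `rank,domain` rows (as in Alexa's `top-1m.csv`) and that a subdomain such as `www.google.com` should match its listed parent domain. `load_top_n` and `is_top_site` are hypothetical helpers, not IntelMQ code.

```python
import csv
import io


def load_top_n(csv_text, n):
    """Hypothetical helper: keep the domains ranked 1..n from 'rank,domain' rows."""
    top = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue  # skip blank, header or malformed rows
        if int(row[0]) <= n:
            top.add(row[1].strip().lower())
    return top


def is_top_site(fqdn, top):
    """True if the FQDN or one of its parent domains (never the bare TLD) is listed."""
    labels = fqdn.lower().rstrip(".").split(".")
    # 'www.google.com' is checked as 'www.google.com', then 'google.com'
    return any(".".join(labels[i:]) in top for i in range(len(labels) - 1))


top = load_top_n("1,google.com\n2,youtube.com\n3,facebook.com\n", n=2)
print(is_top_site("www.google.com", top))  # True
print(is_top_site("facebook.com", top))    # False: rank 3 is outside the top 2
```

Matching parent domains is a design choice: it catches `www.` variants, but it also whitelists every subdomain of a listed site, including a compromised one.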



robcza commented 8 years ago

I'm thinking of doing this in a slightly more sophisticated way. At the moment, we assign feed.accuracy, which is inherited by every single event. Let's collect whitelists with an accuracy as well and subtract it to get the resulting accuracy of the event.

reputation = blacklist(feed.accuracy) - whitelist(feed.accuracy)

What do you think?
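The proposed formula can be sketched as below. The 0-100 scale, the clamping, and subtracting the most accurate whitelist when several match are assumptions made for illustration; the thread does not specify them.

```python
def reputation(blacklist_accuracy, whitelist_accuracies):
    """reputation = blacklist(feed.accuracy) - whitelist(feed.accuracy)

    Assumptions: a 0-100 scale, the strongest matching whitelist is subtracted,
    and the result is clamped so it stays within 0-100.
    """
    whitelist = max(whitelist_accuracies, default=0.0)
    return min(100.0, max(0.0, blacklist_accuracy - whitelist))


print(reputation(90.0, []))            # 90.0: not on any whitelist
print(reputation(90.0, [40.0, 70.0]))  # 20.0: the 70.0 whitelist is subtracted
print(reputation(50.0, [80.0]))        # 0.0: the whitelist outweighs the blacklist
```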

aaronkaplan commented 8 years ago


On 16.03.2016, at 19:05, Robert Šefr notifications@github.com wrote:

I'm thinking of doing this bit more sophisticated. At the moment, we assign feed.accuracy, this is inherited to every single event. Let's collect whitelists with accuracy as well and subtract to get resulting accuracy of the event.

reputation = blacklist(feed.accuracy) - whitelist(feed.accuracy)

Hard to measure, but I don't have a better idea.


aaronkaplan commented 8 years ago

Folks, I thought about it some more.

What is the use-case?

Whitelisting is useful for CERTs so that they don't send out unnecessary alerts about whitelisted sites/IPs that either:

- belong to a sandbox, a researcher, or someone else doing things for a legitimate reason, or
- show up as the result of a malware sandbox run in which the malware simply checks for internet connectivity ("can I ping www.google.com?").

This means we need to be able to filter out events based on IP addresses and/or URLs or hostnames (FQDNs). This part is easy: we can use a filter bot.
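A minimal sketch of such a filter, assuming events are flat dicts keyed by IntelMQ-style field names (`source.ip`, `source.fqdn`, `source.url`). The `Whitelist` class and `filter_events` are hypothetical and do not use the IntelMQ bot API; the filter also logs why each event was dropped, as suggested earlier in the thread.

```python
import ipaddress
import logging
from typing import Optional
from urllib.parse import urlsplit

log = logging.getLogger("whitelist-filter")


class Whitelist:
    """Hypothetical whitelist of IP networks, hostnames and exact URLs."""

    def __init__(self, networks=(), fqdns=(), urls=()):
        self.networks = [ipaddress.ip_network(n) for n in networks]
        self.fqdns = {f.lower().rstrip(".") for f in fqdns}
        self.urls = set(urls)

    def match(self, event: dict) -> Optional[str]:
        """Return the reason the event is whitelisted, or None."""
        ip = event.get("source.ip")
        if ip:
            addr = ipaddress.ip_address(ip)
            for net in self.networks:
                if addr in net:  # IPv4 vs IPv6 mismatches simply don't match
                    return f"source.ip {ip} in {net}"
        url = event.get("source.url")
        fqdn = event.get("source.fqdn")
        if not fqdn and url:
            fqdn = urlsplit(url).hostname  # fall back to the URL's host
        if fqdn and fqdn.lower().rstrip(".") in self.fqdns:
            return f"fqdn {fqdn} is whitelisted"
        if url and url in self.urls:
            return f"url {url} is whitelisted"
        return None


def filter_events(events, whitelist):
    """Yield events that are not whitelisted; log why each dropped event was dropped."""
    for event in events:
        reason = whitelist.match(event)
        if reason:
            log.info("dropping event: %s", reason)
            continue
        yield event


wl = Whitelist(networks=["192.0.2.0/24"], fqdns=["www.google.com"])
events = [
    {"source.ip": "192.0.2.10"},              # dropped: inside the whitelisted network
    {"source.url": "http://www.google.com/"},  # dropped: host is whitelisted
    {"source.ip": "198.51.100.7"},            # kept
]
print(list(filter_events(events, wl)))  # [{'source.ip': '198.51.100.7'}]
```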

When does this fail?

There might be a situation where something whitelisted is actually infected/affected and you do want to report it. Usually, in this case, you want some proof that the claim (for example, that www.yahoo.com is defaced) is really true before you send it out. So it is related to accuracy, as @robcza mentioned.
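One way to express that rule, as a sketch: whitelisted events are held back unless the evidence is accurate enough. The `feed.accuracy` key on a plain dict, the 0-100 scale and the threshold of 90 are assumptions; a real deployment would choose its own value.

```python
REPORT_THRESHOLD = 90.0  # hypothetical: minimum accuracy needed to override a whitelist


def should_report(event, whitelisted, threshold=REPORT_THRESHOLD):
    """Report non-whitelisted events; report whitelisted ones only with strong evidence."""
    if not whitelisted:
        return True
    # A whitelisted target (e.g. a claim that www.yahoo.com is defaced) needs a
    # high-accuracy source before the alert goes out.
    return float(event.get("feed.accuracy", 0.0)) >= threshold
```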

@swannysec, would it be sufficient if a whitelist bot simply "marked" an event as "hey, this is on a whitelist; do not send it out, or inspect it manually" by adding a special key/value pair to the event, like this:

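  # inside a bot's process(): flag the event instead of dropping it
  event.add("whitelisted", True)  # the key proposed here; Python's True, not `true`
  self.send_message(event)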

?
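Outside of IntelMQ, the marking idea can be sketched with plain dicts: the whitelist step only adds the proposed `whitelisted` key, and a later step decides what gets sent and what goes to manual inspection. `mark` and `route` are hypothetical names.

```python
def mark(event, on_whitelist):
    """Return a copy of the event carrying the proposed 'whitelisted' flag."""
    marked = dict(event)
    marked["whitelisted"] = on_whitelist
    return marked


def route(events):
    """Split marked events into ones safe to send and ones for manual inspection."""
    to_send, to_review = [], []
    for event in events:
        (to_review if event.get("whitelisted") else to_send).append(event)
    return to_send, to_review


send, review = route([
    mark({"source.fqdn": "www.google.com"}, True),
    mark({"source.fqdn": "bad.example"}, False),
])
print([e["source.fqdn"] for e in send])    # ['bad.example']
print([e["source.fqdn"] for e in review])  # ['www.google.com']
```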

swannysec commented 8 years ago

Aaron,

Having the option to either flag whitelisted events or actually exclude them from the output would be super useful. Whether or not I'm right to apply it this way, IntelMQ lets me do some really useful processing of feeds without a lot of fuss. If I can do legitimate whitelisting (deduplication is already supported) and output "clean" feeds, I can alert/enforce against them with more confidence.
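A sketch of that option, assuming a hypothetical `action` setting with two values: `MARK` keeps whitelisted events but flags them, and `DROP` excludes them so the output feed stays "clean". `is_whitelisted` stands in for any lookup (IP, FQDN or Alexa Top N).

```python
from enum import Enum


class Action(Enum):
    MARK = "mark"  # keep the event and add whitelisted=True
    DROP = "drop"  # leave the event out of the output entirely


def apply_whitelist(events, is_whitelisted, action=Action.MARK):
    """Yield events, either flagging or excluding whitelisted ones (hypothetical option)."""
    for event in events:
        if not is_whitelisted(event):
            yield event
        elif action is Action.MARK:
            yield {**event, "whitelisted": True}
        # Action.DROP: nothing is yielded for this event


events = [{"source.fqdn": "www.google.com"}, {"source.fqdn": "bad.example"}]
listed = lambda e: e["source.fqdn"] == "www.google.com"
print(list(apply_whitelist(events, listed, Action.MARK)))
print(list(apply_whitelist(events, listed, Action.DROP)))  # [{'source.fqdn': 'bad.example'}]
```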

For me, whitelisting is not something that is expected to work 100% of the time; it's about reducing risk to business operations and alert fatigue, so I'll take what I can get.

Sorry for the delayed reply!

John

(In reply to AaronK's comment of Wed, May 11, 2016, 6:07 PM.)

robcza commented 8 years ago

@aaronkaplan I understand your approach, and I agree that knowing an event is on the whitelist is useful. However, I'd like to include my approach as well. I think both approaches could coexist: some users will rely on the simple blacklist/whitelist flag, while others can take the accuracy into account.
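The two approaches can live side by side. In the sketch below, one enrichment step writes both the simple `whitelisted` flag and an accuracy-based `reputation`, and each consumer reads only the field it trusts. The field names, the default accuracy of 100 and the threshold of 50 are assumptions for illustration.

```python
def annotate(event, whitelist_accuracies):
    """Add both signals to a copy of the event (hypothetical field names)."""
    blacklist = float(event.get("feed.accuracy", 100.0))  # assumed default: full trust
    whitelist = max(whitelist_accuracies, default=0.0)
    return {
        **event,
        "whitelisted": bool(whitelist_accuracies),      # simple black/white flag
        "reputation": max(0.0, blacklist - whitelist),  # accuracy-based score
    }


def flag_consumer(event):
    """Users relying on the flag: send only events that are not whitelisted."""
    return not event["whitelisted"]


def accuracy_consumer(event, threshold=50.0):
    """Users weighing accuracy: send events whose reputation clears the threshold."""
    return event["reputation"] >= threshold


e = annotate({"source.fqdn": "www.example.com", "feed.accuracy": 90.0}, [30.0])
print(flag_consumer(e), accuracy_consumer(e))  # False True
```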

certtools / intelmq

Whitelist collectors #426
