not applying 'whitelist' to CIF requests

sfinlon commented 7 years ago

When performing a CIF query, minemeld appears to just be pulling everything with an otype and confidence. https://github.com/PaloAltoNetworks/minemeld-core/blob/master/minemeld/ft/cif.py#L104

Minemeld should utilize the --feed option in the CIFSDK which incorporates a whitelist on the CIF server before sending the results. https://github.com/csirtgadgets/cif-sdk-py/blob/master/cifsdk/client.py#L455

We know CIFv2 isn't perfect, but we've heard complaints from a handful of customers that many items that should be filtered as whitelisted are making it through and being placed in block lists on their firewalls.

We've done some abstracting and improved the whitelisting in CIFv3. Another suggestion would be to try to replicate what we did in client.py for cif.py https://github.com/csirtgadgets/bearded-avenger-sdk-py/blob/master/cifsdk/client/client.py#L222 and build a cifv3.py client with #222 for when we start rolling out v3.

jtschichold commented 7 years ago

Hi @sfinlon, today MineMeld users can define Miners to be used as static and/or dynamic whitelists. I think this should cover this use case. They could create a second CIF Miner to extract indicators marked as whitelists and use them to whitelist indicators from CIF itself or other feeds.

Luigi

sfinlon commented 7 years ago

The whitelisting that CIF does when --feed is specified isn't requested by the user, it just happens, and the logic can't be replicated easily as part of the static/dynamic list.

We expect that anyone should be able to enter indicators into CIF, but we don't necessarily want them to be passed on without validation. So we accept anything, and then rely on the whitelisting capabilities in CIF itself to make sure it's not putting indicators into the feed that shouldn't be there. This seems to be most common when infrastructure (AWS, for example) ends up hosting a phishing site: that site might get entered into CIF as 'phishing' with a confidence of '85'. If the whitelisting isn't happening, AWS will be put into the feed and then placed in the block list. This is what has happened at at least 3 different sites that use Palo Alto firewalls and CIF.
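The failure mode described above can be illustrated with a minimal sketch. The record layout (`indicator`, `tags`, `confidence`) and both function names are hypothetical and only loosely modeled on CIF observables; this is not the cifsdk or MineMeld implementation, just the difference between querying by confidence alone and querying with a whitelist applied.

```python
# Hypothetical records; the field names are assumptions for illustration only.
OBSERVABLES = [
    {"indicator": "aws.amazon.com", "tags": ["phishing"], "confidence": 85},
    {"indicator": "evil.example.net", "tags": ["phishing"], "confidence": 85},
    {"indicator": "aws.amazon.com", "tags": ["whitelist"], "confidence": 95},
]


def pull_by_confidence(observables, min_confidence):
    """Return every non-whitelist indicator at or above min_confidence."""
    return sorted({
        o["indicator"] for o in observables
        if "whitelist" not in o["tags"] and o["confidence"] >= min_confidence
    })


def pull_as_feed(observables, min_confidence):
    """Same query, but drop anything that also appears with a whitelist tag."""
    whitelisted = {o["indicator"] for o in observables if "whitelist" in o["tags"]}
    return [i for i in pull_by_confidence(observables, min_confidence)
            if i not in whitelisted]
```

With these sample records, the first function still returns `aws.amazon.com`, while the second removes it before the list reaches a block list.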

I agree that having user lists is great for instances where an entity wants to get a false positive whitelisted, but as it is, far too many URLs that shouldn't be blocked are slipping by and being blocked. We shouldn't rely on end users to maintain the Alexa top 10,000 domains in their own whitelist when this already happens when --feed is used.

jtschichold commented 7 years ago

Hi @sfinlon, what kind of matching is performed against whitelisted indicators? The IPv4 aggregator today can do partial matching, while the URL, domain, ... aggregators do exact matching.

If that is fine, on MineMeld you can:

1. Create a CIF Miner with a name starting with "wl", based on the prototype cif.Feed. This will be used as the whitelist.
2. In the whitelist Miner, select the whitelist tag and confidence above 25 as the filter.
3. Create a standard CIF Miner.
4. Connect both Miners to the same aggregator.

The aggregator will automatically whitelist indicators coming from the wl Miner.
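As a rough mental model of the setup above, the sketch below treats every indicator coming from a Miner whose name starts with "wl" as a whitelist entry and suppresses exact matches from the other Miners. The function name and the `(miner_name, indicator)` tuple format are hypothetical; this is not MineMeld's aggregator code.

```python
def aggregate(updates):
    """Filter indicators using whitelist Miners identified by a "wl" name prefix.

    updates: iterable of (miner_name, indicator) pairs.
    Indicators from "wl" Miners form the whitelist; everything else is
    emitted unless it exactly matches a whitelist entry.
    """
    updates = list(updates)
    whitelist = {ind for miner, ind in updates if miner.startswith("wl")}
    return [ind for miner, ind in updates
            if not miner.startswith("wl") and ind not in whitelist]
```

For example, feeding it `("wl-cif-fqdn", "aws.amazon.com")` together with `("cif-fqdn", "aws.amazon.com")` and `("cif-fqdn", "evil.example.net")` leaves only `evil.example.net` in the output.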

luigi

sfinlon commented 7 years ago

What you're suggesting is basically what is already happening on the CIF server here: https://github.com/csirtgadgets/cif-sdk-py/blob/master/cifsdk/client.py#L455 We take a combination of user-generated and curated lists, combined with the current Alexa top 10,000 domains, and use that to generate a dynamic list to reduce the chance of blocking any of the top domains. This has worked extremely well for us, and this is where we are running into problems. The logic in the link above isn't replicated, and domains are being blocked that should not be.

That all said, I tried to create 4 separate miners, one for each otype, and name them something to the effect of wl-ipv4 wl-ipv6 wl-fqdn wl-url and pull 4 separate whitelists to try and see if we could make this work.

My question now is: how will MineMeld handle differences in types/whitelists?

For example, aws.amazon.com/badphishingurl is entered into CIF as a URL. Part of how CIF operates is that it will break the URL apart and enter the full URL and also the domain aws.amazon.com as separate entries with their own confidences. However, it appears in my testing that, since this is a URL but only the FQDN is whitelisted, MineMeld doesn't handle the whitelist as expected and still places aws.amazon.com in the block list anyway.
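The type mismatch in this example can be sketched as follows. An exact comparison never matches a URL against an FQDN whitelist entry, so the host has to be extracted first. The helper names are hypothetical, and treating subdomains of a whitelisted FQDN as whitelisted is an assumption made for illustration, not something confirmed about CIF or MineMeld.

```python
from urllib.parse import urlsplit


def url_host(url):
    """Extract the lowercase hostname, tolerating URLs without a scheme."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat the leading part as the host
    return (urlsplit(url).hostname or "").lower()


def url_whitelisted(url, fqdn_whitelist):
    """True if the URL's host is a whitelisted FQDN or a subdomain of one."""
    host = url_host(url)
    return any(host == w or host.endswith("." + w) for w in fqdn_whitelist)
```

Here `"aws.amazon.com/badphishingurl" in {"aws.amazon.com"}` is false, which is the exact-match behavior, while `url_whitelisted("aws.amazon.com/badphishingurl", {"aws.amazon.com"})` is true.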

jtschichold commented 7 years ago

Hi @sfinlon, the match between whitelist entries and the indicators depends on the aggregator, but currently only the IPv4 aggregator implements advanced matching logic. Could you please point me to the code inside the CIF SDK where the matching between the whitelist URLs and the indicators is performed? I will work on reimplementing it inside the CIF miner.
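To make the distinction between exact and more advanced matching concrete for IPv4, here is a small sketch using Python's standard `ipaddress` module. The thread does not spell out what the IPv4 aggregator's matching consists of; treating it as network overlap is an assumption, and the function names are hypothetical.

```python
import ipaddress


def ipv4_exact(indicator, whitelist):
    """Exact match: the indicator string must appear verbatim in the whitelist."""
    return indicator in set(whitelist)


def ipv4_overlaps(indicator, whitelist):
    """Partial match: True if the indicator overlaps any whitelisted network."""
    net = ipaddress.ip_network(indicator, strict=False)
    return any(net.overlaps(ipaddress.ip_network(w, strict=False))
               for w in whitelist)
```

A single address such as `10.1.2.3` is not an exact match for a whitelisted `10.1.0.0/16`, but it does overlap it.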

Thanks, luigi

erush6861 commented 5 years ago

"I will work on reimplementing it inside the CIF miner."

Has anything been done for this?

I'd prefer it if the --feed option were used, making it unnecessary to try to aggregate CIF whitelists within MineMeld. (I keep running into stopped prototypes when I start adding more of them. I realize this is a different issue.)

PaloAltoNetworks / minemeld

not applying 'whitelist' to CIF requests #7