Better IP aggregation

tfriesen commented 7 years ago

Either the output nodes or the aggregators need to do better aggregation. For example, the following IP indicators show up in a standard inbound/high confidence output:

120.128.128.0-120.128.191.255
120.128.192.0-120.128.255.255
120.129.0.0-120.129.127.255
120.129.128.0-120.129.255.255
120.130.0.0-120.130.127.255
120.130.128.0-120.130.191.255
120.130.192.0-120.130.255.255

These 7 ranges are fairly obviously contiguous, and should be reduced to a single range: 120.128.128.0-120.130.255.255
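The reduction described above is an interval merge. Below is a minimal Python sketch, assuming every entry is an IPv4 "start-end" range. `merge_ranges` is a hypothetical helper, not part of MineMeld. It converts each range to a pair of integers, sorts the pairs, and folds a range into the previous one whenever it starts at or before the previous end + 1.

```python
import ipaddress

# The seven ranges from the report above.
REPORTED_RANGES = [
    "120.128.128.0-120.128.191.255",
    "120.128.192.0-120.128.255.255",
    "120.129.0.0-120.129.127.255",
    "120.129.128.0-120.129.255.255",
    "120.130.0.0-120.130.127.255",
    "120.130.128.0-120.130.191.255",
    "120.130.192.0-120.130.255.255",
]

def merge_ranges(ranges):
    """Merge overlapping or adjacent "start-end" IPv4 ranges (hypothetical helper)."""
    # Convert each range to (first, last) integers and sort by start address.
    intervals = sorted(
        (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)))
        for start, end in (r.split("-") for r in ranges)
    )
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1] + 1:
            # Adjacent or overlapping: extend the previous range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Convert the integers back to dotted notation.
    return [f"{ipaddress.IPv4Address(s)}-{ipaddress.IPv4Address(e)}" for s, e in merged]

print(merge_ranges(REPORTED_RANGES))  # ['120.128.128.0-120.130.255.255']
```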

There are many, many examples of this in the output.

The impetus is that, with a lot of miners, feeds can grow very large, leading to problems importing the EDLs on PAN firewalls, especially ones with a lot of vsyses. Better aggregation would allow us to make use of more miners, and hence more indicators, before hitting those limits.

jtschichold commented 7 years ago

Aggregators are in charge of aggregating metadata/attributes between multiple overlapping indicators. In this case we could optimize the number of indicators generated in the EDL format by summarizing the IP ranges.

It's a great idea and it should be implemented in the feed output node/feed API.

Plan:

1) Implement a new output format, panosedl, in minemeld.flask.feedredis.
2) When that output format is enabled, summarize multiple contiguous ranges into a single IP range.
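Before contiguous entries can be summarized, IPv4 indicators that may arrive in different forms (single addresses, CIDR networks, or start-end ranges like those in the report above) would need a common representation. The following is a minimal sketch, assuming integer (first, last) intervals as that common form. `to_interval` and `to_edl_line` are hypothetical helpers, not part of minemeld.flask.feedredis.

```python
import ipaddress

def to_interval(indicator):
    """Return (first, last) integers for an IPv4 address, CIDR, or "start-end" range."""
    indicator = indicator.strip()
    if "-" in indicator:
        start, end = indicator.split("-", 1)
        first = int(ipaddress.IPv4Address(start.strip()))
        last = int(ipaddress.IPv4Address(end.strip()))
    else:
        # A bare address parses as a /32. strict=False is an assumption:
        # it masks off host bits (e.g. 1.2.3.4/24 -> 1.2.3.0/24) instead of failing.
        network = ipaddress.IPv4Network(indicator, strict=False)
        first = int(network.network_address)
        last = int(network.broadcast_address)
    if first > last:
        raise ValueError(f"range start is after range end: {indicator}")
    return first, last

def to_edl_line(first, last):
    """Format an integer interval as a single IP or a "start-end" range."""
    if first == last:
        return str(ipaddress.IPv4Address(first))
    return f"{ipaddress.IPv4Address(first)}-{ipaddress.IPv4Address(last)}"

print(to_edl_line(*to_interval("192.0.2.0/30")))  # 192.0.2.0-192.0.2.3
```

Once every indicator is an integer interval, merging contiguous entries is a sort followed by a single pass, and each merged interval can be written back with `to_edl_line`.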

tfriesen commented 7 years ago

Not to nag, but has there been any headway on this issue?

jtschichold commented 7 years ago

@tfriesen after the EDL scalability improvements introduced in PAN-OS 7.1 and 8.0, we haven't received many requests about aggregating IP ranges, so we lowered the priority of this issue. If you are still facing issues, I will add this to the plan for release 0.9.42.

Thanks!

tfriesen commented 7 years ago

We are. The problem is that EDL entries are multiplied by the number of vsyses the EDL is used on. So with something like 30k entries in an EDL that is shared across 30+ vsyses, we're over 900k IP address objects, more than our 7050 can handle, even with the EDL scalability improvements (or so it was explained to me). We have more sources of threat intel we'd like to leverage in MineMeld, but have kind of hit a wall because of this.
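Taking that explanation at face value (each EDL entry counted once for every vsys that references the list), the object count scales as the product below. Any reduction in the number of entries through aggregation shrinks the total by the same factor.

```latex
N_{\text{objects}} \approx N_{\text{entries}} \times N_{\text{vsys}} = 30\,000 \times 30 = 900\,000
```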

Thanks.

jtschichold commented 6 years ago

Started working on this

KellyMurphy commented 6 years ago

I was able to reduce the AlienVault IP list from 62k lines to 55k lines by doing the following:

1) Read all IP addresses:
   1) Convert each IP address to a 32-bit value.
   2) Append the 32-bit value to a list.
2) Sort the list and remove duplicates.
3) Set up the loop:
   1) Set the initial start and end IP addresses to the first list entry.
   2) Loop through the remaining list entries:
      1) If the current IP = end + 1, set the new end to the current IP.
      2) Otherwise, output the start-end range (converting the 32-bit values back to dotted notation), then set start and end to the current IP.
   3) After the loop, output the final start-end range.
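The steps above are a sort-and-sweep over integers. Here is a minimal Python sketch of them, assuming one dotted-quad IPv4 address per input line. The original was a C# program that is not shown here, and `ips_to_ranges` is a hypothetical name.

```python
import ipaddress

def ips_to_ranges(lines):
    """Collapse single IPv4 addresses into start-end ranges of consecutive addresses."""
    # Steps 1-2: convert each address to a 32-bit integer, then sort and deduplicate.
    values = sorted({int(ipaddress.IPv4Address(line.strip())) for line in lines if line.strip()})
    if not values:
        return []
    ranges = []
    # Step 3: start and end both begin at the first entry.
    start = end = values[0]
    for current in values[1:]:
        if current == end + 1:
            end = current  # Still consecutive: extend the current run.
        else:
            ranges.append((start, end))  # Gap found: close the current run.
            start = end = current
    ranges.append((start, end))  # Emit the final run after the loop.
    # Convert the 32-bit values back to dotted notation.
    return [f"{ipaddress.IPv4Address(s)}-{ipaddress.IPv4Address(e)}" for s, e in ranges]

print(ips_to_ranges(["10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.3", "10.0.0.7"]))
# ['10.0.0.1-10.0.0.3', '10.0.0.7-10.0.0.7']
```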

I'm not a Python developer; I wrote a small C# app to process the list output from MineMeld. I can upload my C# program to a Git repo for reference if that would help.

infinite-turtles commented 6 years ago

The standard library function ipaddress.collapse_addresses() will do what you want. This example gets the same compression as @KellyMurphy's approach on the AlienVault list:

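#!/usr/bin/env python3
"""Takes lists of IPs, one IP per line, sorts and deduplicates them. Contiguous ranges of IP addresses are converted into
equivalent subnets.
Usage: compress_ip_list.py FILES..."""
import fileinput
import ipaddress

def read_ips():
    """Yield each parseable IPv4 address or network from the input files."""
    for line in fileinput.input():
        line = line.strip()  # Remove surrounding whitespace and the line ending
        if not line:
            continue  # Skip blank lines
        try:
            network = ipaddress.IPv4Network(line)
        except ValueError:
            # Skip lines that are not a valid IPv4 address or network
            # (including networks with host bits set, since strict=True by default)
            continue
        yield network

def main():
    # collapse_addresses() merges duplicate, overlapping and adjacent networks
    # and yields the result in sorted order
    collapsed_ip_ranges = ipaddress.collapse_addresses(read_ips())
    for subnet in collapsed_ip_ranges:
        print(subnet)

if __name__ == '__main__':
    main()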
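One caveat: collapse_addresses() returns CIDR networks, not arbitrary ranges, so a contiguous run that does not line up with prefix boundaries stays split. For comparison, the sketch below first converts the seven ranges from the original report to networks with ipaddress.summarize_address_range(), then collapses them. The result is three networks rather than the single range 120.128.128.0-120.130.255.255. `ranges_to_cidrs` is a hypothetical helper.

```python
import ipaddress
from itertools import chain

# The seven ranges from the original report in this issue.
REPORTED_RANGES = [
    "120.128.128.0-120.128.191.255",
    "120.128.192.0-120.128.255.255",
    "120.129.0.0-120.129.127.255",
    "120.129.128.0-120.129.255.255",
    "120.130.0.0-120.130.127.255",
    "120.130.128.0-120.130.191.255",
    "120.130.192.0-120.130.255.255",
]

def ranges_to_cidrs(ranges):
    """Convert "start-end" IPv4 ranges to CIDR networks and collapse them with collapse_addresses()."""
    networks = chain.from_iterable(
        ipaddress.summarize_address_range(
            ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)
        )
        for start, end in (r.split("-") for r in ranges)
    )
    return [str(net) for net in ipaddress.collapse_addresses(networks)]

print(ranges_to_cidrs(REPORTED_RANGES))
# ['120.128.128.0/17', '120.129.0.0/16', '120.130.0.0/16']
```

Since the EDL output in the original report already carries start-end ranges, emitting merged ranges instead of CIDRs can produce fewer entries for data like this.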

inertia64 commented 4 years ago

Was this implemented in 0.9.68? Thanks

snatch2013 commented 3 years ago

Hello dear developers,

any news about this?

snatch2013 commented 3 years ago

For example, if I create an IPv4 list of Azure cloud subnets, it has more than 66,298 entries. But if I run the ipaddress.collapse_addresses() function on it, it is just 1,817. So it looks like quite an easy fix. And given that there is a limit of 50,000 total entries across all configured EDLs, it is not clear how to use MineMeld with Palo Alto firewalls, unless it is a PA-7000.

PaloAltoNetworks / minemeld-core

Better IP aggregation #109