Open tfriesen opened 7 years ago
Aggregators are in charge of aggregating metadata/attributes between multiple overlapping indicators. In this case we could optimize the number of indicators generated in the EDL format by summarizing the IP ranges.
It's a great idea and it should be implemented in the feed output node/feed API.
Plan:
Not to nag, but has there been any headway on this issue?
@tfriesen after the EDL scalability improvements introduced in PAN-OS 7.1 and 8.0 we haven't received many requests about aggregating IP ranges, so we lowered the priority of this issue. If you are still facing issues, I will add this to the plan for release 0.9.42.
Thanks!
We are. The problem is that EDL entry counts multiply by the number of vsyses they're used on. With something like 30k entries in an EDL shared across 30+ vsyses, we're at over 900k IP address objects, more than our 7050 can handle, even with the EDL scalability improvements (or so it was explained to me). We have more sources of threat intel we'd like to leverage in MineMeld, but we've hit a wall because of this.
Thanks.
Started working on this
I was able to reduce the AlienVault IP list from 62k lines to 55k lines by doing the following:
1) Read all IP addresses:
   1) Convert each IP address to a 32-bit value.
   2) Append the 32-bit value to a list.
2) Sort and deduplicate the list.
3) Set up a loop:
   1) Set the initial start and end IP addresses to the first list entry.
   2) Loop through the remaining entries:
      1) If the current IP equals end + 1, set the new end to the current IP.
      2) Otherwise, output the start-end range (converting the 32-bit values back to dotted notation), then set both start and end to the current IP.
I'm not a python developer, I wrote a small C# app to process the list output from minemeld. I can upload my c# prog to a git repo for reference if that would help.
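The steps described above can be sketched in Python as well; this is not @KellyMurphy's C# program, just an assumed equivalent using the standard library's ipaddress module:

```python
import ipaddress


def merge_adjacent(ips):
    """Collapse dotted-quad IPs into (start, end) ranges of consecutive
    addresses, following the sort/deduplicate/scan steps described above."""
    # Convert each IP to its 32-bit value, deduplicate, and sort.
    values = sorted({int(ipaddress.IPv4Address(ip)) for ip in ips})
    if not values:
        return []
    ranges = []
    start = end = values[0]
    for value in values[1:]:
        if value == end + 1:
            # Current IP directly follows the running range; extend it.
            end = value
        else:
            # Gap found: emit the finished range in dotted notation.
            ranges.append((str(ipaddress.IPv4Address(start)),
                           str(ipaddress.IPv4Address(end))))
            start = end = value
    ranges.append((str(ipaddress.IPv4Address(start)),
                   str(ipaddress.IPv4Address(end))))
    return ranges
```

For example, `merge_adjacent(["10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.1.0"])` merges the three consecutive addresses into one range and leaves the isolated address as a one-address range.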
The standard library function ipaddress.collapse_addresses() will do what you want. This example gets the same compression as @KellyMurphy's approach on the AlienVault list:
#!/usr/bin/env python3
"""Takes lists of IPs, one IP per line, and sorts and deduplicates them.
Contiguous runs of IP addresses are converted into equivalent subnets.

Usage: compress_ip_list.py FILES..."""
import fileinput
import ipaddress


def read_ips():
    for line in fileinput.input():
        line = line.strip()  # Remove the line ending and surrounding whitespace
        if not line:
            continue
        try:
            yield ipaddress.IPv4Network(line)
        except ValueError:
            continue  # Skip lines that are not valid IPv4 addresses/networks


def main():
    collapsed_ip_ranges = ipaddress.collapse_addresses(read_ips())
    for subnet in collapsed_ip_ranges:
        print(subnet)


if __name__ == '__main__':
    main()
Was this implemented in 0.9.68? Thanks
Hello dear developers,
any news about this?
For example, if I create an IPv4 list of Azure cloud subnets, it contains more than 66,298 entries, but after running the ipaddress.collapse_addresses() function on it, only 1,817 remain. So it looks like a fairly easy fix. And given the limit of 50,000 total entries across all configured EDLs, it is unclear how to use MineMeld with Palo Alto at all, except on a PA-7000.
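To illustrate the kind of reduction described above, here is a small sketch with hypothetical subnets (not the actual Azure list): collapse_addresses() repeatedly merges adjacent sibling networks into their common supernet.

```python
import ipaddress

# Hypothetical subnets for illustration; not taken from the Azure feed.
subnets = [ipaddress.ip_network(s) for s in [
    "10.0.0.0/24",    # merges with 10.0.1.0/24 into 10.0.0.0/23 ...
    "10.0.1.0/24",
    "10.0.2.0/23",    # ... which then merges with this into 10.0.0.0/22
    "192.168.0.0/24", # not adjacent to anything, stays as-is
]]

collapsed = list(ipaddress.collapse_addresses(subnets))
for net in collapsed:
    print(net)
```

Four entries become two, and the same merging applied to a 66k-entry feed is what produces the large reduction mentioned above.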
Either the output nodes or the aggregators need to do better aggregation. For example, the following IP indicators show up in a standard inbound/high confidence output:
120.128.128.0-120.128.191.255
120.128.192.0-120.128.255.255
120.129.0.0-120.129.127.255
120.129.128.0-120.129.255.255
120.130.0.0-120.130.127.255
120.130.128.0-120.130.191.255
120.130.192.0-120.130.255.255
These 7 ranges are fairly obviously contiguous, and should be reduced to a single range: 120.128.128.0-120.130.255.255
There are many, many examples of this in the output.
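As a sketch of what the aggregation could do, the quoted ranges can be summarized with the standard library's ipaddress module: summarize_address_range() turns each "start-end" range into CIDR subnets, and collapse_addresses() merges adjacent ones. The range strings are copied from the output above.

```python
import ipaddress

# The seven contiguous ranges quoted above.
ranges = [
    "120.128.128.0-120.128.191.255",
    "120.128.192.0-120.128.255.255",
    "120.129.0.0-120.129.127.255",
    "120.129.128.0-120.129.255.255",
    "120.130.0.0-120.130.127.255",
    "120.130.128.0-120.130.191.255",
    "120.130.192.0-120.130.255.255",
]


def networks(ip_range):
    # Split "start-end" and expand it into the minimal list of CIDR subnets.
    start, end = ip_range.split("-")
    return ipaddress.summarize_address_range(
        ipaddress.IPv4Address(start), ipaddress.IPv4Address(end))


nets = [net for r in ranges for net in networks(r)]
collapsed = list(ipaddress.collapse_addresses(nets))
# collapsed covers one contiguous span, so the whole set is a single range:
print(collapsed[0].network_address, "-", collapsed[-1].broadcast_address)
```

Run on the seven ranges, this collapses them to three CIDR blocks (120.128.128.0/17, 120.129.0.0/16, 120.130.0.0/16), which together span exactly the single range 120.128.128.0-120.130.255.255.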
The impetus is that, with a lot of miners, feeds can grow very large, leading to problems importing the EDLs on PAN firewalls, especially ones with a lot of vsyses. Better aggregation would allow us to make use of more miners, and hence more indicators, before hitting those limits.