InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

Avoid parsing CIDR notation IPs #82

Closed: tlansec closed this issue 3 weeks ago

tlansec commented 3 weeks ago

Currently the library extracts IP addresses that are part of a CIDR notation string. For example, given the following text:

The attackers used the netblock "10.20.30.0/24"

The existing regex extracts "10.20.30.0" as a standalone IP address.
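To illustrate the behavior described above, here is a minimal sketch using a deliberately simplified IPv4 pattern. `NAIVE_IPV4` is an assumption made for this example and is not the regex iocextract actually uses; it only shows why a pattern with no awareness of a trailing "/prefix" picks up the network address.

```python
import re

# Deliberately simplified IPv4 pattern for illustration only.
# This is NOT the regex used by iocextract.
NAIVE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

text = 'The attackers used the netblock "10.20.30.0/24"'

# The "/" counts as a word boundary, so the network address is matched
# on its own and the "/24" prefix length is silently dropped.
matches = NAIVE_IPV4.findall(text)
print(matches)
```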

tlansec commented 3 weeks ago

Shortly after my initial commit I realized that my regex change will stop it from matching any IP address suffixed by a number. For example, it could exclude:

127.0.0.1/1

I've since revised my initial regex change. The new version is better, but it will still prevent IP addresses that are suffixed by a short run of digits from being parsed.

An alternative approach that might be better is a flag that users can set to knowingly exclude IP addresses that might be part of a CIDR string.
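One way such an opt-in flag could look is sketched below. The function `find_ips` and its `exclude_cidr` parameter are hypothetical names invented for this example, and the pattern is a simplified stand-in, not iocextract's API or regex. The guard assumes that "looks like CIDR" means a slash followed by one or two digits.

```python
import re

# Simplified IPv4 pattern for illustration only; iocextract's real regex differs.
# (?<![\d.]) and (?!\d) stop the match from starting or ending mid-number.
IPV4 = r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d)"

# Hypothetical CIDR guard: reject a match that is immediately followed by
# "/" and exactly one or two digits (for example "/8" or "/24").
NOT_CIDR = r"(?!/\d{1,2}(?!\d))"


def find_ips(text, exclude_cidr=False):
    """Return IPv4-looking strings, optionally skipping CIDR-style ones."""
    pattern = IPV4 + (NOT_CIDR if exclude_cidr else "")
    return re.findall(pattern, text)


sample = "netblock 10.20.30.0/24, host 13.44.55.68, url https://127.0.0.2/123"
print(find_ips(sample))                     # default: keep everything
print(find_ips(sample, exclude_cidr=True))  # opt in to skipping CIDR blocks
```

Keeping the default at `exclude_cidr=False` preserves existing behavior, so only callers who accept the risk of losing addresses followed by a short numeric path segment would be affected.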

tlansec commented 3 weeks ago

With the current state of the change:

from iocextract import extract_ips

data = """
127.0.0.1/24
https://127.0.0.2/123 
13.44.55.67/161 
13.44.55.68  
13.44.55.69/456 
13.44.55.70/1
"""

extracted = extract_ips(data)
for e in extracted:
    print(e)

yields:

127.0.0.2
13.44.55.67
13.44.55.68
13.44.55.69

pedramamini commented 3 weeks ago

Thank you for your efforts!