blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0

IPv6 regex pattern incorrectly matches non-IPv6 addresses, no testing is being done for IP related regex patterns #1397

Closed colin-stubbs closed 1 month ago

colin-stubbs commented 1 month ago

Describe the bug

_ipv6_regex/ipv6_regex from bbot/core/helpers/regexes.py was used to try to match IPv6 addresses; however, it incorrectly matched strings such as 'a:b:c' and '0:1:d', as well as MAC addresses such as '9e:3e:53:29:43:64'.

A more complex IPv6 regex matcher is required, e.g.

# IPv6 is complicated, so we have to accommodate multiple alternative pattern types,
# :(:[A-F0-9]{1,4}){1,7} == ::1, ::ffff:1
# ([A-F0-9]{1,4}:){1,7}: == 2001::, 2001:db8::, 2001:db8:0:1:2:3::
# ([A-F0-9]{1,4}:){1,6}:([A-F0-9]{1,4}) == 2001::1, 2001:db8::1, 2001:db8:0:1:2:3::1
# ([A-F0-9]{1,4}:){7,7}([A-F0-9]{1,4}) == 1:1:1:1:1:1:1:1, ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff

_ipv6_regex = r"(:(:[A-F0-9]{1,4}){1,7}|([A-F0-9]{1,4}:){1,7}:|([A-F0-9]{1,4}:){1,6}:([A-F0-9]{1,4})|([A-F0-9]{1,4}:){7,7}([A-F0-9]{1,4}))"
ipv6_regex = re.compile(_ipv6_regex, re.I)

bbot/test/test_step_1/test_regexes.py is not performing any testing of this pattern, hence this has not been picked up previously.
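A test for the proposed pattern could be sketched roughly as follows. This is only an illustration: the test function name and the two example lists are assumptions, not existing code in test_regexes.py. The inputs are taken from the pattern comments and from the false positives reported in this issue.

```python
import re

# Pattern copied verbatim from the proposal in this issue.
_ipv6_regex = r"(:(:[A-F0-9]{1,4}){1,7}|([A-F0-9]{1,4}:){1,7}:|([A-F0-9]{1,4}:){1,6}:([A-F0-9]{1,4})|([A-F0-9]{1,4}:){7,7}([A-F0-9]{1,4}))"
ipv6_regex = re.compile(_ipv6_regex, re.I)

# Examples listed in the comments above the proposed pattern.
SHOULD_MATCH = [
    "::1",
    "::ffff:1",
    "2001::",
    "2001:db8::",
    "2001:db8:0:1:2:3::",
    "2001::1",
    "2001:db8::1",
    "2001:db8:0:1:2:3::1",
    "1:1:1:1:1:1:1:1",
    "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff",
]

# False positives reported for the current pattern.
SHOULD_NOT_MATCH = ["a:b:c", "0:1:d", "9e:3e:53:29:43:64"]


def test_ipv6_regex():  # hypothetical test name
    for s in SHOULD_MATCH:
        # fullmatch() requires the whole string to be consumed
        assert ipv6_regex.fullmatch(s), s
    for s in SHOULD_NOT_MATCH:
        # search() must not find a match anywhere in the string
        assert ipv6_regex.search(s) is None, s


if __name__ == "__main__":
    test_ipv6_regex()
```

Note that compressed forms with more than one group after the `::`, such as 2001:db8::1:2, are not fully matched by the four alternatives listed above, so a complete test suite would probably want to cover those as well.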

Additionally, no dedicated IPv4 pattern matcher exists, and dns_name_regex is relied upon instead. A dedicated IPv4 regex pattern for use by modules is needed, along with tests for that pattern.

Proposed pattern:

_ipv4_regex = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"
ipv4_regex = re.compile(_ipv4_regex, re.I)
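A matching test for the proposed IPv4 pattern might look like the sketch below. The test name and the sample inputs are assumptions chosen for illustration. It also shows why anchoring matters: an unanchored search() can still find an address-shaped substring inside an invalid token.

```python
import re

# Pattern copied verbatim from the proposal in this issue.
_ipv4_regex = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"
ipv4_regex = re.compile(_ipv4_regex, re.I)


def test_ipv4_regex():  # hypothetical test name
    # Whole-string matches for well-formed dotted quads
    for s in ["0.0.0.0", "127.0.0.1", "192.168.0.1", "255.255.255.255"]:
        assert ipv4_regex.fullmatch(s), s

    # Out-of-range octets, too few or too many octets, and non-numeric labels
    for s in ["256.1.1.1", "1.2.3", "1.2.3.4.5", "a.b.c.d"]:
        assert ipv4_regex.fullmatch(s) is None, s

    # search() is unanchored, so it skips the leading "2" and matches the rest
    assert ipv4_regex.search("256.1.1.1").group() == "56.1.1.1"


if __name__ == "__main__":
    test_ipv4_regex()
```

If the pattern is used for extraction with search() or finditer(), word boundaries or a follow-up validation step would likely be needed to reject cases like the last one.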

Expected behavior

_ipv6_regex/ipv6_regex should reliably detect IPv6 addresses and ONLY IPv6 addresses.

An IPv4 regex pattern matcher is expected to be available.

BBOT Command

Any scan whose targets produce IPv6 or IPv6-like strings during discovery.

OS, BBOT Installation Method + Version

OS: Debian "Bookworm"
Installation method: poetry
Python: platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.4.0
BBOT Version: git/stable

BBOT Config

Not applicable. Simply using ./bbot/test/run_tests.sh.

Logs

Not applicable.

Screenshots

Current IPv6 pattern,

Screenshot 2024-05-23 at 9 59 10 AM

Proposed IPv6 pattern,

Screenshot 2024-05-23 at 9 58 44 AM

Proposed IPv4 pattern,

Screenshot 2024-05-23 at 10 53 59 AM

TheTechromancer commented 1 month ago

Thanks for your work here.

Does this actually result in a bug in bbot? I'm aware these regexes aren't perfect, but they shouldn't be used for validation, only for event type detection. The actual validation happens later via ipaddress and urllib.

These regexes were designed for speed and simplicity. In cases where full RFC-compliant validation is required, I'd much rather offload it to an official library (others have already written better validation):

import ipaddress

try:
    ipaddress.ip_address(data)
    # it's an IP
except ValueError:
    # it's a DNS name
    pass

So we can avoid situations like this: Screenshot_20240522-212054.png
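As a small illustration of the library-based approach, the sketch below wraps the standard library call in a helper and runs it against the strings from the bug report. The helper name classify_host and its return labels are hypothetical and are not part of BBOT.

```python
import ipaddress


def classify_host(data: str) -> str:
    """Return "ipv4", "ipv6", or "not_ip" for the given string (hypothetical helper)."""
    try:
        ip = ipaddress.ip_address(data)
    except ValueError:
        # Not a valid IPv4 or IPv6 address; the caller can treat it as a DNS name candidate
        return "not_ip"
    return f"ipv{ip.version}"


if __name__ == "__main__":
    for s in ["::1", "2001:db8::1:2", "192.168.0.1", "a:b:c", "0:1:d", "9e:3e:53:29:43:64"]:
        print(s, classify_host(s))
```

The strings reported as false positives in this issue ('a:b:c', '0:1:d' and the MAC address) are all rejected by ipaddress.ip_address(), since each has fewer than eight groups and no `::`.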

colin-stubbs commented 1 month ago

No existing module currently uses ipv6_regex directly. It only seems to be used as a building block of the open port and URL regexes, so perhaps it is used indirectly.

I've started using it directly though, as I'm also interested in detecting as much IP addressing related to targets as possible, particularly in situations where IPs are used directly instead of DNS names. That is uncommon, but it does occur, especially within internal networks.

I totally agree you'll want to avoid having to manage/maintain regex patterns, and offloading it to a central library that's going to do a better job would be ideal.

That said... making patterns available for modules to use via bbot/core/helpers/regexes.py seems to be the current approach to providing a simple and reliable interface to do that?

~/bbot$ grep -C 4 _regex\. bbot/core/helpers/dns.py
            results.add((rdtype, self._clean_dns_record(record.target)))
        elif rdtype == "TXT":
            for s in record.strings:
                s = self.parent_helper.smart_decode(s)
                for match in dns_name_regex.finditer(s):
                    start, end = match.span()
                    host = s[start:end]
                    results.add((rdtype, host))
        elif rdtype == "NSEC":
~/bbot$ 
~/bbot$ grep -E '_regex\.(match|find)' bbot/modules/*.py
bbot/modules/azure_tenant.py:        matches = self.helpers.regexes.uuid_regex.findall(authorization_endpoint)
bbot/modules/azure_tenant.py:        found_domains = list(set(self.d_xml_regex.findall(r.text)))
bbot/modules/digitorus.py:            for match in extract_regex.finditer(content):
bbot/modules/git.py:                if getattr(result, "status_code", 0) == 200 and "[core]" in text and not self.fp_regex.match(text):
bbot/modules/httpx.py:            if tempdir.is_dir() and self.httpx_tempdir_regex.match(tempdir.name):
bbot/modules/__init__.py:    if e.is_dir() and dir_regex.match(e.name) and not e.name == "modules":
bbot/modules/massdns.py:        digits = self.digit_regex.findall(d)
bbot/modules/rapiddns.py:        for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/riddler.py:        for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/sslcert.py:            if issuer.emailAddress and self.helpers.regexes.email_regex.match(issuer.emailAddress):
bbot/modules/sslcert.py:            if subject.emailAddress and self.helpers.regexes.email_regex.match(subject.emailAddress):
bbot/modules/viewdns.py:                if self.date_regex.match(table_cells[1].text.strip()):
bbot/modules/virustotal.py:        for match in self.helpers.regexes.dns_name_regex.findall(text):
~/bbot$ 

ipaddress is only used by ipneighbor,

~/bbot$ grep ipaddress bbot/modules/*.py
bbot/modules/ipneighbor.py:import ipaddress
bbot/modules/ipneighbor.py:        network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)
~/bbot$ 

None of them seem to use get_event_type() as the test modules do, though perhaps that's the best validation process after any form of extraction?

~/bbot$ grep get_event_type bbot/modules/*.py
~/bbot$ 
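
The "extract first, validate afterwards" idea raised above could be sketched along these lines. This is not how BBOT does it: candidate_regex and extract_ips are names made up for this example, and the candidate pattern is deliberately loose because the final decision is left to the standard ipaddress module.

```python
import ipaddress
import re

# Deliberately loose candidate pattern (an assumption for this sketch, not a BBOT regex):
# any run of hex digits, colons and dots.
candidate_regex = re.compile(r"[0-9A-Fa-f:.]+")


def extract_ips(text):
    """Return the substrings of text that ipaddress accepts as IPv4 or IPv6 addresses."""
    found = []
    for match in candidate_regex.finditer(text):
        # Drop sentence punctuation such as a trailing full stop
        candidate = match.group().strip(".")
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            # Looked address-like but is not a valid IP (e.g. a MAC address)
            continue
        found.append(candidate)
    return found


if __name__ == "__main__":
    sample = "gateway 10.0.0.1, mac 9e:3e:53:29:43:64, v6 2001:db8::1 and a:b:c"
    print(extract_ips(sample))
```

The trade-off is that the regex only has to be cheap and over-inclusive, while correctness comes from the library, so there is no RFC-compliant pattern to maintain.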
TheTechromancer commented 1 month ago

Ah okay, I'm starting to see your use case. Are you wanting to extract IP addresses from HTTP responses, etc.?

TheTechromancer commented 1 month ago

I should mention we have lots of helpers for converting to IP addresses/networks, parsing, validation, etc. that don't require you to import anything. From inside a module, these are available under self.helpers.

TheTechromancer commented 1 month ago

https://github.com/blacklanternsecurity/bbot/pull/1399 has been merged into dev.