Closed colin-stubbs closed 1 month ago
Thanks for your work here.
Does this actually result in a bug in bbot? I'm aware these regexes aren't perfect, but they shouldn't be used for validation, only for event type detection. The actual validation happens later via ipaddress and urllib.
These regexes were designed for speed and simplicity. In cases where a full RFC-compliant regex is required, I'd much rather offload it to an official library (others have already written better validation):
    import ipaddress

    try:
        ipaddress.ip_address(data)
        # it's an IP
    except ValueError:
        # it's a DNS name
        pass
So we can avoid situations like this:
No existing module currently uses ipv6_regex directly. It only seems to be used as part of the open port regexes and URL regexes, so perhaps it is used indirectly.
I've started using it directly though, as I'm also interested in detecting as much IP addressing related to targets as possible. In particular, I want to catch situations in which IPs are used directly instead of DNS names. That is uncommon, but it does occur, particularly within internal networks.
I totally agree you'll want to avoid having to manage/maintain regex patterns, and offloading it to a central library that's going to do a better job would be ideal.
That said... making patterns available for modules to use via bbot/core/helpers/regexes.py seems to be the current approach to providing a simple and reliable interface to do that?
~/bbot$ grep -C 4 _regex\. bbot/core/helpers/dns.py
results.add((rdtype, self._clean_dns_record(record.target)))
elif rdtype == "TXT":
for s in record.strings:
s = self.parent_helper.smart_decode(s)
for match in dns_name_regex.finditer(s):
start, end = match.span()
host = s[start:end]
results.add((rdtype, host))
elif rdtype == "NSEC":
~/bbot$
~/bbot$ grep -E '_regex\.(match|find)' bbot/modules/*.py
bbot/modules/azure_tenant.py: matches = self.helpers.regexes.uuid_regex.findall(authorization_endpoint)
bbot/modules/azure_tenant.py: found_domains = list(set(self.d_xml_regex.findall(r.text)))
bbot/modules/digitorus.py: for match in extract_regex.finditer(content):
bbot/modules/git.py: if getattr(result, "status_code", 0) == 200 and "[core]" in text and not self.fp_regex.match(text):
bbot/modules/httpx.py: if tempdir.is_dir() and self.httpx_tempdir_regex.match(tempdir.name):
bbot/modules/__init__.py: if e.is_dir() and dir_regex.match(e.name) and not e.name == "modules":
bbot/modules/massdns.py: digits = self.digit_regex.findall(d)
bbot/modules/rapiddns.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/riddler.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
bbot/modules/sslcert.py: if issuer.emailAddress and self.helpers.regexes.email_regex.match(issuer.emailAddress):
bbot/modules/sslcert.py: if subject.emailAddress and self.helpers.regexes.email_regex.match(subject.emailAddress):
bbot/modules/viewdns.py: if self.date_regex.match(table_cells[1].text.strip()):
bbot/modules/virustotal.py: for match in self.helpers.regexes.dns_name_regex.findall(text):
~/bbot$
ipaddress is only used by ipneighbor:
~/bbot$ grep ipaddress bbot/modules/*.py
bbot/modules/ipneighbor.py:import ipaddress
bbot/modules/ipneighbor.py: network = ipaddress.ip_network(f"{main_ip}/{netmask}", strict=False)
~/bbot$
None of them seem to use get_event_type() as the test modules do, though perhaps that's the best validation process after any form of extraction?
~/bbot$ grep get_event_type bbot/modules/*.py
~/bbot$
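On the question of validating after extraction, here is a minimal sketch of the "loose regex to find candidates, library to validate" flow, using only the standard library. The names candidate_regex and extract_ips are hypothetical and are not part of bbot; the candidate pattern is deliberately permissive and is not bbot's ipv6_regex.

```python
import ipaddress
import re

# Hypothetical, deliberately loose pattern: any run of hex digits, colons and dots.
# It only nominates candidates; it is not meant to validate anything.
candidate_regex = re.compile(r"[0-9A-Fa-f:.]{3,45}")


def extract_ips(text):
    """Return the set of valid IPv4/IPv6 addresses (as strings) found in text."""
    found = set()
    for match in candidate_regex.finditer(text):
        # Drop trailing sentence punctuation such as "10.0.0.1."
        candidate = match.group().rstrip(".")
        try:
            found.add(str(ipaddress.ip_address(candidate)))
        except ValueError:
            # Not an IP (e.g. a MAC address or a short hex string); skip it
            continue
    return found


sample = "gateway fe80::1 mac 9e:3e:53:29:43:64 host 10.0.0.1 short a:b:c"
print(extract_ips(sample))
```

With this split, the regex can stay fast and simple because false positives such as MAC addresses are discarded by ipaddress rather than by the pattern itself.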
Ah okay, I'm starting to see your use case. Are you wanting to extract IP addresses from HTTP responses, etc.?
I should mention we have lots of helpers for converting to IP addresses/networks, parsing, validation, etc. that don't require you to import anything. From inside a module, these are available under self.helpers.
https://github.com/blacklanternsecurity/bbot/pull/1399 has been merged into dev.
Describe the bug
_ipv6_regex/ipv6_regex from bbot/core/helpers/regexes.py was used to try to match IPv6 addresses, however it incorrectly matched strings such as 'a:b:c', '0:1:d' and MAC addresses such as '9e:3e:53:29:43:64'.
A more complex IPv6 regex matcher is required, e.g.
bbot/test/test_step_1/test_regexes.py is not performing any testing of this pattern, hence this has not been picked up previously.
Additionally no IPv4 dedicated pattern matcher exists, and dns_name_regex is relied upon. Addition of a dedicated IPv4 regex pattern matcher for use by modules is needed, along with testing of that pattern.
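For reference, the strings above can be checked against Python's standard ipaddress module, which could also serve as the expected-result oracle in a regex test. This is only a sketch; is_ipv6 is a hypothetical helper name and not something that exists in bbot.

```python
import ipaddress


def is_ipv6(value):
    """Return True only if value parses as a complete IPv6 address."""
    try:
        ipaddress.IPv6Address(value)
        return True
    except ValueError:
        # AddressValueError is a subclass of ValueError
        return False


# The false positives reported in this issue, plus two genuine IPv6 addresses
for candidate in ("a:b:c", "0:1:d", "9e:3e:53:29:43:64", "2001:db8::1", "::1"):
    print(candidate, is_ipv6(candidate))
```

The first three strings are rejected because, without a "::" abbreviation, an IPv6 address must have exactly eight groups.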
Proposed pattern,
Expected behavior
_ipv6_regex/ipv6_regex should reliably detect IPv6 addresses and ONLY IPv6 addresses.
An IPv4 regex pattern matcher is expected to be available.
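To illustrate what a dedicated IPv4 matcher could look like, here is one common way to write a strict dotted-quad pattern with per-octet range checks. This is an assumption for illustration only: the names octet and ipv4_sketch_regex are hypothetical, and this is not necessarily the pattern proposed in this issue.

```python
import re

# One octet: 0-255, no leading zeros
octet = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Four octets; the lookarounds stop the match from starting or ending
# in the middle of a longer run of digits and dots
ipv4_sketch_regex = re.compile(
    rf"(?<![0-9.])(?:{octet}\.){{3}}{octet}(?!\.?[0-9])"
)

text = "hosts 10.0.0.1, 256.1.1.1 and 192.168.1.254."
print(ipv4_sketch_regex.findall(text))
```

Note that 256.1.1.1 is skipped entirely rather than partially matched as 56.1.1.1, which is the kind of behaviour a test for such a pattern would want to pin down.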
BBOT Command
Anything that involves targets that will involve IPv6 or IPv6-like strings as part of discovery.
OS, BBOT Installation Method + Version
OS: Debian "Bookworm"
Installation method: poetry
Python: platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.4.0
BBOT Version: git/stable
BBOT Config
Not applicable. Simply using ./bbot/test/run_tests.sh.
Logs
Not applicable.
Screenshots
Current IPv6 pattern,
Proposed IPv6 pattern,
Proposed IPv4 pattern,