elceef / dnstwist

Domain name permutation engine for detecting homograph phishing attacks, typo squatting, and brand impersonation
https://dnstwist.it
Apache License 2.0

Skip domains when running into errors #190

Closed: wurzelmuschel closed this issue 1 year ago

wurzelmuschel commented 1 year ago

Hi,

I am using dnstwist as a Python library in my application. For some of the domains that dnstwist generates, it runs into errors (e.g. timeouts or socket errors) when it tries to collect additional information about a discovered domain. When this happens, dnstwist (or more specifically a library such as urllib) raises an exception that dnstwist does not handle internally. I can catch and handle it myself, but the current run is aborted and the results collected up to that point are lost. For certain domains a run may take more than 40 minutes, so just restarting it is not the best option (especially when it dies again).

Would it be possible to handle these events internally, skip the "faulty" domain that caused the error, and continue with the next one?

elceef commented 1 year ago

Could you please share an example traceback? Which version are you using?

wurzelmuschel commented 1 year ago

Below you will find a traceback of a recent crash. It comes from a system that uses version 20230509 of dnstwist.

I call dnstwist.run() as follows, which also includes the domain name that caused the error below:

fakes = dnstwist.run(all=True, banners=True, format='null', mxcheck=True, domain='icig-bs.de', registered=True, phash=True, lsh='tlsh', whois=True)

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/local/lib/python3.11/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.11/http/client.py", line 942, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 827, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 8] Name does not resolve

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 250, in <module>
    find_fake_domains(db)
  File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 83, in find_fake_domains
    fakes = dnstwist.run(all=True,
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 945, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 1200, in run
    r = UrlOpener(request_url,
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 204, in __init__
    with urllib.request.urlopen(request, timeout=timeout, context=ctx) as r:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] Name does not resolve>

Process finished with exit code 1

elceef commented 1 year ago

This is intentional. You have chosen the options phash=True and lsh='tlsh', which necessitate querying an HTTP server located behind the initial domain. If, for any reason, this process fails (such as in this case where the domain name cannot be resolved), an exception will be raised. This behavior mirrors that of the command line. Likewise, if you provide an invalid domain name, dnstwist.run() will also raise an exception.
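
A minimal sketch of how a caller might cope with this, assuming (as stated above) that phash and lsh are the only options that require fetching the page behind the initial domain. The helper name run_with_fallback is hypothetical and not part of dnstwist; the run function is passed in as an argument so the logic can be tried without network access.

```python
import urllib.error


def run_with_fallback(run, **options):
    """Call run(**options); on URLError, retry without phash/lsh.

    `run` is expected to be dnstwist.run (hypothetical usage, see below).
    Returns a (results, error) tuple; error is None if the first call worked.
    """
    try:
        return run(**options), None
    except urllib.error.URLError as exc:
        # The page behind the initial domain could not be fetched
        # (name resolution failure, timeout, ...). Assumption: dropping
        # the options that need that page avoids the request entirely.
        reduced = {k: v for k, v in options.items() if k not in ("phash", "lsh")}
        return run(**reduced), exc


# Hypothetical usage with the call from this thread:
#   import dnstwist
#   fakes, err = run_with_fallback(dnstwist.run, domain='icig-bs.de',
#                                  registered=True, phash=True, lsh='tlsh',
#                                  format='null')
#   if err is not None:
#       print('similarity checks skipped:', err)
```

The second call can still raise for other reasons (for example an invalid domain name), so the caller would still need its own exception handling around it.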

wurzelmuschel commented 1 year ago

I don't think it has to do with the domain that is passed as the argument to the "run" function. It seems to happen when one of the domains dnstwist generates is being checked. The problem is hard to reproduce. If I run the same script several times (even with a domain that does not have a website), it only sometimes aborts with the traceback I sent earlier, probably when it tries to check a generated domain that does not resolve (for whatever reason). If it had to do with the "original" domain, the error should happen every time, right?

Coming back to my original question (whether dnstwist can handle errors internally by skipping a problematic domain), I just came across another exception that is not handled internally. Again, it would be great if dnstwist handled this by simply ignoring/skipping the problematic domain:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/local/lib/python3.11/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.11/http/client.py", line 1448, in connect
    super().connect()
  File "/usr/local/lib/python3.11/http/client.py", line 942, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 851, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.11/socket.py", line 836, in create_connection
    sock.connect(sa)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 254, in <module>
    find_fake_domains(db)
  File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 82, in find_fake_domains
    fakes = dnstwist.run(all=True,
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 945, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 1200, in run
    r = UrlOpener(request_url,
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 204, in __init__
    with urllib.request.urlopen(request, timeout=timeout, context=ctx) as r:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error timed out>

Process finished with exit code 1

elceef commented 1 year ago

It's still the same cause: dnstwist can't query the HTTP server behind the initial domain, this time due to a timeout.

File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 1200, in run
    r = UrlOpener(request_url,

I could consider raising custom exceptions, but you would still need to handle them.
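
Both failures in this thread (name resolution and connect timeout) occur while opening a connection to the initial domain, so one option is to probe that domain before starting a long run. The following is only a sketch using the standard library; the function name probe, the default port and the timeout value are assumptions, and a successful probe does not guarantee that the later HTTP request will succeed.

```python
import socket


def probe(host, port=80, timeout=5.0):
    """Return None if a TCP connection succeeds, else a short reason string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except socket.gaierror as exc:
        # Same error class as in the first traceback (name does not resolve).
        return f"name resolution failed: {exc}"
    except TimeoutError:
        # Same error class as in the second traceback (connect timed out).
        return "connection timed out"
    except OSError as exc:
        # Any other socket-level failure, e.g. connection refused.
        return f"connection failed: {exc}"


# Hypothetical usage before a long run:
#   reason = probe('icig-bs.de', port=443)
#   if reason is not None:
#       print('skipping phash/lsh:', reason)
```

Port 443 would be the one to probe when the request goes over HTTPS, as in the second traceback (https_open).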