elceef / dnstwist

Domain name permutation engine for detecting homograph phishing attacks, typo squatting, and brand impersonation
https://dnstwist.it
Apache License 2.0
4.9k stars 773 forks

Cyrillic Fuzzer Issues: No Permutations for 'z'-Containing Domains and Limited Variations for Other Domains #213

Closed LionGose closed 9 months ago

LionGose commented 9 months ago

The Cyrillic fuzzer is not working with domains containing the letter 'z'. For example, when running the command

python dnstwist.py --fuzzers "cyrillic" testz.com

I encounter the following issue:

usage: dnstwist.py [OPTION]... DOMAIN
dnstwist.py: error: selected fuzzing algorithms do not generate any permutations for the provided input domain

However, when I use the domain 'test.com', I only get one option instead of several. Here's the output:

permutations: 100.00% of 1 | found: 0 | eta: 0m 00s | speed: 6 qps
cyrillic  теѕт.com  -

I believe modifying the _cyrillic function in this way could be a good solution:

    def _cyrillic(self, current_domain='', index=0):
        # Base case: every character of the domain has been consumed.
        if index == len(self.domain):
            return [current_domain]

        char = self.domain[index]

        # Candidates for this position: any Cyrillic look-alikes, then the
        # original character. Characters without a mapping (such as 'z') are
        # kept as-is and contribute only one branch, which avoids duplicates.
        candidates = list(self.latin_to_cyrillic.get(char, [])) + [char]

        variants = []
        for candidate in candidates:
            variants.extend(self._cyrillic(current_domain + candidate, index + 1))

        # Note: the result includes the unmodified domain, which the caller
        # should discard.
        return variants
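
To get a feel for what this proposal would generate, here is a minimal standalone sketch of the same per-character idea written with `itertools.product`. The `LATIN_TO_CYRILLIC` table and the `cyrillic_variants` helper are hypothetical names and a small illustrative subset, not dnstwist's actual mapping or API.

```python
from itertools import product

# Hypothetical subset of a Latin-to-Cyrillic look-alike table (assumption,
# not the table shipped with dnstwist).
LATIN_TO_CYRILLIC = {
    'e': ['\u0435'],  # CYRILLIC SMALL LETTER IE
    's': ['\u0455'],  # CYRILLIC SMALL LETTER DZE
    't': ['\u0442'],  # CYRILLIC SMALL LETTER TE
}


def cyrillic_variants(label):
    """Return every per-character mix of Latin and Cyrillic look-alikes."""
    # For each position: the original character first, then its look-alikes.
    choices = [[ch] + LATIN_TO_CYRILLIC.get(ch, []) for ch in label]
    variants = [''.join(combo) for combo in product(*choices)]
    # product() yields the all-original combination first, so drop it.
    return variants[1:]


if __name__ == '__main__':
    for label in ('test', 'testz'):
        print(label, len(cyrillic_variants(label)))
```

With this table, both `test` and `testz` produce 15 variants. For `test`, only one of them is entirely Cyrillic; for `testz`, every variant mixes Latin and Cyrillic characters because 'z' has no mapping here.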

Additionally, it might be worthwhile to consider a combination of all available variant generation methods (such as cyrillic, homoglyph, hyphenation, etc.) to ensure a more comprehensive and effective fuzzing process.

elceef commented 9 months ago

Mixing characters from two or more Unicode scripts (for example Latin and Cyrillic) results in a domain permutation that in practice can't be registered. I mention this in the documentation. It often happens that such a domain is accepted by the registrar and only rejected a moment later.
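
As a rough illustration of the mixed-script point, the sketch below flags labels whose letters come from more than one script. It approximates the script by the first word of each character's Unicode name, which is a heuristic and an assumption of this example, not the rule that registries or dnstwist apply. The helper names `scripts_in` and `is_mixed_script` are hypothetical.

```python
import unicodedata


def scripts_in(label):
    """Return the set of scripts found in label, approximated by name prefix."""
    scripts = set()
    for ch in label:
        # Skip digits, hyphens and dots; only letters carry a script here.
        if ch.isalpha():
            # Unicode names begin with the script, e.g.
            # "CYRILLIC SMALL LETTER TE" or "LATIN SMALL LETTER T".
            scripts.add(unicodedata.name(ch, 'UNKNOWN').split(' ')[0])
    return scripts


def is_mixed_script(label):
    """True if the label's letters belong to more than one script."""
    return len(scripts_in(label)) > 1


if __name__ == '__main__':
    samples = ['test', '\u0442\u0435\u0455\u0442', '\u0442\u0435\u0455\u0442z']
    for label in samples:
        print(label, sorted(scripts_in(label)), is_mixed_script(label))
```

Under this heuristic, the fully Cyrillic label from the output above is single-script, while any variant that keeps a Latin 'z' next to Cyrillic letters is flagged as mixed.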