elceef / dnstwist

Domain name permutation engine for detecting homograph phishing attacks, typo squatting, and brand impersonation
https://dnstwist.it
Apache License 2.0

Dict entries with numbers not tried #99

Closed. wiperpaul closed this issue 4 years ago.

wiperpaul commented 4 years ago

I am aware that no TLDs currently contain digits. However, when I use a custom dictionary file whose entries contain digits, those entries are never tried when the file is loaded to generate additional permutations.

For example, a dictionary file containing:

abc-123
abc123

elceef commented 4 years ago

All permutations are filtered internally using the following regular expression, which discards TLDs containing digits:

(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}\.?$)
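A minimal Python sketch of that check, assuming the pattern is compiled with the standard `re` module (the helper name `is_valid_domain` is hypothetical, not part of dnstwist). Labels may contain digits and inner hyphens; only the final label (the TLD) must consist of 2 to 63 ASCII letters:

```python
import re

# Domain filter quoted in the thread: total length 4-253, each label is
# 1-63 chars of [A-Za-z0-9-] that does not start or end with '-', and the
# final label (TLD) is 2-63 ASCII letters, with an optional trailing dot.
DOMAIN_RE = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}\.?$)"
)


def is_valid_domain(domain: str) -> bool:
    """Hypothetical helper: True if the domain passes the regex filter."""
    return DOMAIN_RE.match(domain) is not None


for name in ["abc123.com", "abc-123.com", "123test.nl", "example.c0m", "-abc.com"]:
    print(name, is_valid_domain(name))
```

Run as-is, the first three names pass and the last two are rejected: `example.c0m` because its TLD contains a digit, `-abc.com` because a label starts with a hyphen.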

wiperpaul commented 4 years ago

What about hostnames and subdomains permuted using strings from the dictionary?

>>> re.search(r'(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}\.?$)', '123test.nl')
<regex.Match object; span=(0, 10), match='123test.nl'>

The entries with digits are removed by an isalpha() check after the dictionary file is split:

https://github.com/elceef/dnstwist/blob/f194232563cbd34f8e0d345c29ff2a6f2a992ef7/dnstwist.py#L771
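The effect of an isalpha() filter on the example entries can be reproduced in isolation. This is a minimal sketch of the filtering step only; the list-comprehension form is an assumption, not the exact code at the linked line, and the extra entry `shop` is hypothetical, added for contrast. `str.isalpha()` returns False for any string containing a digit or a hyphen, so both example entries are discarded before any permutation is generated:

```python
# Example dictionary contents from the issue, one entry per line,
# plus a hypothetical purely alphabetic entry ("shop") for contrast.
dictionary_text = "abc-123\nabc123\nshop\n"

# Split on whitespace, as a one-word-per-line file would be read.
entries = dictionary_text.split()

# Keeping only str.isalpha() words silently drops any entry
# that contains a digit or a hyphen.
alpha_only = [word for word in entries if word.isalpha()]

print(entries)     # ['abc-123', 'abc123', 'shop']
print(alpha_only)  # ['shop']
```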

Removing that line fixed the issue for me. Since the if statement that handles TLDs is separate from the dictionary one, TLDs remain restricted to purely alphabetic values.
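The separation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not dnstwist's actual code: `load_dictionary`, `load_tlds`, and `dictionary_permutations` are hypothetical names, and the fuzzer only appends each word with and without a hyphen. Dictionary entries keep their digits, the TLD loader keeps its isalpha() restriction, and the domain regex discards anything invalid:

```python
import re

# Same domain filter as quoted in the thread.
DOMAIN_RE = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}\.?$)"
)


def load_dictionary(text: str) -> list[str]:
    """Hypothetical loader: keep every entry, digits and hyphens included."""
    return [w.lower() for w in text.split()]


def load_tlds(text: str) -> list[str]:
    """Hypothetical loader: TLD entries stay restricted to letters."""
    return [t.lower() for t in text.split() if t.isalpha()]


def dictionary_permutations(name: str, tld: str, words: list[str]) -> list[str]:
    """Simplified, hypothetical fuzzer: append each word with and without
    a hyphen, then let the domain regex discard invalid candidates."""
    candidates = []
    for word in words:
        for label in (name + word, name + "-" + word):
            domain = label + "." + tld
            if DOMAIN_RE.match(domain):
                candidates.append(domain)
    return candidates


# "abc-" is a hypothetical entry whose permutations end a label with '-'.
words = load_dictionary("abc-123\nabc123\nabc-\n")
tlds = load_tlds("com\nc0m\nnet\n")
print(tlds)  # ['com', 'net']
print(dictionary_permutations("example", "com", words))
```

Here the digit-bearing entries produce valid candidates such as `exampleabc123.com`, while `abc-` is dropped by the regex and `c0m` never reaches the TLD list.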

elceef commented 4 years ago

You're correct. I just fixed this in commit 77153b3e2bfb1c015043ef344defa47f1ac2109e.