elceef / dnstwist

Domain name permutation engine for detecting homograph phishing attacks, typo squatting, and brand impersonation
https://dnstwist.it
Apache License 2.0

Mis-printing of accented characters #158

Closed (ezaspy closed this issue 1 year ago)

ezaspy commented 1 year ago

I ran the following command, which returns results, but wherever a domain contains accented characters, they are replaced with a strange character sequence and I do not know why (example below). I am running the command in VS Code on macOS, so I thought the editor or the OS might be responsible. I have changed my locale settings, but it yields the same results.

dnstwist.py paypal.com -f csv -r --useragent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"
...
homoglyph,xn--pypl-2nad.com,,,,ns1.markmonitor.com
...

xn--pypl-2nad.com should be páypàl.com. I had a hunch it was not displaying correctly, which I confirmed by querying VirusTotal: https://www.virustotal.com/gui/domain/xn--pypl-2nad.com

elceef commented 1 year ago

This is Punycode, a method of encoding strings that contain Unicode characters into the ASCII form favored by DNS. By design, the CLI output displays domains in Unicode, while other output formats such as CSV and JSON encode domains with Punycode.
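
To illustrate the encoding direction, here is a minimal sketch using Python's built-in `idna` codec. It is only meant to show how the Unicode form from this issue maps to the ASCII form seen in the CSV output. It is not how dnstwist itself produces its output, and the variable names are purely illustrative.

```python
# Unicode form of the domain from this issue, written with escapes so the
# example does not depend on the file's encoding: "páypàl.com".
unicode_domain = "p\u00e1yp\u00e0l.com"

# The built-in "idna" codec converts each label to its Punycode (xn--) form.
ascii_domain = unicode_domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--pypl-2nad.com

# Decoding with the same codec recovers the Unicode form.
round_trip = ascii_domain.encode("ascii").decode("idna")
print(round_trip == unicode_domain)  # True
```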

ezaspy commented 1 year ago

Great! I managed to find a solution for it 👍 'string'.encode().decode('idna')
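
For anyone post-processing the CSV output, a rough sketch of applying that decoding to each row might look like the following. The helper names `decode_domain` and `decode_csv_domains` are hypothetical, and the assumption that the domain sits in the second column is based only on the sample line above, so adjust the index if your output differs.

```python
import csv
import io


def decode_domain(domain):
    """Return the Unicode form of a Punycode domain, or the input unchanged if decoding fails."""
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        # Not valid ASCII/IDNA input; leave the value untouched.
        return domain


def decode_csv_domains(text, column=1):
    """Decode the domain column of CSV text (column index 1 is an assumption)."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) > column:
            row[column] = decode_domain(row[column])
        rows.append(row)
    return rows


# Sample row copied from the CSV output quoted in this issue.
sample = "homoglyph,xn--pypl-2nad.com,,,,ns1.markmonitor.com\n"
rows = decode_csv_domains(sample)
print(rows[0][1])  # páypàl.com
```

Plain ASCII domains such as paypal.com pass through unchanged, so the helper can be applied to every row, including a header row if one is present.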