Closed Odyseus closed 6 years ago
Well thanks to you @Odyseus for reporting !! 👍🌟
@mitchellkrogza I think that we both (Py||Funceble && Ultimate) have to review the way we handle those typos or invalid domains !
Thanks again @Odyseus 👍🌟
P.S: I added this to my workflow for future commits and updates of Py||Funceble.
Hello, @funilrys.
I'm just glad that I could be of some help. :+1:
I found these invalid host names while developing my own Hosts Manager application (a CLI app written in Python). I investigated a little more about valid host names and I found some more invalid host names.
The following table lists the host names as found in the hosts file in this repository, the possible error(s), and the possibly correct host name.
Host name | Possible error | Possibly correct name |
---|---|---|
1-1-1-.ib.adnxs.com | Labels can't start/end with a hyphen. ("1-1-1-") | - |
247?realmedia.com | Invalid character. ("?") | - |
adblade.com,popup | Invalid character. (",") | adblade.com |
adtrackone.eu$document | Invalid character. ("$") | adtrackone.eu |
allosexe-.myblox.fr | Labels can't start/end with a hyphen. ("allosexe-") | - |
alyssa-.ifrance.com | Labels can't start/end with a hyphen. ("alyssa-") | - |
durl=px.moatads.com | Invalid character. ("=") | px.moatads.com |
film-porno-.ze.cx | Labels can't start/end with a hyphen. ("film-porno-") | - |
free‐celebrity‐tube.com | Invalid character. ("‐") (1) | - |
goreanharbourreysa-.4ya.nl | Labels can't start/end with a hyphen. ("goreanharbourreysa-") | - |
i-52b.-xxx.ut.bench.utorrent.com | Labels can't start/end with a hyphen. ("-xxx") | - |
javakiba.org* | Invalid character. ("*") | - |
mailto:info@mypornbible.com | Invalid characters. (":", "@") | - |
mangas-porno-.has.it | Labels can't start/end with a hyphen. ("mangas-porno-") | - |
moherland.pl, | Invalid character. (",") | moherland.pl |
paris-.blogspot.ca | Labels can't start/end with a hyphen. ("paris-") | - |
paris-.blogspot.ch | Same as above. | - |
paris-.blogspot.co.id | Same as above. | - |
paris-.blogspot.com | Same as above. | - |
paris-.blogspot.com.ar | Same as above. | - |
paris-.blogspot.com.br | Same as above. | - |
paris-.blogspot.com.es | Same as above. | - |
paris-.blogspot.com.tr | Same as above. | - |
paris-.blogspot.co.uk | Same as above. | - |
paris-.blogspot.de | Same as above. | - |
paris-.blogspot.gr | Same as above. | - |
paris-.blogspot.it | Same as above. | - |
paris-.blogspot.mx | Same as above. | - |
paris-.blogspot.no | Same as above. | - |
paris-.blogspot.pt | Same as above. | - |
paris-.blogspot.sk | Same as above. | - |
preview-.stripchat.com | Labels can't start/end with a hyphen. ("preview-") | - |
public‐sluts.net | Invalid character. ("‐") (1) | - |
px.moatads.com,z.moatads.com | Invalid character. (",") | px.moatads.com |
sexchat-.startspin.nl | Labels can't start/end with a hyphen. ("sexchat-") | - |
telemetry.appex.bing.net:443 | Invalid character. (":") | telemetry.appex.bing.net |
websitealive[0-9].com | Invalid characters. ("[", "]") | - |
www.free‐celebrity‐tube.com | Invalid character. ("‐") (1) | - |
www.just-anchor.com? | Invalid character. ("?") | - |
www.public‐sluts.net | Invalid character. ("‐") (1) | - |
xmr.-eu1.nanopool.org | Labels can't start/end with a hyphen. ("-eu1") | - |
(1): This is a character that looks like the minus sign. The following is a table with info from the GNOME Character Map application about the possibly invalid character.
Representations | "‐" (U+2010 HYPHEN) (The invalid character) | "-" (U+002D HYPHEN-MINUS) (The minus sign) |
---|---|---|
UTF-8 | 0xE2 0x80 0x90 | 0x2D |
UTF-16 | 0x2010 | 0x002D |
C octal escaped UTF-8 | \342\200\220 | \055 |
XML decimal entity | `&#8208;` | `&#45;` |
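The difference between the two characters in the table above is hard to see by eye, so here is a minimal sketch of how it could be checked from Python with the standard `unicodedata` module. The helper names `describe_char` and `non_ascii_chars` are made up for this illustration. Note that a non-ASCII character is not automatically invalid (IDN host names exist); the sketch only makes such characters visible.

```python
import unicodedata


def describe_char(ch):
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": " ".join(f"0x{b:02X}" for b in ch.encode("utf-8")),
    }


def non_ascii_chars(host):
    """List (index, description) pairs for every non-ASCII character in host."""
    return [(i, describe_char(ch)) for i, ch in enumerate(host) if ord(ch) > 127]


# Host name from the table, written with explicit U+2010 escapes.
host = "free\u2010celebrity\u2010tube.com"
for index, info in non_ascii_chars(host):
    print(index, info)
```

Running the loop on a host name that only contains the regular minus sign prints nothing, which makes it easy to spot the look-alike entries in a large hosts file.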
In case it is useful, here is the Python 3 function that I use to check for valid host names.

    #!/usr/bin/python3
    import re

    # Compiled once at import time; matches a single dot-separated label.
    HOSTNAME_REGEX = re.compile(r"(?!-)[\w-]{1,63}(?<!-)$")


    def is_valid_host(host):
        """IDN compatible domain validation.
        """
        # Ignore trailing dots (fully qualified form).
        host = host.rstrip(".")
        # Check the overall length, then every label against the regex.
        return all([len(host) > 1, len(host) < 253] + [HOSTNAME_REGEX.match(x) for x in host.split(".")])

This function is based on several functions that I found in this StackOverflow question. I just mainly translated it into a one-liner.
The function basically does the following:
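- Each label must not start with a hyphen (`(?!-)`).
- Each label may only contain word characters or hyphens (`[\w-]`).
- Each label must be 1 to 63 characters long (`{1,63}`).
- Each label must not end with a hyphen (`(?<!-)`).

Side note 1: The explanation above is based on my own understanding. Since I'm not a professional developer of any kind, I could be horribly wrong. LOL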
Side note 2: The regular expression is outside the function for performance reasons.
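To connect the function with the "Possible error" column of the table, here is a rough sketch of a variant that reports why a host name is rejected instead of returning a plain boolean. It reuses the same regular expression and length limits as the function above. The name `explain_host`, the helper pattern `ALLOWED_CHAR`, and the message wording are made up for this example and are not part of any existing tool.

```python
import re

# Same per-label pattern as in the function above.
LABEL_REGEX = re.compile(r"(?!-)[\w-]{1,63}(?<!-)$")
# Single characters accepted inside a label by that pattern.
ALLOWED_CHAR = re.compile(r"[\w-]")


def explain_host(host):
    """Return a list of problems found in host; an empty list means valid."""
    problems = []
    host = host.rstrip(".")
    if not 1 < len(host) < 253:
        problems.append("Invalid total length.")
    for label in host.split("."):
        if LABEL_REGEX.match(label):
            continue
        if not label:
            problems.append("Empty label.")
        elif len(label) > 63:
            problems.append(f"Label too long: {label!r}")
        elif label.startswith("-") or label.endswith("-"):
            problems.append(f"Labels can't start/end with a hyphen: {label!r}")
        else:
            # Collect the distinct characters the pattern does not accept.
            bad = sorted({ch for ch in label if not ALLOWED_CHAR.match(ch)})
            problems.append(f"Invalid character(s): {bad}")
    return problems


# A few entries taken from the table above.
for name in ("px.moatads.com", "1-1-1-.ib.adnxs.com", "durl=px.moatads.com"):
    print(name, explain_host(name))
```

Only the first problem per label is reported, so a label that both ends with a hyphen and contains an invalid character shows up as a hyphen problem.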
Thanks @Odyseus for reporting this. I will make some tweaks to the cleaning functions to deal with these errors. Thanks for your very detailed information; it helps a lot. @funilrys yes indeed, this needs a good look; a lot of the input sources seem to make typos on a frequent basis.
This would be a great addition to any tool, I think.
@xxcriticxx and @Odyseus I will get this in the works early next week
@mitchellkrogza you slacking lately :(
@xxcriticxx been a rough start to the year and had a loss in the family so I've been out of town for a while but back in action. Have no fear all issues will be addressed.
@mitchellkrogza We are sorry to hear that :(
Deepest condolences.
@mitchellkrogza sorry my friend
Hello @Odyseus @mitchellkrogza ,
Thanks to this issue I discovered an issue in PyFunceble. Please note that I reinforced the way we check for an inactive domain with the previously referenced patch.
That way, @mitchellkrogza, we can work efficiently when we do further structural development.
@Odyseus I did not use your snippets, but since you indirectly contributed to that patch, I would like, if you accept, to add you to the list of PyFunceble's contributors.
Thanks again.
Cheers, Nissar
Hello, everybody.
@funilrys: Thanks for the thought, but don't feel obligated to do so. I'm just happy to contribute however I can to any open source initiative. :+1:
I looked at the code in your patch and, in case it is useful to you, I also use a function to validate IP addresses. It uses the ipaddress module from Python's standard library. It checks for both IP types (IPv4 and IPv6), but it would be easy to check for IPv4 only.

    from ipaddress import ip_address


    def is_valid_ip(address):
        """Validate IP address (IPv4 or IPv6).

        Parameters
        ----------
        address : str
            The IP address to validate.

        Returns
        -------
        bool
            If it is a valid IP address or not.
        """
        try:
            ip_address(address)
        except ValueError:
            return False

        return True

I'm pretty sure that I got this function from StackOverflow, but the exact source got lost in my browsing history.
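For the IPv4-only case mentioned above, a minimal sketch could look like the following. It swaps `ip_address` for the `IPv4Address` class from the same standard library module; the function name `is_valid_ipv4` is only chosen for this example.

```python
from ipaddress import IPv4Address


def is_valid_ipv4(address):
    """Return True only if address is a valid IPv4 address string."""
    try:
        # IPv4Address raises AddressValueError (a ValueError subclass)
        # for anything that is not a well-formed IPv4 address.
        IPv4Address(address)
    except ValueError:
        return False

    return True


print(is_valid_ipv4("127.0.0.1"))
print(is_valid_ipv4("::1"))
```

IPv6 addresses such as `::1` are rejected by this variant, whereas the function above accepts them.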
Hello, everybody.
I have found what seem to be invalid host names.
Note that I said "seems to be" because I'm not entirely sure if all of the listed host names are invalid. It's for the experts to decide.
Thanks for your work, @mitchellkrogza and contributors. :+1: