Ultimate-Hosts-Blacklist / Ultimate.Hosts.Blacklist

The Ultimate Unified Hosts file for protecting your network, computer, smartphones and Wi-Fi devices against millions of bad web sites. Protect your children and family from gaining access to bad web sites and protect your devices and pc from being infected with Malware or Ransomware.
MIT License

Invalid host names #129

Closed Odyseus closed 6 years ago

Odyseus commented 6 years ago

Hello, everybody.

I have found what seems to be invalid host names.

247?realmedia.com
adblade.com,popup
adtrackone.eu$document
durl=px.moatads.com
free‐celebrity‐tube.com
javakiba.org*
mailto:info@mypornbible.com
moherland.pl,
public‐sluts.net
px.moatads.compx.moatads.com
px.moatads.com,z.moatads.com
telemetry.appex.bing.net:443
websitealive[0-9].com
www.free‐celebrity‐tube.com
www.just-anchor.com?
www.public‐sluts.net

Note that I said "seems to be" because I'm not entirely sure if all of the listed host names are invalid. It's for the experts to decide.

Thanks for your work, @mitchellkrogza and contributors. :+1:

funilrys commented 6 years ago

Well thanks to you @Odyseus for reporting !! 👍🌟

@mitchellkrogza I think that we both (Py||Funceble && Ultimate) have to review the way we handle those typos or invalid domains !

Thanks again @Odyseus 👍🌟

P.S: I added this to my workflow for future commit or update of Py||Funceble.

Odyseus commented 6 years ago

Hello, @funilrys.

I'm just glad that I could be of some help. :+1:

I found these invalid host names while developing my own Hosts Manager application (a CLI app written in Python). I investigated a little more about valid host names and I found some more invalid host names.

The following table lists the host names as found in the hosts file in this repository, the possible error(s), and the possibly correct host name.

| Host name | Possible error | Possibly correct name |
| --- | --- | --- |
| 1-1-1-.ib.adnxs.com | Labels can't start/end with a hyphen. ("1-1-1-") | - |
| 247?realmedia.com | Invalid character. ("?") | - |
| adblade.com,popup | Invalid character. (",") | adblade.com |
| adtrackone.eu$document | Invalid character. ("$") | adtrackone.eu |
| allosexe-.myblox.fr | Labels can't start/end with a hyphen. ("allosexe-") | - |
| alyssa-.ifrance.com | Labels can't start/end with a hyphen. ("alyssa-") | - |
| durl=px.moatads.com | Invalid character. ("=") | px.moatads.com |
| film-porno-.ze.cx | Labels can't start/end with a hyphen. ("film-porno-") | - |
| free‐celebrity‐tube.com | Invalid character. ("‐") (1) | - |
| goreanharbourreysa-.4ya.nl | Labels can't start/end with a hyphen. ("goreanharbourreysa-") | - |
| i-52b.-xxx.ut.bench.utorrent.com | Labels can't start/end with a hyphen. ("-xxx") | - |
| javakiba.org* | Invalid character. ("*") | - |
| mailto:info@mypornbible.com | Invalid characters. (":", "@") | - |
| mangas-porno-.has.it | Labels can't start/end with a hyphen. ("mangas-porno-") | - |
| moherland.pl, | Invalid character. (",") | moherland.pl |
| paris-.blogspot.ca | Labels can't start/end with a hyphen. ("paris-") | - |
| paris-.blogspot.ch | Same as above. | - |
| paris-.blogspot.co.id | Same as above. | - |
| paris-.blogspot.com | Same as above. | - |
| paris-.blogspot.com.ar | Same as above. | - |
| paris-.blogspot.com.br | Same as above. | - |
| paris-.blogspot.com.es | Same as above. | - |
| paris-.blogspot.com.tr | Same as above. | - |
| paris-.blogspot.co.uk | Same as above. | - |
| paris-.blogspot.de | Same as above. | - |
| paris-.blogspot.gr | Same as above. | - |
| paris-.blogspot.it | Same as above. | - |
| paris-.blogspot.mx | Same as above. | - |
| paris-.blogspot.no | Same as above. | - |
| paris-.blogspot.pt | Same as above. | - |
| paris-.blogspot.sk | Same as above. | - |
| preview-.stripchat.com | Labels can't start/end with a hyphen. ("preview-") | - |
| public‐sluts.net | Invalid character. ("‐") (1) | - |
| px.moatads.com,z.moatads.com | Invalid character. (",") | px.moatads.com |
| sexchat-.startspin.nl | Labels can't start/end with a hyphen. ("sexchat-") | - |
| telemetry.appex.bing.net:443 | Invalid character. (":") | telemetry.appex.bing.net |
| websitealive[0-9].com | Invalid characters. ("[", "]") | - |
| www.free‐celebrity‐tube.com | Invalid character. ("‐") (1) | - |
| www.just-anchor.com? | Invalid character. ("?") | - |
| www.public‐sluts.net | Invalid character. ("‐") (1) | - |
| xmr.-eu1.nanopool.org | Labels can't start/end with a hyphen. ("-eu1") | - |

(1): This is a character that looks like the minus sign. The following is a table with info from the GNOME Character Map application about the possibly invalid character.

| Representations | "‐" (U+2010 HYPHEN) (the invalid character) | "-" (U+002D HYPHEN-MINUS) (the minus sign) |
| --- | --- | --- |
| UTF-8 | 0xE2 0x80 0x90 | 0x2D |
| UTF-16 | 0x2010 | 0x002D |
| C octal escaped UTF-8 | \342\200\220 | \055 |
| XML decimal entity | `&#8208;` | `&#45;` |
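The difference between the two characters can also be checked from Python itself. The following is only a small sketch using the standard `unicodedata` module; the helper names `describe` and `non_ascii_chars` are made up for this illustration and are not part of any tool mentioned in this thread.

```python
import unicodedata


def describe(char):
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    return (
        "U+{:04X}".format(ord(char)),
        unicodedata.name(char),
        " ".join("0x{:02X}".format(b) for b in char.encode("utf-8")),
    )


def non_ascii_chars(host):
    """List every non-ASCII character found in a host name (hypothetical helper)."""
    return [c for c in host if ord(c) > 127]


print(describe("\u2010"))  # the look-alike hyphen
print(describe("-"))       # the ordinary hyphen-minus
print(non_ascii_chars("free\u2010celebrity\u2010tube.com"))
```

A check like `non_ascii_chars` would only flag candidates for review, since legitimate internationalized names also contain non-ASCII characters.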

In case it is useful, here is the Python 3 function that I use to check for valid host names.

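#!/usr/bin/python3

import re

# One label: 1 to 63 word characters or hyphens, with no hyphen at either end.
# "\w" also matches Unicode letters (hence IDN compatible) and the underscore.
HOSTNAME_REGEX = re.compile(r"(?!-)[\w-]{1,63}(?<!-)$")

def is_valid_host(host):
    """IDN compatible domain validation.
    """
    # Strip trailing dots (fully qualified form).
    host = host.rstrip(".")

    # The name must be 2 to 253 characters long and every label must match.
    return all([1 < len(host) <= 253] + [HOSTNAME_REGEX.match(x) for x in host.split(".")])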

This function is based on several functions that I found in this StackOverflow question. I just mainly translated it into a one-liner.

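The function basically does the following:

- Strips any trailing dot from the host name.
- Checks that the overall length of the host name is within the allowed range.
- Splits the host name at the dots and checks every label against the regular expression: 1 to 63 characters, only word characters or hyphens, and no hyphen at the start or end.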

Side note 1: The explanation above is based on my own understanding. Since I'm not a professional developer of any kind, I could be horribly wrong. LOL

Side note 2: The regular expression is outside the function for performance reasons.
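The checks made by the one-liner can also be unrolled so that a failure comes with a reason, similar to the "Possible error" column in the table. This is only a sketch: `explain_host` and its messages are hypothetical, the pattern is repeated so that the block is self-contained, and the upper length limit assumes the commonly cited maximum of 253 characters.

```python
import re

# Same label pattern as in the one-liner, repeated so this sketch stands alone.
LABEL_REGEX = re.compile(r"(?!-)[\w-]{1,63}(?<!-)$")


def explain_host(host):
    """Return None for a valid host name, otherwise a short reason.

    Hypothetical helper for illustration only.
    """
    host = host.rstrip(".")

    # Assumption: at most 253 characters for the whole name.
    if not 1 < len(host) <= 253:
        return "bad overall length"

    for label in host.split("."):
        if not label:
            return "empty label"
        if len(label) > 63:
            return "label too long"
        if label.startswith("-") or label.endswith("-"):
            return "label starts/ends with a hyphen: " + label
        if not LABEL_REGEX.match(label):
            return "invalid character in label: " + label

    return None


for name in ("example.com", "paris-.blogspot.com", "telemetry.appex.bing.net:443"):
    print(name, "->", explain_host(name))
```

Because the pattern uses `\w`, non-ASCII letters are accepted, while the U+2010 look-alike hyphen is rejected since it is punctuation rather than a word character.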

mitchellkrogza commented 6 years ago

Thanks @Odyseus for reporting this. I will make some tweaks to the cleaning functions to deal with these errors. Thanks for your very detailed information, it helps a lot. @funilrys yes indeed, this needs a good look; a lot of the input sources seem to make typos on a frequent basis.

xxcriticxx commented 6 years ago

this would be great add to any tool i think

mitchellkrogza commented 6 years ago

@xxcriticxx and @Odyseus I will get this in the works early next week

xxcriticxx commented 6 years ago

@mitchellkrogza you slacking lately :(

mitchellkrogza commented 6 years ago

@xxcriticxx been a rough start to the year and had a loss in the family so I've been out of town for a while but back in action. Have no fear all issues will be addressed.

smed79 commented 6 years ago

@mitchellkrogza We are sorry to hear that :(
Deepest condolences.

xxcriticxx commented 6 years ago

@mitchellkrogza sorry my friend

funilrys commented 6 years ago

Hello @Odyseus @mitchellkrogza ,

Thanks to this issue I discovered an issue in PyFunceble. Please note that I reinforced the way it checks for an inactive domain with the previously referenced patch.

That way, @mitchellkrogza, we can work efficiently when we do further structural development.

@Odyseus I did not use your snippets, but since you indirectly contributed to that patch, I would like, if you accept, to add you to the list of PyFunceble's contributors.

Thanks again.

Cheers, Nissar


Before the patch

(Screenshot from 2018-02-09 12-38-07)

After the patch

(Screenshot from 2018-02-09 12-30-10)

Odyseus commented 6 years ago

Hello, everybody.

@funilrys: Thanks for the thought, but don't feel obligated to do so. I'm just happy to contribute however I can to any open source initiative. :+1:

I looked at the code in your patch and, in case it is useful to you, I also use a function to validate IP addresses. It uses the ipaddress module from Python's standard library. It checks for both IP types (IPv4 and IPv6), but it would be easy to check for IPv4 only.

from ipaddress import ip_address

def is_valid_ip(address):
    """Validate IP address (IPv4 or IPv6).

    Parameters
    ----------
    address : str
        The IP address to validate.

    Returns
    -------
    bool
        If it is a valid IP address or not.
    """
    try:
        ip_address(address)
    except ValueError:
        return False

    return True

I'm pretty sure that I got this function from StackOverflow, but the exact source got lost in my browsing history.
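For the IPv4-only case mentioned above, one possible variant is to use the `IPv4Address` class from the same `ipaddress` module instead of the generic `ip_address` factory. This is just a sketch; the name `is_valid_ipv4` is made up for this example.

```python
from ipaddress import IPv4Address


def is_valid_ipv4(address):
    """Return True only if the given string is a valid IPv4 address.

    Hypothetical IPv4-only variant of the function above.
    """
    try:
        # IPv4Address raises AddressValueError, a subclass of ValueError,
        # for anything that is not an IPv4 address, including IPv6 strings.
        IPv4Address(address)
    except ValueError:
        return False

    return True


for candidate in ("0.0.0.0", "127.0.0.1", "::1", "256.1.1.1"):
    print(candidate, "->", is_valid_ipv4(candidate))
```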