StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License

Some obvious false calls (invalid hosts) #437

Closed. notracking closed this issue 6 years ago

notracking commented 6 years ago

Hi Steven,

I came across some invalid hosts in your lists; please have a look at them.

steveblack.txt:0.0.0.0 accounts.pkr.com.invalid
steveblack.txt:0.0.0.0 adfactor.nl.invalid
steveblack.txt:0.0.0.0 bravenet.com.invalid
steveblack.txt:0.0.0.0 cibleclick.com.invalid
steveblack.txt:0.0.0.0 gigya.com.invalid
steveblack.txt:0.0.0.0 globaltrack.com.invalid
steveblack.txt:0.0.0.0 naiadsystems.com.invalid
steveblack.txt:0.0.0.0 parse.ly.invalid
steveblack.txt:0.0.0.0 reedbusiness.com.invalid
steveblack.txt:0.0.0.0 seeq.com.invalid
steveblack.txt:0.0.0.0 test.invalid
steveblack.txt:0.0.0.0 thecounter.com.invalid
steveblack.txt:0.0.0.0 top20.com.invalid
steveblack.txt:0.0.0.0 topmmorpgsites.com.invalid
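For anyone wanting to reproduce a listing like the one above, here is a minimal Python sketch of such a scan. It assumes hosts-format lines of the form `0.0.0.0 hostname` with optional `#` comments; the function name `find_invalid_entries` is hypothetical, since the report doesn't say which tool produced the output.

```python
def find_invalid_entries(lines):
    """Return hostnames from hosts-format lines whose last label is 'invalid'."""
    found = []
    for raw in lines:
        # Drop any trailing comment and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # A hosts entry is an IP address followed by one or more hostnames.
        for host in fields[1:]:
            if host.lower().rstrip(".").rsplit(".", 1)[-1] == "invalid":
                found.append(host)
    return found


sample = [
    "# comment line",
    "0.0.0.0 accounts.pkr.com.invalid",
    "0.0.0.0 doubleclick.net",
    "0.0.0.0 test.invalid  # reserved TLD",
]
print(find_invalid_entries(sample))
```

Only the final label is compared, so a name like `invalid.com` is not flagged.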

welcome[bot] commented 6 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

ScriptTiger commented 6 years ago

Hey, thanks for your interest in improving the list! Just out of curiosity, what are your criteria for an "invalid host"? For example, not being accessible from your personal locale, not containing a certain type of script, etc. Just so we can reproduce your tests if need be.

funilrys commented 6 years ago

@ScriptTiger As you may know, in Funceble I use the IANA Root Zone Database to detect if the given domain has a valid extension.

So this issue is about the .invalid extension not being registered in the IANA database, which means that domains with the .invalid extension are, paradoxically, invalid.
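To illustrate the kind of check funilrys describes, here is a hedged Python sketch that tests a domain's last label against a set of TLDs. IANA publishes the delegated TLDs as a plain-text file (`tlds-alpha-by-domain.txt`, a `#` header line followed by one uppercase TLD per line). The tiny hardcoded excerpt below stands in for that file so the example runs offline, and `load_tlds` / `has_registered_tld` are hypothetical names, not Funceble's actual code.

```python
def load_tlds(text):
    """Parse an IANA-style TLD list: '#' lines are comments, one TLD per line."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def has_registered_tld(domain, tlds):
    """True if the domain's last label appears in the TLD set."""
    label = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    return label in tlds


# Illustrative excerpt only; the real file lists far more TLDs.
IANA_EXCERPT = """# Excerpt for illustration
COM
NET
NL
LY
"""

TLDS = load_tlds(IANA_EXCERPT)
for d in ["parse.ly", "parse.ly.invalid", "adfactor.nl.invalid"]:
    print(d, has_registered_tld(d, TLDS))
```

Because `invalid` is never delegated in the root zone, every `.invalid` entry fails this test regardless of the rest of the name.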

ScriptTiger commented 6 years ago

We might have to do some further digging on this with the curator to find out their reasoning. Steven will of course have the final word, but, let's be honest, not every ISP or DNS provider, especially the more oppressive or restrictive ones, cares about what is IANA-registered, so it once again depends on the vector, the conditions under which those names can be called, etc. Under some ISPs, including internal U.S. Government networks for internal department affairs that use massively extensive and complex border gateway security and internal DNS, IANA may well not even exist. Is this a feasible argument?

funilrys commented 6 years ago

Your argument makes sense, I'm okay with that point of view, but,

And we haven't even talked about IPs, which could take your argument much further...

About "IANA may well not even exist", I consider this a good read...

ScriptTiger commented 6 years ago

I don't mean IANA should not exist; I mean that in certain circumstances it is as if it does not. Of course, I completely agree with the need for it. China, the most populous country in the world, uses internal DNS and only indirectly references IANA. There are many cases in which shady ISPs inject their own entries, etc., and governments often have regular policies to poison some entries or make up their own. This also goes for a number of religious-centric governments, such as Indonesia, which is the 4th most populous country in the world.

I only randomly mentioned internal government networks because that is my personal experience, so I apologize for the confusion there. I just know that it is possible and not an uncommon practice to deviate from IANA.

All things said and done, these entries may truly be invalid, but they may or may not be as obvious as one would think and a word with their source curator is due, in my view.

notracking commented 6 years ago

I think this is in the same boat as localhost references, see: https://en.wikipedia.org/wiki/.invalid
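The comparison with localhost holds up: RFC 2606 reserves `.test`, `.example`, `.invalid`, and `.localhost` together, and RFC 6761 spells out how each should be handled. The Python sketch below classifies a name by its reserved TLD, with a one-line paraphrase of the resolver-library behaviour RFC 6761 recommends. The table wording and the function name `special_use_info` are illustrative assumptions, not an official API.

```python
# Reserved top-level names (RFC 2606), each with a paraphrase of the
# resolver-library behaviour that RFC 6761 recommends for it.
SPECIAL_USE = {
    "test": "no special handling by resolver libraries",
    "example": "no special handling by resolver libraries",
    "invalid": "immediate negative response",
    "localhost": "always resolves to the loopback address",
}


def special_use_info(domain):
    """Return (tld, expected behaviour) for a reserved TLD, else None."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    if tld in SPECIAL_USE:
        return tld, SPECIAL_USE[tld]
    return None


for d in ["gigya.com.invalid", "foo.localhost", "gigya.com"]:
    print(d, special_use_info(d))
```

These are recommendations ("SHOULD") for compliant resolvers; as ScriptTiger points out, not every resolver or network follows them.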

ScriptTiger commented 6 years ago

That Wikipedia article actually triggers a whole new line of thought I had never considered before:

This allows the use of these names for either documentation purposes or in local testing scenarios.

It is also possible these domains are being used locally by malware to coordinate internal malware services by creating "test" loopback domains. Local malware could be poisoning your DNS and redirecting your pages to locally served sites that have been processed through, say, adware that replaces every occurrence of the word "PC" with a link to an affiliate website that has malware associated with the word "PC". Obviously there are much more menacing things that could be done this way with many different kinds of services, including botnet management, but the adware example is a common one I think most people have seen before, because it works at the HTTP layer and it doesn't matter which browser you're using.

In addition to the HTTP hijacking example, if malware is using a concept similar to vhosts, there could be several such domains on the same computer used to coordinate different malware services or different features of the same malware service. You could even have several different malware services, each with its own domains, running on different ports and seriously hogging resources in multiple ways. By blacklisting such domains, it is possible that these services can no longer communicate with one another, and their functionality thus breaks down.

ScriptTiger commented 6 years ago

Just as an update, I have sent an e-mail to Peter over at Yoyo.org and am eagerly awaiting his take on this.

ScriptTiger commented 6 years ago

Below I've included Peter's response. It's something he does internally as part of his curation process. Basically, it's his way of marking domains for later deletion if his personal curation checks continue to flag them as false positives or otherwise no longer applicable.

So these domains are indeed already being handled by the curator.

Hello Mr Tiger,

There's no documentation I'm afraid. It's just a convention I use for myself when I receive notification from someone about a false positive. I rename the hostname as ".invalid" and it gets automatically removed after a certain number of checks which happen every week via cron.

Well, mostly it does - sometimes I add a domain or host and "force" it to be added. Whenever I add an entry, it gets checked for validity, and sometimes I want to add something that doesn't seem valid to the script. And sometimes I mark them as ".invalid". Because they're also marked as "forced", they don't get checked again, so they never get removed. I manually clean them out a couple of times a year.

The reason I mark them as ".invalid" is so that they're still in the list for a few weeks (6, I think? I'd have to check how many failures means they get deleted...) and I have a chance to rename them back if I find out that the false positive report was untrue.

Is that enough explanation?

cheers,

  • Peter

On Fri, Nov 10, 2017 at 12:30:56PM +0800, ScriptTiger wrote:

Greetings! I am a long-time user of the ad server list, so first I would like to convey my appreciation for your dedication to the cause! Second, others and I have been curious whether you have any documentation on why several ".invalid" domains have been placed on the list. A few examples that appear on your list are as follows:

accounts.pkr.com.invalid

ad2games.com.invalid

adfactor.nl.invalid

Some are under the impression they should be stricken immediately as being "impossible" domains on IETF-compliant devices that keep it a reserved domain. However, I know not all devices are compliant and I also know that even on compliant devices such domains can still be used internally as loopback domains for adware services, etc. So I was just curious as to your take on this as I do truly respect your curation of the list and always like to go to the source before making any rash judgments.

Thanks again in advance and I look forward to hearing from you!
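Peter's convention, as described in his e-mail above, can be modelled roughly as follows. This is a hypothetical Python sketch, not his script: the class, field, and function names are invented, the validity check is a stand-in, and `MAX_FAILURES = 6` reflects his own uncertain recollection ("6, I think?"). Each weekly run counts a failure for every non-forced entry that fails the check and drops it once the threshold is reached; forced entries are never rechecked, so they stay until removed by hand.

```python
from dataclasses import dataclass

MAX_FAILURES = 6  # Assumption: Peter recalls "6, I think?"


@dataclass
class Entry:
    host: str
    forced: bool = False  # forced entries are never re-checked
    failures: int = 0


def looks_valid(host):
    # Hypothetical stand-in for the real validity check; a ".invalid"
    # name can never pass it, so marked entries accumulate failures.
    return not host.endswith(".invalid")


def weekly_check(entries):
    """One cron run: count failures and drop entries that hit the threshold."""
    kept = []
    for e in entries:
        if e.forced:
            kept.append(e)  # never checked, so never auto-removed
            continue
        if not looks_valid(e.host):
            e.failures += 1
        if e.failures < MAX_FAILURES:
            kept.append(e)
    return kept


def mark_false_positive(entry):
    """Rename the host so later checks fail and it eventually expires."""
    entry.host += ".invalid"


def restore(entry):
    """Undo the mark if the false-positive report turns out to be untrue."""
    if entry.host.endswith(".invalid"):
        entry.host = entry.host[: -len(".invalid")]
    entry.failures = 0


entries = [Entry("accounts.pkr.com"), Entry("adfactor.nl", forced=True)]
for e in entries:
    mark_false_positive(e)
for week in range(MAX_FAILURES):
    entries = weekly_check(entries)
print([e.host for e in entries])
```

After six weekly runs the ordinary entry has expired, while the forced one lingers, which matches Peter's note that forced entries need manual cleanup a couple of times a year.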

StevenBlack commented 6 years ago

Thank you @ScriptTiger for chasing this down. And thank you, @notracking, for reporting this.

I'm fine with leaving things as they are, including Peter's annotations.