StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.

Sorting out dead entries with nmap #937

Closed Throwaway684736 closed 5 years ago

Throwaway684736 commented 5 years ago

Nice list! I've actually been keeping one myself for many years, adding and removing the most annoying networks and trackers.

Here is the question: has anyone tried scanning for dead hostnames with nmap? I think it would be beneficial, since you could also port-scan 80, 81, 443, and 8080 and, if there are no active daemons, exclude the host from the resulting list.
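
Something like this minimal sketch is what I have in mind (a plain TCP-connect check rather than a full nmap run; the `0.0.0.0 example.com` hosts-file layout, the port list, and the timeouts are just illustrative):

```python
# Hypothetical sketch, not actual repo tooling: keep only entries whose
# hostname still accepts a TCP connection on one of the probed ports.
import socket

PORTS = (80, 81, 443, 8080)

def has_live_daemon(hostname, timeout=3.0):
    """Return True if any probed port accepts a TCP connection."""
    for port in PORTS:
        try:
            with socket.create_connection((hostname, port), timeout=timeout):
                return True
        except OSError:
            continue
    return False

def live_entries(hosts_path):
    """Yield (ip, hostname) pairs whose hostname still answers on a probed port."""
    with open(hosts_path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            parts = line.split()
            if len(parts) < 2:
                continue
            ip, hostname = parts[0], parts[1]
            if has_live_daemon(hostname):
                yield ip, hostname
```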

Let's think about it.

welcome[bot] commented 5 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

spirillen commented 5 years ago

Hi @Throwaway684736

@mitchellkrogza and @funilrys are running a project like this with dead-hosts, and there has been some open discussion of your suggestion. From what I can understand from a number of threads, it is not what @StevenBlack desires to do, as he said (something like):

> The domain can easily be reactivated by a script and then just as quickly be deactivated after a scam/malware campaign, etc.

I wish I could remember those threads, as I was reading about them last night, but maybe @ScriptTiger and @dnmTX can remember more about this.

funilrys commented 5 years ago

@spirillen I guess you're talking about #412 ☺️ and all linked issues, PyFunceble, @dead-hosts and @Ultimate-Hosts-Blacklist?

Well, it's a different view. But this isn't the place to talk about it, as Steven is clear: detection of invalid domains is awesome, but a dead/inactive flag (even with a recheck every day like we do) is not great ☺️

Now, I guess you can close this Steve @StevenBlack. I will handle this in one of the other projects if needed 😉 #412 alone says it all 😁

spirillen commented 5 years ago

@funilrys you're spot on :smile:

ScriptTiger commented 5 years ago

The first problem with this would be intermittent service outages. Some guy in India running a junker laptop he has refurbished to serve malware, one that frequently loses power, suffers from inadequate system resources, or drops connectivity throughout the day, may show up as a dead host one minute and a live one the next. Intermittent service outages can also happen at the ISP level or anywhere else between you and the host. With modern load balancing and dynamic route propagation, you could even have an outage on one route and not another, so one person could detect a host as dead while someone else detects it as alive at the exact same time, depending on the routes their traffic takes through different networks to reach the host.

Many cyber attacks involve rule engines of some kind to target certain groups, such as clients running a specific browser with a known vulnerability, as detected from the User-Agent request header, or clients whose source IP is detected as coming from a particular geographic region, most often a country. Such rule engines can be implemented everywhere from network devices up to Web server configurations and beyond. Trying to check whether a host is dead will therefore give different results depending on the client you're using, your IP address, your metadata, etc., and on whether all of those meet the criteria to reach a given host, or whether you are intentionally filtered out and the host is unreachable to you because you don't match what the attacker is looking for and there's no reason to waste resources replying to you. This also narrows their exposure to only the targets of their choosing, so they avoid unnecessary risk and remain hidden longer.
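
To make that concrete from the scanner's side, here is a hypothetical probe (the hostname and User-Agent strings are placeholders): the same URL can answer one client signature and ignore another, so a "dead" verdict depends on who is asking.

```python
# Hypothetical illustration: probe the same URL with two different
# User-Agent strings and compare what comes back.
import urllib.error
import urllib.request

def probe(url, user_agent, timeout=5.0):
    """Return the HTTP status code, or None if no response comes back at all."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code          # server answered, just not with 2xx/3xx
    except (urllib.error.URLError, OSError):
        return None            # no usable answer at all

url = "http://tracker.example.com/"  # placeholder hostname
print(probe(url, "curl/8.0"))                                   # scanner-like signature
print(probe(url, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # browser-like signature
```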

As for Nmap specifically, it has only limited HTTP/HTTPS capability and would not be an ideal client for detecting HTTP/HTTPS-specific features or for presenting a specific signature to a Web server. Simple ping or other one-shot ICMP detection methods are also out of the question, since Web services and ICMP work on completely different ports and protocols; one may be blocked while the other is not. It's also important to note that this repo blocks various forms of telemetry, not only Web traffic. There are multiple sections in the hosts files dedicated to IoT devices, smart TVs, operating system telemetry, and even specific Android apps, all of which use different ports and protocols as well. Trying to ascertain a level of confidence for each entry using Nmap would take an unreasonable amount of time and resources and would make this repo no longer relevant within a reasonable production life cycle.
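
The same goes for ports and protocols. As a hypothetical sketch (the hostname and ports are placeholders, and the ping flags assume a Unix-like system), a host can ignore ICMP yet answer on a TCP port, or the other way around, so no single probe settles whether it is dead:

```python
# Hypothetical illustration: ICMP and individual TCP ports can disagree
# about whether the same host is "alive".
import socket
import subprocess

def tcp_open(hostname, port, timeout=3.0):
    """True if a TCP connection to hostname:port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False

def icmp_reachable(hostname):
    """True if a single ICMP echo request gets a reply (shells out to the system ping)."""
    return subprocess.run(
        ["ping", "-c", "1", hostname],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode == 0

host = "telemetry.example.com"  # placeholder
print({
    "icmp": icmp_reachable(host),
    "tcp/80": tcp_open(host, 80),
    "tcp/443": tcp_open(host, 443),
    "tcp/8883": tcp_open(host, 8883),  # MQTT over TLS, common for IoT telemetry
})
```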

All in all, the reason this repo depends on multiple sources curated by different means and methods is that its intent is highly curated, high-touch results that are relevant this week, but maybe not the next, and that get dropped accordingly. This repo intends to include hosts that may be unreliable from one minute to the next, while also dropping hosts that are known with a certain confidence to no longer be a threat. Basically, each hosts list published from this repo strives to be as relevant as possible during its version cycle, or whatever duration Steven deems reasonable in his development cycle.

StevenBlack commented 5 years ago

Hi @Throwaway684736, source curation is done upstream. We strive to combine only actively curated sources.

It's not our place to trim anybody's list of hosts based on additional tests that we imagine, right or wrong, to be relevant or accurate or whatever.

Another factor: I estimate that 90%+ of the people who download our hosts files do so once, or very infrequently.

I'm not about to take someone's curated list and mechanically deem part of it to be cool. It's NOT cool because someone who cares, whom we trust, upstream from here, has decided otherwise. That's good enough for me.

Thanks for your input though!

Closing

ScriptTiger commented 5 years ago

> Another factor: I estimate that 90%+ of the people who download our hosts files do so once, or very infrequently.

That's a fair point, also. I said "week" in my comment, but the life cycle could be much longer. I guess not everyone is using my script. wink wink (https://github.com/ScriptTiger/Unified-Hosts-AutoUpdate)