StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License

Set of hosts to block - potential duplicates compared to current lists #1858

Closed Wiggum127 closed 2 years ago

Wiggum127 commented 2 years ago

I have, over the years, collected some hosts which were not yet part of a hosts file at the time of detecting them locally.

It could be some/most of them are in a hosts list by now, but I have no toolchain to do mass checks myself.

The list is a mix of tracking, scam and advertising hosts.

Attached is a set of those hosts. Is someone able to check which of them are still missing from the hosts files? We can then add the missing ones to the project.

Personal blocks-part1.txt

StevenBlack commented 2 years ago

Hi @Wiggum127.

I put your list on my system clipboard, then using ghosts I get the following 44-domain intersection.

ghosts --clip --intersection
----------------------------------------
Base hosts file summary:
----------------------------------------
Location: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Domains: 100,735
Bytes: 3.1 MB
----------------------------------------
Compared hosts from clipboard summary:
----------------------------------------
Location: clipboard
Domains: 134
Bytes: 4.1 kB
intersection: [
  ad.nettvservices.com
  adcfg.intowow.com
  ads-pebblemedia.adhese.com
  ads.creative-serving.com
  ads.exoclick.com
  adservice.google.be
  adserving.unibet.com
  api2.batmobi.net
  app-measurement.com
  app.adjust.com
  b.scorecardresearch.com
  be.sitestat.com
  bp.adkmob.com
  cm.adkmob.com
  cm.steepto.com
  e.apsalar.com
  e.crashlytics.com
  events.appsflyer.com
  firebase-settings.crashlytics.com
  geoinfo.intowow.com
  ilead.itrack.it
  imgg-cdn.steepto.com
  pagead2.googlesyndication.com
  pixel.quantserve.com
  profile.adkmob.com
  reports.crashlytics.com
  sb.scorecardresearch.com
  settings.crashlytics.com
  smetrics.telenet.be
  smtx.belfius.be
  spaces.slimspots.com
  ssdk.adkmob.com
  ssl.google-analytics.com
  sstats.asadventure.com
  sts.batmobi.net
  swa.bol.com
  tags.tiqcdn.com
  udm.scorecardresearch.com
  ufs.adkmob.com
  usdk.batmobi.net
  vrt.hb.omtrdc.net
  www.google-analytics.com
  www.googleadservices.com
  www.googletagservices.com
]
Intersection: 44 domains
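
For readers without `ghosts` installed, the same kind of check can be approximated with a few lines of Python. This is only a minimal sketch, not how `ghosts` itself works: the helper name `parse_hosts` and the sample data are made up for illustration, and `tracker.example.invalid` is a placeholder domain.

```python
def parse_hosts(text: str) -> set[str]:
    """Extract domain names from hosts-file text or a plain domain list."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # "0.0.0.0 example.com" -> hostnames follow the address;
        # a bare "example.com" line is treated as a single hostname.
        names = fields[1:] if len(fields) > 1 else fields
        domains.update(name.lower() for name in names)
    return domains


# Illustrative sample data only (assumption, not the real lists).
base = parse_hosts("""
# sample base hosts file
0.0.0.0 ads.exoclick.com
0.0.0.0 app-measurement.com
0.0.0.0 www.google-analytics.com
""")
candidates = parse_hosts("""
ads.exoclick.com
www.google-analytics.com
tracker.example.invalid
""")

already_listed = sorted(base & candidates)   # the intersection
still_missing = sorted(candidates - base)    # what could still be proposed
print(already_listed)
print(still_missing)
```

In practice the base text would come from the downloaded hosts file and the candidates from the attached list.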

dnmTX commented 2 years ago

@Wiggum127 just FYI, regarding domains like 06468e39a4c44d0f31948f15c27dab91.safeframe.googlesyndication.com, for example: those are generated on the fly, are never the same, and never repeat. You're just wasting your time by trying to block them. They're part of Google's Safe Browsing, so if you don't want to see them in your logs you need to turn it off in Chrome. It's called Dynamic DNS. Do some research and make yourself familiar 👍

Wiggum127 commented 2 years ago

@dnmTX I had more or less guessed this already, given the composition of those names. The good thing is I've been blocking/hiding/uninstalling Chrome on all family devices for years. I guess those few entries originate from guest devices on the network. In the meantime I've blacklisted googlesyndication.com as a wildcard in my local Pi-hole setup, so none of the subdomains will ever be resolved again. Thanks anyway for the feedback.

dnmTX commented 2 years ago

This is what I have on my end as wildcard blocking when it comes to Google. It might look too aggressive to some, but I'm just fine with it 😄:

# google
address=/2mdn.net/doubleclick.net/doubleclick.com/googlecode.com/googletagmanager.com/#
address=/googlesyndication.com/googlezip.net/gvt2.com/gvt3.com/metric.gstatic.com/urchin.com/#
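
As I understand the dnsmasq documentation, an `address=/domain/.../#` line covers each listed domain together with all of its subdomains, and `#` as the target stands for the null address. The following sketch mimics that matching rule in Python to check which hosts such a line would cover. The function names `parse_dnsmasq_address` and `is_covered` are hypothetical helpers, not part of dnsmasq or Pi-hole.

```python
def parse_dnsmasq_address(line: str) -> tuple[list[str], str]:
    """Split 'address=/d1/d2/.../target' into (domains, target)."""
    key, _, value = line.strip().partition("=")
    if key != "address" or not value.startswith("/"):
        raise ValueError(f"not an address= line: {line!r}")
    # Everything between the slashes is a domain; the last field is the target.
    *domains, target = value[1:].split("/")
    return domains, target


def is_covered(host: str, domains: list[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


domains, target = parse_dnsmasq_address(
    "address=/2mdn.net/doubleclick.net/doubleclick.com/googlecode.com/googletagmanager.com/#"
)
print(is_covered("fls.doubleclick.net", domains))  # subdomain of a listed domain
print(is_covered("notdoubleclick.net", domains))   # merely shares a text suffix
```

Matching on `"." + d` rather than a bare suffix is what keeps an unrelated name such as `notdoubleclick.net` from being caught.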

Wiggum127 commented 2 years ago

@StevenBlack If I carve out the existing hosts (the intersection) from my list AND remove the dynamic googlesyndication.com and fls.doubleclick.net domains, what do I need to do to contribute the remainder to your project?

dnmTX commented 2 years ago

Remove the xxxxxxxxxxx.privacysandbox.googleadservices.com ones also. The same dynamic DNS rule applies to them as well 👍

Oh... and those two as well:

0.0.0.0 graph.facebook.com
0.0.0.0 graph.instagram.com

Too many issues with them, search the repo here. They were listed and eventually removed due to breakages.
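
The clean-up discussed in this thread (take out the intersection, drop the dynamically generated subdomains, drop the two breakage-prone hosts) could be sketched roughly as follows. The constant and function names are made up for illustration, the suffix list only reflects the domains named in this thread, and `tracker.example.invalid` is a placeholder.

```python
# Parent domains whose subdomains are generated on the fly (from this thread).
DYNAMIC_SUFFIXES = (
    ".safeframe.googlesyndication.com",
    ".fls.doubleclick.net",
    ".privacysandbox.googleadservices.com",
)
# Hosts reported in this thread as removed from the list due to breakages.
BREAKAGE_PRONE = {"graph.facebook.com", "graph.instagram.com"}


def remainder(candidates: set[str], base: set[str]) -> list[str]:
    """Return candidates not in base, minus dynamic and breakage-prone hosts."""
    kept = []
    for host in candidates - base:          # carve out the intersection
        if host.endswith(DYNAMIC_SUFFIXES):  # throwaway generated names
            continue
        if host in BREAKAGE_PRONE:
            continue
        kept.append(host)
    return sorted(kept)


# Illustrative sample data only (assumption, not the real lists).
base = {"ads.exoclick.com", "app-measurement.com"}
candidates = {
    "ads.exoclick.com",
    "06468e39a4c44d0f31948f15c27dab91.safeframe.googlesyndication.com",
    "graph.facebook.com",
    "tracker.example.invalid",
}
result = remainder(candidates, base)
print(result)
```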

StevenBlack commented 2 years ago

Thank you for this kind offer, @Wiggum127. I appreciate that.

Right now I'm more focused on making the base list smaller.

Presently the base list carries 100,737 domains. That's heavy. Twenty percent too heavy, I reckon. That's just my gut feeling.

At this juncture I'm more interested in doing as much with less, as opposed to doing a tiny bit more with more.

Wiggum127 commented 2 years ago

@StevenBlack Should I contribute to more specialised lists instead?

Mind you, the fact that your list is "long" is also a consequence of there being so many crap domains on the Internet. I'm blocking 1.5M unique records, so in that respect your list is already less than the bare minimum for me. It's all a matter of perspective.

StevenBlack commented 2 years ago

@Wiggum127 yeah this is EXACTLY what we don't do here.

The soft option would be, amalgamate ALL the lists. Fuck it, make a million domain list, right?

Except when you do that (as I once did) you end up with

We take the opposite approach.

In the end we do more with a much smaller list. Large lists come with all sorts of problems, including on Windows, which is a dumpster fire.

Wiggum127 commented 2 years ago

@StevenBlack No problem. I understand and appreciate the approach. That's why your lists are so heavily used and referenced. And indeed, using custom hosts files on Windows, even small ones, is a nightmare.

In the spirit of carefully curated lists, I'm offering to contribute my personal, small, curated list of domains to be avoided by a browser/app. I can take out the intersection, drop the dynamic stuff to make it cleaner, etc. Can I present it to other list owners for their evaluation? I leave it to your discretion to decide which of the remaining entries from my personal list should make it into the lists you have.

StevenBlack commented 2 years ago

@Wiggum127 where is the list located? I need to see its curation history. All I see is a .txt file attached to this issue.

Wiggum127 commented 2 years ago

@StevenBlack I came by this list by taking a snapshot of existing lists, like yours, to stop the obvious hosts and then watching the DNS requests going out. The snapshot was taken on 15/10/2017. The header from your file:

# This hosts file is a merged collection of hosts from reputable sources,
# with a dash of crowd sourcing via Github
#
# Date: October 15 2017
# Extensions added to this file: fakenews, gambling
# Number of unique domains: 44,919

Over the years I started adding the unwanted hosts to a personal GitLab repository, which is synced to my Pi-hole and to Android devices running NetGuard with imported hosts files. Not a daily task, not even weekly, and not with each and every app, but gradually and occasionally building up. This little list has served me well so far. Looking at the intersection, approx. 44 of the ones I added myself have also made it into your list by today.

Since I thought sharing the output could serve others, I am reaching out to your project to see if these entries can become part of one of your lists. I have no public list. I want to contribute and help maintainers of existing public lists, like your project, with this little set of suggestions.

dnmTX commented 2 years ago

@shreyasminocha are you interested in adding whatever remains, after @Wiggum127 sorts them out, to your list so it can be curated properly in the future?

StevenBlack commented 2 years ago

Thanks @Wiggum127 I appreciate the offer but I decline.

A history of active curation is among the prerequisites to list here.

trimechee commented 2 years ago

Thank you very much @StevenBlack for your awesome lists! And thank you so much @Wiggum127 for sharing your great lists! Maybe you can integrate your list into this project: https://github.com/Ultimate-Hosts-Blacklist/Ultimate.Hosts.Blacklist

Wiggum127 commented 2 years ago

I've never had any breakages from blocking graph.facebook/instagram/messenger.com, so they will stay in my personal files.

As for your suggestion to add it to Ultimate.Hosts.Blacklist: I already asked the question there, prior to asking it here. No response so far.

Seems nobody is interested in curated contributions. No problem, I'll just move on.

trimechee commented 2 years ago

@Wiggum127 it's sad :( I'm sure many appreciate your work, thank you so much! Maybe you can contact the owner of Bancuh DNS to block the dangerous sites

https://github.com/ragibkl/adblock-dns-server