StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
26.55k stars, 2.2k forks

False positives on lightswitch05's hosts lists #657

Closed: celsoazevedo closed this issue 6 years ago

celsoazevedo commented 6 years ago

All AMP pages stopped loading on my phone and laptop. It seems that lightswitch05's hosts list now blocks some ampproject.org URLs even though not all of them are used for tracking or ad delivery (e.g., status.ampproject.org):

0.0.0.0 ampproject.org
0.0.0.0 cdn.ampproject.org
0.0.0.0 status.ampproject.org
0.0.0.0 www.ampproject.org

Just a few hours ago some podtrac.com URLs were added. This service is used by many podcasts and it's impossible to download new episodes if www.podtrac.com is blocked:

0.0.0.0 podtrac.com
0.0.0.0 analytics.podtrac.com
0.0.0.0 dev.podtrac.com
0.0.0.0 east.dev.podtrac.com
0.0.0.0 west.dev.podtrac.com
0.0.0.0 dts.podtrac.com
0.0.0.0 www.podtrac.com

There are other examples, such as intercom.io, which is used for things like customer support; again, even its own website is blocked. It's like blocking www.google.com because Google runs Google Analytics.

I'm not trying to criticise lightswitch05's work, I can see this being useful for some users, but I think there are too many false positives on both of his lists.

welcome[bot] commented 6 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

Skaronator commented 6 years ago

Just noticed the same. ampproject pages should not be blocked.

dnmTX commented 6 years ago

Can't you guys open an issue at https://github.com/lightswitch05/hosts/issues and discuss the matter with @lightswitch05 there?

celsoazevedo commented 6 years ago

@dnmTX I checked @lightswitch05's list description and it says:

A programmatically expanded version of my base Ads & Tracking list. This list is more likely to contain false positives, but is still very reliable and I recommend using it.

lightswitch05's list actually does what the description says. I opened this issue here because it makes sense to block these domains:

So the list itself is doing its job (it's probably a good list for improved privacy), but it blocks more than just ads and tracking.

If lightswitch05 is willing to weaken his list's effectiveness and clean up some of the hosts, then I guess it should be kept as part of the main unified hosts file. If not, maybe this list should be removed, as it's full of hosts that aren't tracking or serving ads.

If users think this kind of blocking is useful, maybe a new hosts category should be created for users who want more privacy?

StevenBlack commented 6 years ago

@celsoazevedo actually, active curation is a prerequisite to be included here.

Let me know if this gets fixed at source. If it doesn't, then we'll have to evaluate this on principles.

lightswitch05 commented 6 years ago

@celsoazevedo I would be happy for you to open tickets like this on my repo. I definitely don't want to break legitimate services. I also don't want to add to the maintenance workload of this repo.

ampproject

The project enables the creation of websites and ads that are consistently fast, beautiful and high-performing across devices and distribution platforms

The rest of the home page goes on to talk more about ads, higher engagement, etc. Can either @celsoazevedo or @Skaronator provide me with a concrete example of how blocking this domain breaks legitimate services? I've actually had this one blocked for a little while and haven't noticed any ill effects.

podtrac

The standard for podcast analytics

Same thing goes for this site. If it's breaking a bunch of stuff, I'll remove it. My podcasts are working fine, but that's not an all-inclusive test :) Concrete examples would be appreciated. Worst case, I can limit the blocking to analytics.podtrac.com.

StevenBlack commented 6 years ago

Thanks for chiming-in @lightswitch05 !

Skaronator commented 6 years ago

Can either @celsoazevedo or @Skaronator provide me with a concrete example of how blocking this domain breaks legitimate services?

The whole Google App is based on the ampproject, and at least 80% of the links don't work.

Here is a screenshot. The app is similar to Google News and just links to other websites that use AMP (Accelerated Mobile Pages) technology. Here is the link to the heise article.

Skaronator commented 6 years ago

And most of the mobile Google search results use AMP as well, since Google prioritizes these results.

https://www.wired.com/2016/02/google-will-now-favor-pages-use-fast-loading-tech/

lightswitch05 commented 6 years ago

Those screenshots straight up look like ads to me, but I went ahead and removed ampproject.org from the list. I avoid all things Google, so I'll take your word for it that it's legitimate services.

Still waiting to hear about podtrac.

Skaronator commented 6 years ago

Those screenshots straight up look like ads to me

Yeah, not the greatest examples, but Heise is an IT news site and Gamestar is a gaming news site.

celsoazevedo commented 6 years ago

@lightswitch05 Thanks for your reply. Two examples of broken AMP pages:

If you check the source code, you'll see a few scripts (ads, analytics, iframes, etc.) being loaded from cdn.ampproject.org, including the main .js (https://cdn.ampproject.org/v0.js). Without v0.js we get a white page:

[Screenshot, 2018-06-12 21:18:03: the AMP page renders as a blank white page]

I don't like AMP, but "everyone" uses it these days...

Regarding podtrac, one of the networks using their service is Relay.fm: https://www.relay.fm/shows

Links look like this: https://www.podtrac.com/pts/redirect.mp3/traffic.libsyn.com/radarrelay/undertheradar128.mp3

It's just a redirect, but it doesn't work if www.podtrac.com is blocked.
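A minimal Python sketch of why the redirect fails: the client must resolve the first hostname in the episode URL before podtrac can redirect it anywhere, and a hosts file answers that lookup with 0.0.0.0. The URL and hostnames come from this thread; `REDIRECT_PREFIX`, `AFTER_FIX`, and the helper names are hypothetical, and the path layout is inferred from this single URL, not from any podtrac documentation.

```python
from urllib.parse import urlsplit

# Episode URL quoted in the comment above.
EPISODE_URL = (
    "https://www.podtrac.com/pts/redirect.mp3/"
    "traffic.libsyn.com/radarrelay/undertheradar128.mp3"
)

# podtrac names from the hosts entries quoted at the top of this issue.
BLOCKED = {
    "podtrac.com", "analytics.podtrac.com", "dev.podtrac.com",
    "east.dev.podtrac.com", "west.dev.podtrac.com",
    "dts.podtrac.com", "www.podtrac.com",
}

# Hypothetical: the two podtrac names the list keeps after the fix in this thread.
AFTER_FIX = {"analytics.podtrac.com", "dts.podtrac.com"}

# Assumed prefix, inferred only from the URL above.
REDIRECT_PREFIX = "/pts/redirect.mp3/"


def first_hop_host(url: str) -> str:
    """Hostname the client has to resolve before any redirect can happen."""
    return urlsplit(url).hostname or ""


def embedded_target(url: str) -> str:
    """Return the real file location carried in the path after REDIRECT_PREFIX."""
    path = urlsplit(url).path
    if not path.startswith(REDIRECT_PREFIX):
        raise ValueError("URL does not match the assumed redirect layout")
    return path[len(REDIRECT_PREFIX):]


if __name__ == "__main__":
    host = first_hop_host(EPISODE_URL)
    print(host, "blocked before fix:", host in BLOCKED)
    print(host, "blocked after fix:", host in AFTER_FIX)
    print("only reachable via the redirect:", embedded_target(EPISODE_URL))
```

Because the block happens at DNS resolution, the libsyn file is never requested even though libsyn itself is not on the list.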

--

It's easy to find more examples of overblocking. For example, drift.com is blocked, but so are their blog (blog.driftt.com), their login page (login.driftt.com), etc. There are 29 blocked hosts under intercom.io. And so on.

lightswitch05 commented 6 years ago

I removed AMP: lightswitch05@6e62515eb4d8484d8a018200b1e1e466f6d0e263 I also removed the podtrac root domain (left dts.podtrac.com and analytics.podtrac.com): lightswitch05@365185639e9ce0550d611097ba765dc57c86fe7e

As for the domain expansion, hosts lists do not support wildcards. I programmatically expand domains in an effort to make up for the loss of wildcard blocking. This method definitely ends up including innocuous subdomains like the examples you gave. But that wouldn't be any different than wildcard blocking.
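To illustrate the trade-off described above, here is a minimal Python sketch (not lightswitch05's actual tooling): a hosts file only matches exact hostnames, so expansion emits one entry per known name, and any name not yet listed slips through where a true wildcard would catch it. `expand`, `hosts_blocks`, and `wildcard_blocks` are hypothetical names, and new.podtrac.com is a made-up hostname.

```python
def expand(root: str, labels: list[str]) -> list[str]:
    """Emit one hosts entry for the root and one per known subdomain label.

    `labels` is a hypothetical input: whatever subdomains the maintainer has
    observed. A hosts file cannot express "*.root", so only these are covered.
    """
    names = [root] + [f"{label}.{root}" for label in labels]
    return [f"0.0.0.0 {name}" for name in names]


def hosts_blocks(name: str, entries: set[str]) -> bool:
    """Hosts-file semantics: exact hostname match only."""
    return name.lower().rstrip(".") in entries


def wildcard_blocks(name: str, root: str) -> bool:
    """What a wildcard rule would do: match the root and every name under it."""
    name = name.lower().rstrip(".")
    return name == root or name.endswith("." + root)


if __name__ == "__main__":
    lines = expand("podtrac.com", ["analytics", "dev", "east.dev", "west.dev", "dts", "www"])
    print("\n".join(lines))
    entries = {line.split()[1] for line in lines}
    for name in ("www.podtrac.com", "new.podtrac.com"):
        print(name, "hosts:", hosts_blocks(name, entries),
              "wildcard:", wildcard_blocks(name, "podtrac.com"))
```

With these labels the output reproduces the seven podtrac entries quoted at the top of the issue; both approaches block the listed names, and only the wildcard also covers unlisted ones.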

I'll continue programmatically expanding domains that I believe deserve entire wildcard blocking, but I'm happy to remove ones that cause issues. If there are cases you disagree with me on, there is always the whitelist feature.
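A whitelist pass can be sketched as a filter over hosts-file lines. This is a hypothetical stand-alone helper, not how the project's updateHostsFile.py handles its whitelist; it assumes one hostname per whitelist line with `#` comments, and it rewrites lines that list several hostnames instead of dropping them whole.

```python
def load_whitelist(lines):
    """One hostname per line; blank lines and '#' comments are ignored."""
    allowed = set()
    for line in lines:
        name = line.split("#", 1)[0].strip().lower()
        if name:
            allowed.add(name)
    return allowed


def apply_whitelist(hosts_lines, allowed):
    """Drop whitelisted hostnames from hosts-file lines, keeping everything else."""
    kept = []
    for line in hosts_lines:
        body, sep, comment = line.partition("#")
        fields = body.split()
        if len(fields) < 2:  # blank line, pure comment, or malformed entry
            kept.append(line)
            continue
        ip, names = fields[0], fields[1:]
        remaining = [n for n in names if n.lower() not in allowed]
        if len(remaining) == len(names):
            kept.append(line)  # nothing on this line is whitelisted
        elif remaining:
            rebuilt = " ".join([ip, *remaining])
            kept.append(rebuilt + (" #" + comment if sep else ""))
        # otherwise every name on the line was whitelisted, so the line is dropped
    return kept


if __name__ == "__main__":
    hosts = [
        "0.0.0.0 ampproject.org",
        "0.0.0.0 cdn.ampproject.org",
        "0.0.0.0 status.ampproject.org",
        "0.0.0.0 www.ampproject.org",
    ]
    allowed = load_whitelist(["# let AMP pages render", "cdn.ampproject.org"])
    print("\n".join(apply_whitelist(hosts, allowed)))
```

Whitelisting only cdn.ampproject.org is enough for v0.js to load in this sketch, while the other ampproject.org entries stay blocked.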

celsoazevedo commented 6 years ago

I removed AMP: lightswitch05@6e62515 I also removed the podtrac root domain (left dts.podtrac.com and analytics.podtrac.com): lightswitch05@3651856

Thanks @lightswitch05!

As for the domain expansion, hosts lists do not support wildcards. I programmatically expand domains in an effort to make up for the loss of wildcard blocking. This method definitely ends up including innocuous subdomains like the examples you gave. But that wouldn't be any different than wildcard blocking.

I'll continue programmatically expanding domains that I believe deserve entire wildcard blocking, but I'm happy to remove ones that cause issues. If there are cases you disagree with me on, there is always the whitelist feature.

Domain expansion is good, but I don't think it's a good idea to block everything just because a subdomain is used for tracking or ad delivery. Usually subdomains like blog.domain.com, help.domain.com, support.domain.com, <de/fr/en>.domain.com, etc., are safe.
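The suggestion above could be sketched as expansion with a skip list. `SAFE_LABELS` mirrors the examples in the comment (blog, help, support, de/fr/en) and is an assumption rather than a vetted list; `expand_refined`, example.com, and its labels are placeholders.

```python
# Hypothetical "usually safe" leftmost labels, taken from the examples above.
SAFE_LABELS = {"blog", "help", "support", "de", "fr", "en"}


def expand_refined(root: str, labels: list[str], safe: set[str] = SAFE_LABELS) -> list[str]:
    """Expand root into hosts entries, skipping subdomains whose leftmost label looks safe."""
    names = [root]
    for label in labels:
        if label.split(".", 1)[0].lower() in safe:
            continue  # e.g. blog.<root>, help.<root>, fr.<root> stay reachable
        names.append(f"{label}.{root}")
    return [f"0.0.0.0 {name}" for name in names]


if __name__ == "__main__":
    for line in expand_refined("example.com", ["blog", "help", "fr", "track", "cdn.ads"]):
        print(line)
```

Matching on the leftmost label is a blunt heuristic: it keeps a tracker hosted at, say, blog.<root> reachable, so the skip list trades some blocking for fewer broken pages.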

Whitelists help, but they're hard to debug, especially on mobile. I noticed the AMP issue on my phone, and the quick fix was to revert my hosts file. Even on a computer, after updating the hosts file we also have to flush the system's DNS cache and the browser cache... all just to unblock a subdomain.

Maybe I'm just suffering from "security fatigue" 😛 Aggressive adblocking + hosts files sometimes gets a bit annoying; that's why I prefer more refined lists with hosts known to serve ads, malware, analytics, etc., over lists that just try to block everything.

Anyway... Thanks for maintaining your list. We usually don't give enough credit to the people behind these adblock/hosts lists.

@StevenBlack Unless someone else has anything to add I think we can close this.

StevenBlack commented 6 years ago

Thank you everybody!

Closing.