AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0

Choosing filter lists for AdGuard Home #1325

DandelionSprout closed this 4 years ago

DandelionSprout commented 4 years ago

Problem Description

As I was testing with a stock AGH install on AMD64 Windows, I noticed that the available selection of lists on default installations is still pretty weak:

[Screenshot of the default blocklist selection]

So I think it's time to consider trusting some more lists enough for them to be included by default as well. I'd recommend one or more of the following (in mostly random order):

—— Non-regional ——

—— Regional ——

And that's even without getting into more proof-of-concept-esque list purposes like censorship evasion lists, Nintendo DS online servers, software update blockers, anti-gambling, and so on. I also chose to exclude unofficial domains-only versions of ABP-formatted lists.

Proposed Solution

Add some more lists to default AdGuard Home installations.

Alternatives Considered

Adding a link to FilterLists.com on the filter settings page would've also been great, although I, having worked very extensively on its list cataloguing, will be the first to admit that it's become a daunting site for newcomers to browse through, and is almost unusable on phones.

Additional Information

These list additions would be independent of you guys' plans to create an official script for converting adblocker lists to AGH lists, and they should hopefully not interfere with those plans for the time being.

ameshkov commented 4 years ago

Yeah, it's time indeed.

Marked as "help wanted" -- please suggest other lists here.

Slipi089 commented 4 years ago

The different lists from https://energized.pro/ would be great.

DandelionSprout commented 4 years ago

Since the Energized lists are compiled almost entirely from other lists (except for ~20,000 of the entries, which come from Energized Core), I personally would have chosen to prioritise adding the best of the source lists instead.

Slipi089 commented 4 years ago

The best ones will always be different for different people, and you will never be able to please everyone.

DandelionSprout commented 4 years ago

Fair point.

ameshkov commented 4 years ago

I'd surely prioritize lists compiled using more advanced syntax over the old-school hosts files. People tend to turn on every available filter list, and bloated many-megabytes hosts lists aren't really good for performance.

WildByDesign commented 4 years ago

@ameshkov I'm not sure if this resource was listed yet or not: https://github.com/mmotti/adguard-home-filters

There is a fantastic REGEX list and also a separate filter list. The filter list appears to pull from 9 or 10 other reputable source lists and parses them into a single AGH-compatible list that uses the more advanced syntax specific to AGH.

Cheers!

gentlyxu commented 4 years ago

For Chinese users, anti-AD is a good choice. The filter list uses Adblock-style syntax; see https://raw.githubusercontent.com/privacy-protection-tools/anti-AD/master/anti-ad-easylist.txt (press Command+F and search for the "/" symbol).

The project URL: https://github.com/privacy-protection-tools/anti-AD/

This project is still being updated and will keep getting better.

adworacz commented 4 years ago

Another list recommendation: https://github.com/notracking/hosts-blocklists/blob/master/adblock/adblock.txt

I opened an issue with the repo, and after some discussion the author was willing to produce a list that's compatible with AdGuard Home (and with browser plugins too, to be honest).

The nice thing about this list is that it is optimized to take advantage of wildcard/subdomain matching, which saves a LOT of space.
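The space saving comes from the fact that one ||domain^ rule already covers every subdomain of that domain, so separate subdomain entries can be dropped. A minimal Python sketch of that compaction, assuming the input is a plain list of hostnames (an illustration only, not how the notracking list is actually built):

```python
def compact(domains):
    """Drop every domain whose parent domain is also in the list.

    Under ||domain^ semantics a rule for example.com already covers
    ads.example.com, so the subdomain entry is redundant.
    """
    kept = set()
    normalised = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    # Fewer labels first, so a parent is always seen before its subdomains.
    for d in sorted(normalised, key=lambda x: (x.count("."), x)):
        labels = d.split(".")
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(p in kept for p in parents):
            kept.add(d)
    return sorted(kept)


def to_adblock_rules(domains):
    """Emit one ||domain^ rule per domain left after compaction."""
    return [f"||{d}^" for d in compact(domains)]


hosts = ["example.com", "ads.example.com", "t.ads.example.com", "tracker.net"]
print(to_adblock_rules(hosts))  # ['||example.com^', '||tracker.net^']
```

Four hosts-style entries shrink to two rules here; on real lists with many tracker subdomains per domain, the saving is what makes the wildcard-optimised version so much smaller.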

It's also updated regularly, and includes a multitude of sources that are checked regularly for updates, dead domains, and more.

notracking commented 4 years ago

@gentlyxu be aware that these lists contain quite a few filters that a lot of people would consider false positives:

0.0.0.0 scribol.com
0.0.0.0 tracking.epicgames.com
0.0.0.0 logrocket.com
0.0.0.0 loggly.com
0.0.0.0 om.cbsi.com
0.0.0.0 ipinfo.io
0.0.0.0 v.shopify.com
0.0.0.0 adobedtm.com
0.0.0.0 c.evidon.com
0.0.0.0 ereg.wip3.adobe.com
0.0.0.0 csi.gstatic.com
0.0.0.0 g.msn.com
0.0.0.0 sascdn.com
0.0.0.0 duckdns.org
0.0.0.0 dl.360safe.com
0.0.0.0 prf.hn
0.0.0.0 placehold.it
0.0.0.0 digg.com
0.0.0.0 feedburner.com
0.0.0.0 rambler.ru
0.0.0.0 jiathis.com
0.0.0.0 uol.com.br
0.0.0.0 rs6.net
0.0.0.0 com.com
0.0.0.0 s0.2mdn.net
0.0.0.0 pr0gramm.com
0.0.0.0 consent.cmp.oath.com
0.0.0.0 s.youtube.com
0.0.0.0 purch.com
0.0.0.0 fpdownload.macromedia.com
0.0.0.0 dynatrace.com
0.0.0.0 om.cbsi.com
0.0.0.0 auditude.com
0.0.0.0 om.cbsi.com
0.0.0.0 app.link

gentlyxu commented 4 years ago

@notracking Thank you. I will make a whitelist that includes the hostnames you mentioned, and more.
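In AdGuard Home's adblock-style syntax, a whitelist entry is an exception rule prefixed with @@. A small Python sketch that turns hostnames reported as false positives into such rules; note that @@||host^ also unblocks all of the host's subdomains, which may be broader than wanted:

```python
def allowlist_rules(hostnames):
    """Turn hostnames reported as false positives into exception rules."""
    # Normalise and de-duplicate while keeping the original order
    # (om.cbsi.com, for instance, appears three times in the list above).
    unique = dict.fromkeys(h.strip().lower() for h in hostnames if h.strip())
    # '@@' marks an exception; '||host^' also unblocks every subdomain of host.
    return [f"@@||{h}^" for h in unique]


reported = ["om.cbsi.com", "ipinfo.io", "om.cbsi.com", "duckdns.org"]
for rule in allowlist_rules(reported):
    print(rule)
```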

ameshkov commented 4 years ago

Vietnamese blocklist: https://github.com/AdguardTeam/AdguardForiOS/issues/1298#issuecomment-540063277

ameshkov commented 4 years ago

Meanwhile, we made a simple helper tool for anyone making filter lists: https://urlfilter.adtidy.org/

With it, you'll be able to check whether a domain is blocked by any of the existing filter lists.
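For checking local copies of lists offline, a rough Python equivalent of that lookup might look like the sketch below. It is not how the urlfilter.adtidy.org tool works internally; it only understands plain ||domain^ rules, and the list names are placeholders:

```python
def parse_domain_rules(text):
    """Collect domains from plain '||domain^' lines; all other syntax is ignored."""
    domains = set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("||") and line.endswith("^"):
            domains.add(line[2:-1].lower())
    return domains


def lists_blocking(hostname, lists):
    """Return the names of the lists whose rules cover hostname or a parent domain."""
    labels = hostname.lower().rstrip(".").split(".")
    # e.g. ads.example.com -> {ads.example.com, example.com, com}
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return sorted(name for name, rules in lists.items() if candidates & rules)


lists = {
    "list-a": parse_domain_rules("! comment\n||example.com^\n"),
    "list-b": parse_domain_rules("||tracker.net^\n"),
}
print(lists_blocking("ads.example.com", lists))  # ['list-a']
```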

notracking commented 4 years ago

I'd surely prioritize lists compiled using more advanced syntax over the old-school hosts files. People tend to turn on every available filter list, and bloated many-megabytes hosts lists aren't really good for performance.

Do not forget that all network-based filters (||ads.google.com^ as well as normal hosts files) are extremely resource-efficient. A single regex (or 'dynamic') filter is many times slower than 1,000 network filter rules.
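The reason is structural: domain rules can be put into a hash set, so a lookup costs a few set probes no matter how big the list is, while regex rules generally have to be tried one after another. A simplified Python sketch of that difference, assuming a toy matcher (real filtering engines index rules far more cleverly than this):

```python
import re


class Matcher:
    """Illustrates why plain domain rules scale better than regex rules."""

    def __init__(self, domain_rules, regex_rules):
        # Domain rules go into a hash set: lookup cost does not grow with list size.
        self.domains = {d.lower() for d in domain_rules}
        # Regex rules cannot be indexed this way; each one must be tried in turn.
        self.regexes = [re.compile(r) for r in regex_rules]

    def is_blocked(self, host):
        labels = host.lower().rstrip(".").split(".")
        # One set lookup per label suffix: a handful of lookups per query,
        # whether the set holds 1,000 or 1,000,000 domains.
        if any(".".join(labels[i:]) in self.domains for i in range(len(labels))):
            return True
        # Worst case: every regex is evaluated against the hostname.
        return any(r.search(host) for r in self.regexes)


m = Matcher(["example.com"], [r"^ad\d*\."])
print(m.is_blocked("cdn.example.com"), m.is_blocked("ad12.foo.net"), m.is_blocked("foo.net"))
# True True False
```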

Krizzii commented 4 years ago

I use one list, dbl.oisd.nl. It works great so far.

vager88 commented 4 years ago

Hi. I don't believe this is something anyone has considered yet: we need a way to filter out proxies/redirectors. There are lots of them out there, and people use them to bypass DNS filters. I found the following website, which has lists by category: http://dsi.ut-capitole.fr/blacklists/index_en.php

The one called redirectors would be the list of proxies to incorporate. Personally, I've copied that file and am loading it onto my server locally. However, it would be great if it could be added as an option.

DandelionSprout commented 4 years ago

Université Toulouse 1's lists are only available in TAR.GZ format, which no adblocker I know of has ever been able to unpack and use on its own without user interaction. That's also why virtually none of their lists are on FilterLists.com at the time of writing.

An alternative could be https://blocklist.site/app/dl/redirect, but it's hard to make heads or tails of its licence system.

vager88 commented 4 years ago

Yes, agreed on the TAR.GZ format. That's why I unpacked it and converted the file into a format I could use. I did come across this page: https://github.com/StevenBlack/hosts

This appears to be a hosts-generator program, but I don't know how to automate it to pull from the Toulouse site and generate the output you need. Maybe someone smarter than I can take a look.
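The unpacking step itself can be automated with Python's standard tarfile module, no hosts generator needed. A rough sketch, assuming the archive has already been downloaded and contains a plain one-domain-per-line file; the member path used below ('blacklists/redirector/domains') is a guess and must be checked against the real archive layout:

```python
import tarfile


def domains_from_tarball(path, member):
    """Read a one-domain-per-line file straight out of a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tar:
        f = tar.extractfile(member)  # raises KeyError if the member is missing
        if f is None:
            raise ValueError(f"{member} is not a regular file")
        text = f.read().decode("utf-8", errors="replace")
    return [
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def write_adguard_list(domains, out_path):
    """Write the domains as sorted, de-duplicated ||domain^ rules."""
    with open(out_path, "w", encoding="utf-8") as out:
        for d in sorted(set(domains)):
            out.write(f"||{d}^\n")
```

Usage would be something like `write_adguard_list(domains_from_tarball("blacklists.tar.gz", "blacklists/redirector/domains"), "redirector.txt")` (hypothetical file names), run from a scheduled job after downloading the archive; the resulting file can then be served locally and added to AdGuard Home as a custom list.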

lordraiden commented 4 years ago

https://www.reddit.com/r/oisd_blocklist/comments/dwxgld/dbloisdnl_internets_1_domain_blocklist/

This is probably the best DNSBL list compilation, quite curated.

https://dbl.oisd.nl/

DandelionSprout commented 4 years ago

Looking through https://includes.oisd.nl/, it seems to include content from north of 700 lists, including at least one anti-piracy list. Just for the record.

It also seems to have tried to automatically include other lists it could find anywhere on the internet, including a high three-digit number of lists they had to exclude afterwards, half of which even I had never heard of before.

From my understanding, https://abp.oisd.nl/ aligns more with AdGuard Home's syntax goals than what https://dbl.oisd.nl/ does. Alternatively, there's https://dblmobile.oisd.nl/, which seems to have ~42,000 entries instead of ~420,000, but does not have a || version.

sjhgvr commented 4 years ago

From my understanding, https://abp.oisd.nl/ aligns more with AdGuard Home's syntax goals than what https://dbl.oisd.nl/ does.

I've added that list (abp.oisd.nl) today as a response to this user request

So when AdGuard Home is loaded with that list, it automatically blocks access to all subdomains of those domains (as the manual states for how ||domain.com^ works)?

If that is true, then both AdGuard Home and NextDNS are superior to Pi-hole in my opinion.

adworacz commented 4 years ago

So when AdGuard Home is loaded with that list, it automatically blocks access to all subdomains of those domains (as the manual states for how ||domain.com^ works)?

If that is true, then both AdGuard Home and NextDNS are superior to Pi-hole in my opinion.

You are correct, subdomains are blocked as well.
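To make that behaviour concrete, a minimal Python sketch of how a plain ||domain^ rule applies to a hostname, assuming no modifiers, wildcards or regex (AdGuard Home's real matcher supports much more). The key detail is that the match must start at a label boundary, so look-alike domains are not caught:

```python
def rule_matches(rule, hostname):
    """Does a plain '||domain^' rule match this hostname? (sketch, no modifiers)"""
    if not (rule.startswith("||") and rule.endswith("^")):
        raise ValueError("only plain ||domain^ rules are supported in this sketch")
    domain = rule[2:-1].lower()
    host = hostname.lower().rstrip(".")
    # The domain itself, or any subdomain of it; 'notexample.com' does not count.
    return host == domain or host.endswith("." + domain)


for host in ["example.com", "ads.example.com", "notexample.com"]:
    print(host, rule_matches("||example.com^", host))
```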

And THANK YOU for adding the new version using the Adblock syntax. Not only can Adguard Home use it, but browser plugins like uBlock Origin as well.

imTHAI commented 4 years ago

Another list recommendation: https://github.com/notracking/hosts-blocklists/blob/master/adblock/adblock.txt the author was willing to produce a list that's compatible with Adguard Home

Good news. I was using his domain and hostname lists and converting them daily to AdGuard Home syntax (available in my GitHub repository). Back when I used dnsmasq, before AdGuard Home, I was already using those notracking lists.

Those lists are the most complete, IMHO.

ameshkov commented 4 years ago

Hey all, thanks for all the cool suggestions!

Regarding oisd, Energized, etc., we might need to have a separate category for them. These are the kind of blocklists that try to block as much as possible, and if you use one of them, you really don't need any other blocklist. The downside is that there is no granular control over what's blocked, and there is a higher number of false positives.

The end goal is to provide users with a number of categories and filter lists users can choose from, e.g. "ads", "tracking domains", "google trackers", "facebook trackers", etc.

There's one more reason why I'm looking for narrow-scoped lists and promoting adblock-style syntax. Take a look at this study: https://petsymposium.org/2020/files/papers/issue2/popets-2020-0021.pdf

Among other things, this study shows that big blocklists often cause app breakage:

[Figure from the study]

One way to fix that is to teach our blocking software to detect which device (or, ideally, which app) is making a request, and then use this information to unblock some domains when blocking them leads to app breakage. That's why there's this issue where we're looking for data that could help us implement automatic device detection.
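A purely hypothetical Python sketch of the idea, assuming the device has already been identified somehow (which is exactly the open problem). The Policy class and the client names are made up for illustration and are not AdGuard Home's implementation:

```python
from dataclasses import dataclass, field


def covers(domains, host):
    """True if host or one of its parent domains is in the set."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


@dataclass
class Policy:
    blocked: set = field(default_factory=set)
    # Hypothetical: client id -> domains that must stay reachable for that client only.
    client_exceptions: dict = field(default_factory=dict)

    def decide(self, client, host):
        host = host.lower().rstrip(".")
        # A per-client exception wins over the global blocklist.
        if covers(self.client_exceptions.get(client, set()), host):
            return "allow"
        return "block" if covers(self.blocked, host) else "allow"


policy = Policy(
    blocked={"tracker.example"},
    client_exceptions={"living-room-tv": {"tracker.example"}},
)
print(policy.decide("laptop", "cdn.tracker.example"))          # block
print(policy.decide("living-room-tv", "cdn.tracker.example"))  # allow
```

The point of the design is that the global list can stay aggressive while breakage is fixed for one device only, instead of whitelisting the domain for everyone.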

PS: Meanwhile, I see I haven't mentioned the hosts-list compiler tool we recently released. It may help with converting old-style blocklists into the adblock-style syntax: https://github.com/AdguardTeam/HostlistCompiler
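For a rough idea of what such a conversion involves, here is a Python sketch that handles only the basic cases: IP-prefixed lines, comments, a few well-known local entries, and duplicates. It is not HostlistCompiler's code, which has its own set of transformations, and the skip list below is an assumption:

```python
import ipaddress

# Entries that are system defaults in many hosts files rather than blocklist entries.
SKIP = {"localhost", "localhost.localdomain", "local", "broadcasthost",
        "ip6-localhost", "ip6-loopback", "0.0.0.0"}


def hosts_to_adblock(text):
    """Convert hosts-format lines ('0.0.0.0 ads.example.com') to '||host^' rules."""
    rules = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            ipaddress.ip_address(parts[0])  # first field must be an IP address
        except ValueError:
            continue
        for host in parts[1:]:
            host = host.lower().rstrip(".")
            if host not in SKIP:
                rules[f"||{host}^"] = None  # dict keeps insertion order, drops duplicates
    return list(rules)


sample = "127.0.0.1 localhost\n0.0.0.0 ads.example.com\n0.0.0.0 ads.example.com # dup\n"
print(hosts_to_adblock(sample))  # ['||ads.example.com^']
```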

ameshkov commented 4 years ago

A few more blacklists: https://github.com/smed79/blacklist

Ultrabenosaurus commented 4 years ago

Yes, agreed on the TAR.GZ format. That's why I unpacked it and converted the file into a format I could use. I did come across this page: https://github.com/StevenBlack/hosts

This appears to be a hosts-generator program, but I don't know how to automate it to pull from the Toulouse site and generate the output you need. Maybe someone smarter than I can take a look.

I came here to recommend StevenBlack Hosts. I've been using Unified + Fakenews from StevenBlack for years as my primary list, with a few others tacked on just in case.

To use any StevenBlack list just scroll down the README a bit to the first table. Use the "Raw hosts" column for a link to your chosen list for AdGuard Home or uBlock Origin, and the "Non GitHub mirror" column for use with HostsMan.

The second table tells you the sources used. Each variant list has its own README with its own sources table, so you can pick the perfect list for you!

xxsxx47 commented 4 years ago

Another list https://github.com/lightswitch05/hosts

ameshkov commented 4 years ago

hpHosts discontinued: https://github.com/AdguardTeam/AdGuardHome/issues/1536

hoefs commented 4 years ago

I like: http://abp.oisd.nl/

Main list (ABP format) | Identical, but in Adblock plus format | Adguard Home RECOMMENDED!

The author of this list originally created it for Pi-hole, but now advises everybody to move to AdGuard Home (as I did; I run two AdGuard Home instances, one in Docker and one on a Raspberry Pi):

Pi-hole users should check out Adguard Home instead. (Supports: CNAME blocking / Adblock-style blocklists, and my personal reason for switching: blocks all subdomains (of blocked domains) by default!! Pi-hole needs every damn subdomain listed separately; inefficient, insecure!)

imTHAI commented 4 years ago

I switched from https://github.com/notracking/hosts-blocklists/ to http://abp.oisd.nl/ one month ago. I've had only one false positive since then. I also recommend this list.

ammnt commented 4 years ago

Unified hosts from Steven Black is really good choice!👍

pedrolamas commented 4 years ago

So I think it's time to consider trusting some more lists enough for them to be included by default as well. I'd recommend one or more of the following (in mostly random order):

—— Non-regional ——

...

I strongly recommend that you do not use the StreamingAds list indicated on the first post...

I manually added this list and used it for a few days, only to find out that Spotify stopped working!

This seems to affect only the Spotify Android client, and was reported back in 2018 to the repo owner several times (here), only to have it marked as "will not fix".

/cc @DandelionSprout

DandelionSprout commented 4 years ago

Looking into the matter and seeing that https://github.com/FadeMind/hosts.extras/issues/25 and https://github.com/FadeMind/hosts.extras/issues/33 were both closed without even citing a reason for it, that does indeed cast major doubts on the seriousness of FadeMind, especially seeing as I presume that ≥80% of those that'd need a streaming-ad blocklist would be phone or smart-TV users. Thanks for the heads-up.

notracking commented 4 years ago

I switched from https://github.com/notracking/hosts-blocklists/ to http://abp.oisd.nl/ one month ago. I've had only one false positive since then. I also recommend this list.

Would you mind sharing the false positives that you encountered?

imTHAI commented 4 years ago

Would you mind sharing the false positives that you encountered?

I've reset my personal rules on my AdGuard Home server, as I do from time to time, so I can't tell which ones I ran into before (maybe one or two by now). But yesterday it was gigatribe.com, which is something like a Direct Connect network. The list blocks the entire domain with ||gigatribe.com^, so it also covers login and next.gigatribe.com, and we can't connect to the network. It's no big deal and, again, I strongly recommend this list.

notracking commented 4 years ago

@imTHAI thanks for your reply! Domain is queued for whitelisting, will take until the next auto update to show up in the repository.

JOduMonT commented 4 years ago

A few more lists, with descriptions: https://en.wikipedia.org/wiki/Comparison_of_DNS_blacklists and an example of how they could be listed: https://iplists.firehol.org/

imTHAI commented 4 years ago

and an example of how they could be listed: https://iplists.firehol.org/

I don't understand what you mean about FireHOL. FireHOL, which I've been using for 15+ years by the way, is a firewall, and those predefined lists are IP lists (for example, I use FireHOL with the ipset tool to block entire countries from probing my port 22). I don't see how it could be used at the DNS level.

DandelionSprout commented 4 years ago

AdGuard Home has actually been able to filter by the domains' IP addresses for ~6 months now.

That being said, IP lists tend to block very broadly, so it would take a truly outstanding IP list to even be considered for AGH inclusion. It's also worth mentioning that almost every single list in https://en.wikipedia.org/wiki/Comparison_of_DNS_blacklists was either several years out of date or lacked a free raw version.
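To see why IP lists block so broadly: a resolved address is checked against CIDR ranges, and a single /16 entry already covers 65,536 addresses, which on shared hosting or CDN ranges can include many unrelated sites. A small Python sketch using the standard ipaddress module; the ranges are documentation examples, and this is not how AdGuard Home evaluates its own IP rules:

```python
import ipaddress


def answer_blocked(answer_ip, blocked_ranges):
    """Return True if a resolved address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(answer_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return any(addr in ipaddress.ip_network(r, strict=False) for r in blocked_ranges)


ranges = ["203.0.113.0/24", "2001:db8::/32"]
print(answer_blocked("203.0.113.7", ranges))   # True
print(answer_blocked("198.51.100.1", ranges))  # False
print(ipaddress.ip_network("10.1.0.0/16").num_addresses)  # 65536
```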

mupkoo commented 4 years ago

https://www.reddit.com/r/oisd_blocklist/comments/dwxgld/dbloisdnl_internets_1_domain_blocklist/

This is probably the best DNSBL list compilation, quite curated.

https://dbl.oisd.nl/

I agree. I have it as my first list, and it catches everything before there's a need to check the rest. As an added benefit, it's updated quite often, and a lot of people report false positives if there are any.

JOduMonT commented 4 years ago

@imTHAI

and an example of how they could be listed: https://iplists.firehol.org/

Yes, of course; what I meant wasn't clear.

I thought that, instead of compiling them inside a GitHub issue, it could be worth taking inspiration from FireHOL and having statistics on every list. The best part of FireHOL is being able to see which lists overlap.

Here's another list of lists: https://firebog.net

Personally, I use both.
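The per-list overlap statistics FireHOL shows can be approximated for domain lists too. A minimal Python sketch computing shared entries and Jaccard similarity for each pair of lists, assuming each list has already been loaded as a set of domains (list names are placeholders):

```python
from itertools import combinations


def overlap_report(lists):
    """For every pair of lists: shared entries and Jaccard similarity (|A∩B| / |A∪B|)."""
    report = []
    for (name_a, a), (name_b, b) in combinations(sorted(lists.items()), 2):
        shared = len(a & b)
        union = len(a | b)
        report.append((name_a, name_b, shared, shared / union if union else 0.0))
    return report


lists = {"list-a": {"x.com", "y.com", "z.com"}, "list-b": {"y.com", "z.com", "w.com"}}
for a, b, shared, jaccard in overlap_report(lists):
    print(f"{a} vs {b}: {shared} shared, Jaccard {jaccard:.2f}")
```

A Jaccard value close to 1 means two lists are near-duplicates, so enabling both mostly costs memory without blocking anything extra.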

ameshkov commented 4 years ago

@ArtemBaskal here are some implementation details:

  1. "Add blocklist" shows a modal dialog with two options:

    • "Choose from a list" -- show a new modal dialog with a selection of blocklists
    • "Add a custom list" -- opens the old "New blocklist" dialog
  2. The "Known blocklists" dialog contains a list of blocklists grouped by category. Mockup: https://uploads.adguard.com/up04_AdGuard_DNS_-_Filter_lists__Moqups_bbfh9.png. The home icon leads to the blocklist homepage, and the "source code" icon opens the raw list URL. Please note that this is just a mockup; use our standard styles for that modal dialog.

  3. Categories and lists should be configurable via a single JSON file (so that it's easier for people to submit new lists via pull requests).
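A minimal Python sketch of how such a single JSON file might be loaded and grouped for the dialog. The schema (categories, filters, categoryId, homepage, source) is a hypothetical example, with homepage and source mirroring the two icons in the mockup; it is not the format that was actually adopted:

```python
import json

# Hypothetical catalogue format; example.org URLs are placeholders.
CATALOGUE = """
{
  "categories": {
    "general": {"name": "General"},
    "regional": {"name": "Regional"}
  },
  "filters": [
    {"id": "example_general", "name": "Example general list", "categoryId": "general",
     "homepage": "https://example.org/", "source": "https://example.org/general.txt"},
    {"id": "example_regional", "name": "Example regional list", "categoryId": "regional",
     "homepage": "https://example.org/", "source": "https://example.org/regional.txt"}
  ]
}
"""


def group_by_category(raw):
    """Return {category display name: [filter names]} for the 'Known blocklists' dialog."""
    data = json.loads(raw)
    grouped = {key: [] for key in data["categories"]}
    for f in data["filters"]:
        if f["categoryId"] not in grouped:
            # Catch typos in pull requests instead of silently hiding the list.
            raise ValueError(f"unknown category {f['categoryId']!r} in {f['id']!r}")
        grouped[f["categoryId"]].append(f["name"])
    return {data["categories"][k]["name"]: names for k, names in grouped.items()}


print(group_by_category(CATALOGUE))
```

Validating category references at load time means a bad pull request fails loudly in CI rather than producing an empty section in the UI.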

BugZappr commented 4 years ago

Here's a French site, Chez Airelle:

http://rlwpx.free.fr/WPFF/hosts.htm

I don't read French, but here's a translation:

http://translate.google.com/translate?hl=fr&langpair=fr|en&u=http://rlwpx.free.fr/WPFF/hosts.htm

I very much support the idea of having lists segmented by interest. I don't mind some ads, if they aren't too flashy or crowding out real content. I support capitalism, but only when it's in the public good. hpHosts had a really great breakdown before it got killed by the bean-counters: adware, tracking, malware, exploits, fraud, hijack, misleading marketing, illegal pharma, phishing, potentially unwanted, and warez/piracy. Privacy might be a category, too. Many lists won't break down as cleanly, so fewer categories could be supported.

I think that "Unified hosts from Steven Black" is a poor choice. Half of the lists he offers are "fakenews" sites, many of which I think are more trustworthy than CBS, NBC & ABC. Probably political operatives are involved at some step in the pipeline, offering up blocklists for sites publishing independent news and views as fake news: i.e. false positives, IMHO. Very few of the sites he blocks are actually malware, which is what I'm interested in. The larger these lists are, the more they hurt performance, especially list-fetch times and memory usage. I'd rather not bog down my browser with huge lists, so I'd prefer to just block malware/phishing/annoyances, but lots of them.

liamengland1 commented 4 years ago

Airelle list is notorious for false positives.

If you don't like the "fake news" lists offered by steven black, just use the unified ads and tracking offering. See: https://github.com/StevenBlack/hosts/blob/master/readme.md#list-of-all-hosts-file-variants

Ping @stevenblack

StevenBlack commented 4 years ago

@BugZappr

Half of the lists he offers are "fakenews"... Very few of the sites he blocks are actually malware...

WTF? Dude, that's insulting. Educate yourself.

DandelionSprout commented 4 years ago

Things worth noting:

1) Airelle's lists are only available in compressed format, which means they generally can't be included or updated in any adblocker tools that I know of.

2) My personal understanding of Steven Black's lists is that the "fakenews" variants get such entries from https://raw.githubusercontent.com/marktron/fakenews/master/fakenews. The only ones on that list that anyone should even remotely consider visiting are rt.com and christwire.org. I'd definitely go to NBC's websites a thousand times before I'd go to something called racerelations.news.

3) If the Readme of the plain list version is to be believed, the listed sources mean that around 25,000 of the 57,000 domains target malware, phishing or scams in some way.

thespad commented 4 years ago

One of the "problems" with the fakenews list is that it includes a load of satirical news sites as well as actual "fake news" sites, which makes it worthless to me.

uservictor commented 4 years ago

Consider adding the NoTrack list for blocking online trackers: https://gitlab.com/quidsup/notrack-blocklists#other-projects

jerrac commented 4 years ago

So, with all the suggestions of what lists to include, I could see things getting very confusing...

Can I suggest that there be two lists of filters on the /#filters page?

The first would be vetted filters that AdGuard has determined are good and actively monitors for problems. (What that all actually means, I'm not sure. Maybe they subscribe to issue queues or something for the lists? Or put them through some form of automated testing?)

The second would be community-suggested filters. Use at your own risk. Maybe there'd be a separate repo for this list that people can submit pull requests to in order to get new filters added?

I'd also suggest adding a "Description" column. Then populate that with however the different lists describe themselves.

ameshkov commented 4 years ago

We're going to keep a really short list of the default pre-installed filter lists. Maybe it will be just the AdGuard DNS filter only.

But when you click "Add blocklist", you'll see a list of blocklists you can choose from.

The second would be community-suggested filters. Use at your own risk. Maybe there'd be a separate repo for this list that people can submit pull requests to in order to get new filters added?

There's always FilterLists.com, where one can look for lists; I'm not sure it makes sense to duplicate it.