StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License

Add AdguardDns Filter as a hosts source #1181

Open calm3285 opened 4 years ago

calm3285 commented 4 years ago

I've been using the AdGuard DNS service for a while and it's really effective against ads. I'm currently trying to use Pi-hole with Cloudflare DNS, and I'm wondering whether it would be possible to add the AdGuard DNS filter.

welcome[bot] commented 4 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

bigdargon commented 4 years ago

Hi @angelss197200

AdGuardSDNSFilter (AdGuard Simplified domain names filter) is similar to StevenBlack's hosts in that it is built from many filters, but it uses the filter format rather than the hosts format. Here is its template: https://github.com/AdguardTeam/AdGuardSDNSFilter/blob/master/Filters/filter.template

With the hosts format, you must list every subdomain in the blocklist. A hosts file blocks only the exact domains you list; if a subdomain isn't listed, the system resolves it through DNS normally. With filters like AdGuard's, one rule is enough: the blocking application automatically blocks the subdomains too.

For example, to block doubleclick.net, the hosts format has to list 0.0.0.0 abc.doubleclick.net, 0.0.0.0 xyz.doubleclick.net, and so on, whereas the filter format needs only the single rule ||doubleclick.net^ to block all subdomains.
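To make the difference concrete, here is a minimal Python sketch of the two matching models under simplifying assumptions: a hosts file is treated as an exact-name lookup, and a ||domain^ rule as "this domain or any subdomain of it". The helper names hosts_blocks and filter_blocks are hypothetical, and real blockers handle more rule syntax (modifiers, exceptions, the full meaning of ^) than this does.

```python
# Simplified model of hosts-file matching vs. ||domain^ filter matching.
# hosts_blocks and filter_blocks are hypothetical helper names.

def hosts_blocks(name: str, hosts_entries: set[str]) -> bool:
    """A hosts file only answers for the exact names it lists."""
    return name.lower().rstrip(".") in hosts_entries


def filter_blocks(name: str, rule_domains: set[str]) -> bool:
    """A ||domain^ rule matches the domain itself and every subdomain of it."""
    labels = name.lower().rstrip(".").split(".")
    # Check the name and each parent suffix, e.g. a.b.example.net, b.example.net, example.net, net
    for i in range(len(labels)):
        if ".".join(labels[i:]) in rule_domains:
            return True
    return False


hosts_entries = {"abc.doubleclick.net", "xyz.doubleclick.net"}
rule_domains = {"doubleclick.net"}  # stands for the single rule ||doubleclick.net^

for name in ["abc.doubleclick.net", "new.doubleclick.net", "doubleclick.net"]:
    print(name, hosts_blocks(name, hosts_entries), filter_blocks(name, rule_domains))
# abc.doubleclick.net True True
# new.doubleclick.net False True
# doubleclick.net False True
```

Checking each parent suffix, rather than using a plain endswith test, avoids accidentally matching unrelated names such as notdoubleclick.net.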

hostsVN supports both of these formats and has 2 files:

calm3285 commented 4 years ago

So you're telling me that it isn't possible to merge it into the hosts file, and that the AdGuard way is more practical?

liamengland1 commented 4 years ago

How about one of these two:

https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt

https://v.firebog.net/hosts/AdguardDNS.txt

StevenBlack commented 4 years ago

LE @llacb47, using my ghosts tool (a side project for now), I get the following report:

$ ./hoststools -i https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts  -c https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt

--------------------------------------------------------------------------------
Base hosts file summary:
--------------------------------------------------------------------------------
Location: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Domains: 51822
Bytes: 1604700
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Compared hosts file summary:
--------------------------------------------------------------------------------
Location: https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
Domains: 29172
Bytes: 688422
--------------------------------------------------------------------------------
Intersection: 3795 domains

So AdguardDNS.txt would contribute (29,172 - 3,795) = 25,377 new domains.
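The arithmetic above is plain set algebra. Here is a minimal Python sketch, assuming the report's Domains figures are counts of unique domains and that both files use the usual "IP domain" hosts layout; parse_hosts and compare are hypothetical helpers, not part of ghosts or hoststools.

```python
# Hypothetical helpers for comparing two hosts files as sets of domains.

def parse_hosts(text: str) -> list[str]:
    """Return every domain listed in a hosts file, in order (duplicates kept)."""
    domains = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, split on whitespace
        if len(fields) >= 2:                     # "0.0.0.0 example.com [more names]"
            domains.extend(name.lower() for name in fields[1:])
    return domains


def compare(base_text: str, compared_text: str) -> dict[str, int]:
    base = set(parse_hosts(base_text))
    compared = set(parse_hosts(compared_text))
    common = base & compared
    return {
        "base": len(base),
        "compared": len(compared),
        "intersection": len(common),
        "new": len(compared - base),  # equals compared - intersection
    }


base_text = "0.0.0.0 a.example\n0.0.0.0 b.example\n0.0.0.0 c.example\n"
compared_text = "0.0.0.0 b.example\n0.0.0.0 d.example\n0.0.0.0 e.example\n"
print(compare(base_text, compared_text))
# {'base': 3, 'compared': 3, 'intersection': 1, 'new': 2}
```

With the figures from the report, the same relationship gives 29,172 - 3,795 = 25,377 domains that the base list does not already contain.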

Looking at AdguardDNS.txt, eyeballing from top to bottom, man that file is a mess. I have a legitimate question: how does one curate a large, randomly ordered file?

Interestingly, the file contains 29,193 domain lines but only 29,172 unique domains, so there are 21 duplicate domains. That's not a lot, but it reinforces my central point: you can't curate a hot mess.
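The duplicate count is the difference between domain lines and unique domains. A small Python sketch, assuming "IP domain" lines; the helper name duplicate_report is hypothetical.

```python
from collections import Counter


def duplicate_report(hosts_text: str) -> tuple[int, int, dict[str, int]]:
    """Return (domain lines, unique domains, {domain: count} for repeated domains)."""
    domains = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:                     # assumes "IP domain" lines
            domains.append(fields[1].lower())
    counts = Counter(domains)
    repeats = {d: n for d, n in counts.items() if n > 1}
    return len(domains), len(counts), repeats


sample = "0.0.0.0 ads.example\n0.0.0.0 track.example\n0.0.0.0 ads.example\n"
lines, unique, repeats = duplicate_report(sample)
print(lines, unique, lines - unique, repeats)  # 3 2 1 {'ads.example': 2}
```

Here lines - unique is the number of surplus lines, which is how the 21 above is derived (29,193 - 29,172).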

Judging by the commit history of AdguardDNS.txt, that's automated curation. Nobody appears to be eyeballing this.

Therefore, it appears this is not a good candidate source for us.

Your thoughts?

dnmTX commented 4 years ago

Your thoughts?

@StevenBlack, just to pass along some info (regarding the duplicates only): I'm not sure exactly how the AdGuard team runs their DNS protection, but I can confirm that in their ad-blocker app (Windows PC), whenever one of their lists is updated, the program automatically filters out duplicates before the list is loaded (based on my looking at the logs after a list update). So I guess it could be the same with their hosts file: it's a mess, but it might get filtered and organized before it's loaded. Again, I'm guessing here.

calm3285 commented 4 years ago

@StevenBlack, that is not the official filter; it's one converted by @r-a-y. This is the official one: https://github.com/AdguardTeam/AdGuardSDNSFilter

r-a-y commented 4 years ago

Yes, I wrote the HOSTS converter for the AdguardDNS list mentioned in this thread. My converted HOSTS file is not an official Adguard filter list of any kind. It was made for my own usage, but a lot of people have started using it.

I never noticed the duplicates because I don't go through each line. Like you said, 20,000+ lines is a lot of domains! That, and AdAway (which I use this list with) automatically removes duplicates so it's not a big issue for me.

If I were to remove the duplicates, it would mean I'd have to track each domain during the conversion, which would increase the memory usage. I've added an issue on my repo, but it's not a priority for me at the moment. If the duplicates took up 20% or more of the list, then I'd consider it a problem.

r-a-y commented 4 years ago

If I were to remove the duplicates, it would mean I'd have to track each domain during the conversion, which would increase the memory usage.

Okay, turns out the memory usage was minimal! I've removed the duplicate domains from my list. Check out the updated list: https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
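Tracking each domain during conversion amounts to keeping a set of names already written. The following is a hypothetical Python sketch, not r-a-y's actual converter: it assumes the source uses "!" comments and plain ||domain^ rules, and it simply skips exceptions (@@), rules with $ modifiers, and anything else it doesn't recognize.

```python
import re

# Only plain "||domain^" rules are converted; everything else is skipped.
RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def filter_to_hosts(lines, sink_ip="0.0.0.0"):
    """Yield de-duplicated hosts entries from AdGuard/Adblock-style rules."""
    seen = set()  # memory grows with the number of unique domains
    for raw in lines:
        line = raw.strip().lower()
        if not line or line.startswith("!"):  # blank lines and "!" comments
            continue
        match = RULE.match(line)
        if match is None:  # exceptions (@@), $modifiers, wildcards, etc.
            continue
        domain = match.group(1)
        if domain in seen:  # drop duplicates as we stream
            continue
        seen.add(domain)
        yield f"{sink_ip} {domain}"


rules = [
    "! Title: example filter",
    "||ads.example.com^",
    "||tracker.example.net^",
    "||ads.example.com^",
    "@@||good.example.org^",
    "||cdn.example.com^$important",
]
print("\n".join(filter_to_hosts(rules)))
# 0.0.0.0 ads.example.com
# 0.0.0.0 tracker.example.net
```

Each ||domain^ rule becomes a single hosts entry, so the subdomain coverage of the original rule is lost in the conversion, which is the limitation described earlier in the thread.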

Steven, let me know if you still spot any issues.

StevenBlack commented 4 years ago

Time is ticking on this one... Sorry for the delay @angelss197200 and Ray @r-a-y.

I guess I'm still not sold on this one for two reasons.

1) You can't curate a hot mess, and

2) Adding this would jack our base-list hosts count by about 50%. We've been larger than that before; we can certainly live there.

I'm in the process of eyeballing a diff between March 1 and the latest version. Here's what I see:

StevenBlack commented 4 years ago

Here's an interesting comparison of the TLD tallies in both lists, for TLDs with 100 or more domains. I'm using ghosts for this.

Adding AdGuard gives us much better Russia/Asia coverage, at least nominally, based on numbers alone...


$ ./ghosts -tld -c https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
--------------------------------------------------------------------------------
Base hosts file summary:
--------------------------------------------------------------------------------
Location: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Domains: 54,279
Bytes: 1.7 MB
TLD tally:
   com: 31,024
   net: 6,579
   pl: 5,873
   jp: 809
   ru: 787
   info: 781
   vn: 710
   de: 631
   org: 591
   io: 388
   uk: 361
   cn: 348
   eu: 290
   nl: 254
   co: 230
   fr: 219
   biz: 195
   tv: 174
   us: 165
   xyz: 164
   at: 157
   mobi: 127
   it: 117
   cz: 103
   br: 100

....

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Compared hosts file summary:
--------------------------------------------------------------------------------
Location: https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
Domains: 30,293
Bytes: 715 kB
TLD tally:
   com: 19,773
   net: 2,186
   ru: 1,822
   info: 929
   de: 628
   bid: 366
   pro: 286
   io: 265
   xyz: 238
   org: 237
   jp: 218
   co: 168
   cn: 168
   club: 167
   biz: 156
   vn: 153
   site: 148
   top: 139
   uk: 120
   pw: 109

....

--------------------------------------------------------------------------------
Intersection: 3,825 domains
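A TLD tally like the ones above can be approximated by counting the last dot-separated label of each domain. This Python sketch assumes that is what ghosts does (entries such as uk rather than co.uk suggest it, but that is an inference); tld_tally is a hypothetical helper.

```python
from collections import Counter


def tld_tally(domains, minimum=100):
    """Count domains by last label; keep TLDs with at least `minimum` domains, largest first."""
    counts = Counter(d.rstrip(".").rsplit(".", 1)[-1].lower() for d in domains)
    # most_common() orders by count; ties keep first-encountered order
    return [(tld, n) for tld, n in counts.most_common() if n >= minimum]


sample = ["a.com", "b.com", "c.ru", "d.co.uk"]
for tld, n in tld_tally(sample, minimum=1):
    print(f"   {tld}: {n:,}")
#    com: 2
#    ru: 1
#    uk: 1
```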
dnmTX commented 4 years ago

What about duplicates? Another thing worth checking, Steve @StevenBlack, is the original list's licensing for distribution. The fact that @r-a-y is offering it doesn't mean the AdGuard team will be happy about it. As far as I know, their DNS list isn't advertised anywhere; somebody found the link to it some time ago and it went from there. After all, we need to remember that most of what AdGuard offers is not free. They do rely on revenue to stay afloat. I personally paid $40 for a lifetime license for their Windows ad blocker.

r-a-y commented 4 years ago

You can't curate a hot mess

It's not my list per se; it's Adguard's and they have a pretty good track record with adblocking.

Their filter list is actually quite well-curated:

https://adguardteam.github.io/AdGuardSDNSFilter/Filters/filter.txt

You can find their source here:

https://github.com/adguardteam/AdGuardSDNSFilter/

I just chose to strip the comments because it's not necessary in my eyes.

The only issue is sometimes my converter might have some parsing problems, but that's on me. I will not feel offended if you decide not to merge Adguard's list with yours. It's your hosts file after all, so feel free to close this ticket.

FWIW, I'm personally not a fan of one-size-fits-all hosts files. I prefer to pick and choose what I want to use.

If you think Adguard's DNS list is too large, I also offer Adguard's mobile lists separately if you want better mobile coverage. See https://github.com/r-a-y/mobile-hosts/blob/master/readme.md

Side note: I personally would like to see a better mobile hosts list out there. AdAway recently made improvements to theirs, which is great!


The fact that @r-a-y is offering it,doesn't mean the AdGuard team will be happy about it.

Their license is GPLv3: https://github.com/AdguardTeam/AdGuardSDNSFilter/blob/master/LICENSE

StevenBlack commented 4 years ago

Those are good points, Dan @dnmTX. I'll engage with them directly and see what they say.

XhmikosR commented 4 years ago

I'd personally be very interested in seeing this, assuming there are no problems with AdGuard themselves or with licensing.

BTW, I started using @r-a-y's list and it works pretty well. I was using https://v.firebog.net/hosts/AdguardDNS.txt before, but there's no repo to track the changes and I'm not sure how frequently the list gets updated.

My only suggestions to @r-a-y would be:

  1. sort the domains; it helps with compression and makes maintenance easier IMHO
  2. add more info in the header of each file like a repo link and so on
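The two suggestions above could look roughly like this in a conversion script. This is a hypothetical Python sketch: format_hosts, compressed_size, and the header text are illustrative, and the compression check makes no promise about the result for any particular list.

```python
import zlib


def format_hosts(domains, header, sink_ip="0.0.0.0"):
    """Header comments first, then one sorted, de-duplicated entry per line."""
    lines = [f"# {h}" for h in header]
    unique = sorted({d.lower() for d in domains})
    lines += [f"{sink_ip} {d}" for d in unique]
    return "\n".join(lines) + "\n"


def compressed_size(text: str) -> int:
    """Size in bytes after zlib compression (level 9)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


header = [
    "Title: AdguardDNS (converted to hosts format)",  # illustrative header text
    "Source: https://github.com/r-a-y/mobile-hosts",
]
text = format_hosts(["b.example", "A.example", "b.example"], header)
print(text, end="")
# # Title: AdguardDNS (converted to hosts format)
# # Source: https://github.com/r-a-y/mobile-hosts
# 0.0.0.0 a.example
# 0.0.0.0 b.example
print(compressed_size(text))
```

Sorting plainly by name groups entries that share a prefix; sorting by reversed labels (key=lambda d: d.split(".")[::-1]) would instead group subdomains of the same parent, such as the doubleclick.net entries.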
XhmikosR commented 4 years ago

For what it's worth, I've found a couple more sources (I haven't compared them, just listing them).

Both seem to be automated.

0xRustlang commented 4 years ago

I've also been using their filter list for several months and haven't faced any problems with it (false positives).

jawz101 commented 4 years ago

My thought is that more international coverage would be nice. My doubt is this: if AdGuard's original format was "simple" because either the list or AdGuard's software wildcard-blocks subdomains, then the list may be useless once converted to hosts format.

Looks like bigdargon said the same thing.

StevenBlack commented 4 years ago

Prompted by Issue #1347, here is the latest comparison of AdGuard, listing the top 20 TLDs by tally in each hosts file.

$  ./ghosts -tld -c https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
----------------------------------------
Base hosts file summary:
----------------------------------------
Location: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Domains: 57,464
Bytes: 1.8 MB
TLD tally:
   com: 31,696
   net: 6,832
   pl: 5,991
   info: 871
   jp: 817
   org: 765
   vn: 763
   ru: 761
   eu: 602
   de: 567
   live: 435
   io: 399
   nl: 374
   uk: 373
   cn: 347
   xyz: 286
   co: 282
   fr: 270
   biz: 236
   us: 212

  ==> S N I P <==

----------------------------------------
----------------------------------------
Compared hosts file summary:
----------------------------------------
Location: https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
Domains: 36,200
Bytes: 842 kB
TLD tally:
   com: 22,790
   net: 2,451
   ru: 1,848
   cn: 1,288
   info: 989
   de: 631
   club: 446
   site: 381
   xyz: 379
   pro: 372
   top: 337
   bid: 291
   io: 280
   org: 279
   jp: 232
   vn: 192
   co: 174
   biz: 162
   fun: 122
   uk: 119

  ==> S N I P <==

----------------------------------------
Intersection: 3,991 domains
aphuang2013 commented 2 years ago

Adding my two cents: I definitely see more .xyz sites in the ad block. For example, eulucky2022.xyz is particularly annoying because (a) it is a fake website that uses the Google logo, and (b) it is so intrusive that it prevents people on mobile from blocking it. Desktop browsers actually handle it correctly. My point is that this kind of behavior allowed by the provider needs to be stopped, though I'm not sure what the correct approach is. BTW, I did add that site to my hosts file.

rai510 commented 1 year ago

I came across this blocklist in this thread and have been using it for 3 years: https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt The list is well curated and I haven't found any false positives. @r-a-y, thank you!