StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
26.87k stars 2.23k forks

Convert to Adblock List? #1763

Closed: pogue closed this issue 2 years ago

pogue commented 3 years ago

I don't use my HOSTS file for blocking content because I find it too hard to unblock or whitelist on the fly. I would love it if there was either:

Would either of these be possible without too much trouble?

Thanks very much in advance, pogue

welcome[bot] commented 3 years ago

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

ghost commented 3 years ago

The easiest way is to use AdGuard's HostlistCompiler to convert a hosts file to Adblock format: https://github.com/AdguardTeam/HostlistCompiler

Example JSON config:

{
    "name": "StevenBlack",
    "sources": [
        {
            "source": "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts",
            "type": "hosts"
        }
    ],
    "transformations": [
        "Compress"
    ]
}
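For readers who only need the basic mapping, the core of a hosts-to-Adblock conversion can be sketched in awk: each `0.0.0.0 domain` entry becomes a `||domain^` rule. This is a minimal sketch, not HostlistCompiler's implementation; the function name `hosts_to_adblock` is hypothetical, and it assumes the usual `0.0.0.0` / `127.0.0.1` hosts layout and skips the loopback names from the file header.

```shell
# Hypothetical helper (not part of HostlistCompiler): read a hosts file on
# stdin and print one Adblock-style ||domain^ rule per blocked hostname.
hosts_to_adblock() {
  awk '
    # Drop "#" comments (whole-line or trailing); blank lines end up with NF == 0.
    { sub(/#.*/, "") }
    NF < 2 { next }
    # Only entries that point at a sink-hole address are blocking entries.
    $1 != "0.0.0.0" && $1 != "127.0.0.1" { next }
    {
      # One hosts line may list several hostnames after the address.
      for (i = 2; i <= NF; i++) {
        host = tolower($i)
        # Skip loopback and placeholder names from the hosts header.
        if (host == "localhost" || host == "localhost.localdomain" ||
            host == "local" || host == "0.0.0.0") continue
        print "||" host "^"
      }
    }
  '
}
```

Possible usage: `curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | hosts_to_adblock > stevenblack.adblock`. Comments are dropped rather than converted to `!` lines, and no compression is done.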
ghost commented 3 years ago

I converted the hosts file with HostlistCompiler; here is the result:

hostlist-compiler -c /media/nas/git/adguard/resources/sb.json -o /media/nas/git/adguard/stevenblack.adblock
ℹ Starting @adguard/hostlist-compiler v1.0.12  19:57:56
ℹ Starting the compiler  19:57:56
ℹ Configuration: {  19:57:57
    "name": "StevenBlack",
    "sources": [
        {
            "source": "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts",
            "type": "hosts"
        }
    ],
    "transformations": [
        "Compress"
    ]
}
ℹ Start compiling https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts  19:57:57
ℹ Original length is 99910  19:57:57
ℹ Length after applying transformations is 99910  19:57:57
ℹ The list was compressed from 99913 to 64051  19:58:18
ℹ Final length of the list is 64057  19:58:18
ℹ Writing output to /media/nas/git/adguard/stevenblack.adblock  19:58:18
ℹ Finished compiling

RAW-Version: https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock
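The drop from 99913 to 64051 lines comes from the `Compress` transformation. As I understand HostlistCompiler's README, it converts entries to `||domain^` rules and then discards rules for subdomains whose parent domain is already blocked, since `||example.com^` also matches `ads.example.com`. A rough sketch of that second step in awk (hypothetical `compress_adblock` function, not AdGuard's code):

```shell
# Hypothetical "Compress"-style pass: given ||domain^ rules on stdin, drop any
# rule whose parent domain is also blocked, because ||example.com^ already
# matches ads.example.com.
compress_adblock() {
  awk '
    # Collect plain ||domain^ rules; every other line is passed straight through.
    length($0) > 3 && substr($0, 1, 2) == "||" && substr($0, length($0), 1) == "^" {
      d = tolower(substr($0, 3, length($0) - 3))
      if (!(d in blocked)) { blocked[d] = 1; order[++n] = d }
      next
    }
    { print }
    END {
      for (i = 1; i <= n; i++) {
        d = order[i]
        p = d
        covered = 0
        # Walk up the parents: a.b.example.com -> b.example.com -> example.com -> com
        while ((j = index(p, ".")) > 0) {
          p = substr(p, j + 1)
          if (p in blocked) { covered = 1; break }
        }
        if (!covered) print "||" d "^"
      }
    }
  '
}
```

Non-rule lines (comments) are printed as they are read and the surviving rules are emitted at the end, so the position of comments relative to rules is not preserved.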

dnmTX commented 3 years ago

Not even close to ideal; it definitely needs improvement(s). @Zelo72, I would advise you to contact the developer(s) and get those bugs addressed before recommending it here, or anywhere else for that matter. This is nonsense:

||localhost^
||localhost.localdomain^
||local^
||broadcasthost^
||ip6-localhost^
||ip6-loopback^
||ip6-localnet^
||ip6-mcastprefix^
||ip6-allnodes^
||ip6-allrouters^
||ip6-allhosts^
||0.0.0.0^

Only the exclamation point is permitted as a comment marker in AdBlock lists, yet the compiler left the hashes in place almost everywhere (except the top portion) instead of converting them:

! Source: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
!
# Title: StevenBlack/hosts
#
# This hosts file is a merged collection of hosts from reputable sources,
# with a dash of crowd sourcing via GitHub
#

Another example:

# Contributions by:
# Kicelo, Dominik Schuermann.
# Further changes and contributors maintained in the commit history at
# https://github.com/AdAway/adaway.github.io/commits/master
#
# Contribute:
# Create an issue at https://github.com/AdAway/adaway.github.io/issues
#

# [163.com]
||analytics.163.com^
||crash.163.com^

The empty/blank lines can also be removed; that would shrink the file even more.
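The three problems above can also be patched after the fact on the compiler's output. A minimal awk sketch, with `clean_adblock` as a hypothetical name; it assumes every output line starting with `#` is a comment and drops exactly the header-derived rules quoted above:

```shell
# Hypothetical post-processing pass for a converted Adblock list:
# removes header-derived rules, turns "#" comments into "!" comments,
# and deletes blank lines.
clean_adblock() {
  awk '
    BEGIN {
      # Rules generated from the hosts-file header.
      list = "localhost localhost.localdomain local broadcasthost"
      list = list " ip6-localhost ip6-loopback ip6-localnet ip6-mcastprefix"
      list = list " ip6-allnodes ip6-allrouters ip6-allhosts 0.0.0.0"
      n = split(list, names, " ")
      for (i = 1; i <= n; i++) skip["||" names[i] "^"] = 1
    }
    # Empty or whitespace-only lines.
    /^[ \t]*$/ { next }
    # Assumes any line starting with "#" is a comment.
    /^#/ { print "!" substr($0, 2); next }
    ($0 in skip) { next }
    { print }
  '
}
```

Possible usage: `clean_adblock < stevenblack.adblock > stevenblack.clean.adblock`.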

ghost commented 3 years ago

Thanks, the host compiler still seems to have some bugs. Quick and dirty fix:

curl -s -L https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | grep -Ev '^127\.0\.0\.1|^255\.255\.255\.255|^::|^fe80::|^ff00::|^ff02::|0\.0\.0\.0 0\.0\.0\.0$' >/media/nas/tmp/work/sb.txt
hostlist-compiler -c /media/nas/git/adguard/resources/sb.json -o /media/nas/git/adguard/stevenblack.adblock
sed -i 's/^#/!/' /media/nas/git/adguard/stevenblack.adblock
sed -i '/^$/d' /media/nas/git/adguard/stevenblack.adblock
sed -i '1,9d' /media/nas/git/adguard/stevenblack.adblock #Remove hostcompiler comments

Not beautiful, but rare ;)

RAW-Version (fixed): https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock

dnmTX commented 3 years ago

Looks better, for sure 😉 👍

Shorter version:

curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | sed '/^127./,/.0.0$/d' >/media/nas/tmp/work/sb.txt

-L is not needed; there are no redirects on that link. Check with curl -v for yourself 👍
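The shorter version is a sed address range: it deletes every line from the first one matching `^127.` through the next line matching `.0.0$`, which in the current file is the `0.0.0.0 0.0.0.0` line. Since an unescaped `.` matches any character, a slightly stricter variant escapes the dots and anchors both ends. A sketch, with `strip_hosts_header` as a hypothetical name:

```shell
# Hypothetical stricter variant of the sed range one-liner: delete from the
# first "127.0.0.1 ..." line through the "0.0.0.0 0.0.0.0" line.
# Caveat (same for the original): if the end line never appears, a sed range
# runs to the end of the input and deletes everything after the start line.
strip_hosts_header() {
  sed '/^127\.0\.0\.1[[:blank:]]/,/^0\.0\.0\.0[[:blank:]]0\.0\.0\.0$/d'
}
```

Possible usage: `curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | strip_hosts_header > sb.txt`.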

ghost commented 3 years ago

Nice, thanks!

pogue commented 3 years ago

Thanks for the replies, guys. Now, is this something I can do in Windows, or am I going to have to install some kind of Linux emulator or use the Google Cloud Console? Ideally, I'd like to have it run every day (or on whatever schedule) and just save the result to a folder where my uBlock could pick it up.

I know they have curl for Windows but I'm not familiar with the sed command.

ghost commented 3 years ago

You can use my unofficially generated Steven Black Adblock list; it will be updated automatically every night. I have included its generation in the autoupdate script for my other filter lists.

RAW: https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock

pogue commented 3 years ago

Awesome, thanks! But what happens if your system goes down, or I need to get it from a different source? 😁

I'd love to have instructions on how to generate this myself. For example, here is curl for Windows. I found a sed for Windows, but it hasn't been updated in 11 years, which I guess isn't that big of a deal for a very simple utility like sed.

I don't know PowerShell, but I could make a simple batch file to perform the same actions. That said, here is the same functionality explained for PowerShell: Use PowerShell to Replace Text in Strings

Example command: Get-Content somefile.txt | Where-Object { $_ -match "expression" }

If anybody knows PowerShell and wants to help me figure this out, any help is welcomed!

Thanks again, guys! I'll use @Zelo72's link for now, but I would like to figure out how to do it myself (unless the author of this list wants to provide it as an alternative to the HOSTS file; that would probably be the ideal solution).

pogue commented 3 years ago

I'm trying to combine these into a source to block fake news outlets for an article I'm writing. There's also an anti-clickbait list, but I'd say there are some pretty subjective site choices on there. Pretty much anything is clickbait nowadays. But here's the list: https://assets.windscribe.com/custom_blocklists/clickbait.txt

That comes from my VPN provider, Windscribe. They also aggregate sites from this Wikipedia article, but scraping just the domain names out of it would probably take some serious regex work.
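Combining several lists mostly comes down to normalizing and deduplicating the domains. A hypothetical `merge_to_adblock` sketch, assuming each input is either a plain one-domain-per-line list or a hosts-format list (the exact format of the Windscribe file is an assumption here):

```shell
# Hypothetical merge of several blocklists into one deduplicated Adblock list.
# Assumes each input line is either a bare domain ("example.com") or a hosts
# entry ("0.0.0.0 example.com ..."); "#" and "!" comments and blank lines are
# dropped. Reads the files given as arguments, or stdin if there are none.
merge_to_adblock() {
  awk '
    { sub(/[#!].*/, "") }
    NF == 0 { next }
    # Bare list: the only field is the domain.
    NF == 1 { print "||" tolower($1) "^"; next }
    # Hosts entry: every field after the address is a hostname.
    { for (i = 2; i <= NF; i++) print "||" tolower($i) "^" }
  ' "$@" | LC_ALL=C sort -u
}
```

Possible usage, with hypothetical file names: `merge_to_adblock clickbait.txt other-list.txt > combined.adblock`. The inputs are assumed to contain only entries you actually want blocked; header lines such as `127.0.0.1 localhost` would be converted too.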

ghost commented 3 years ago

You can use Ubuntu Terminal on Windows 10.

pogue commented 3 years ago

Thanks, I'll look into that.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

pogue commented 2 years ago

Well, it looks like @Zelo72 deleted his account, so no more automatic blocklist from him. Anyone want to step up to the plate and host it?

n1ckal commented 2 years ago

Can I point out that converting this hosts file into an AdBlock filter list will only protect you from content loaded through the browser? Any malware or other software that connects to sites by any mechanism other than a browser protected by an ad blocker will absolutely not be blocked unless you also use a hosts file, which kills the name lookups.

Apologies if I've just necro'd this thread.

pogue commented 2 years ago

I understand where you're coming from. However, I don't like blocking things at the DNS/HOSTS level because that makes it inaccessible in the browser, even when it's a false positive.

Not to mention, I personally use a WIDE variety of anti-fingerprinting, anti-malware, and anti-phishing tools, which I have listed on my page Anti Fingerprinting + Privacy & Security Browser Extensions.

Sorry, I originally put the wrong URL in the link above; I have now corrected it.

However, with the possible exception of some type of zero-day exploit hitting your browser (which would be widespread, well known, and patched very quickly), consider what uBlock Origin actually is, according to its author:

uBlock Origin (uBO) is not an "ad blocker", it is a wide-spectrum blocker, which happens to be able to function as a mere "ad blocker". But it can also be used in a manner similar to NoScript (to block scripts) and/or RequestPolicy (to block all 3rd-party servers by default), using a point-and-click user interface.

(Blocking mode, uBO Wiki)

So uBO actually prevents the browser itself from accessing that URL before it can cause any harm or damage.

I also wanted to add this article. Although it's from 2014, it succinctly lists my problems with using HostsMan vs. uBO (although the author is using ABP):

The best way to block ads: AdBlock Plus vs. a custom hosts file (HostsMan)

I did do a search for converting hosts files into ABP files, but I only found the reverse - ABP into hosts files, unfortunately. 😔

pogue

pogue commented 2 years ago

(Quoting @Zelo72's "quick and dirty fix" from earlier in this thread: curl and grep, then hostlist-compiler, then sed.)

Is there a Windows version of this? I can download curl for Windows, and I found sed for Windows (sed for Windows - GnuWin32 - SourceForge).

However, I haven't tested either of these to see if they will work with the commands listed above.

Also, the Zelo72 file for the "fixed" version of the hosts file is 404.

rautamiekka commented 2 years ago

Cygwin (x64 only), possibly Git Bash, and WSL are your best options, short of configuring a Linux VM with access to your files.

^ Cygwin's problem will be updating it manually by re-running the installer. However, if you'll only use it for this, you shouldn't need to update.

pogue commented 2 years ago

Yeah, I've used Cygwin in the past, but it's been ages. It seems like the curl and sed builds I downloaded are written for Windows, though, so I'm not sure I need Cygwin to run the commands listed above.

rautamiekka commented 2 years ago

Hmm, true: if you replace grep with an equivalent sed command it should work, although you'll need PowerShell so you don't have to rewrite the single quotes and carets (in Command Prompt, ^ is the escape character), and you may have to worry about the pipe mangling the data.
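Following that suggestion, the grep step can be folded into sed: `sed -E '/pattern/d'` deletes exactly the lines that `grep -Ev 'pattern'` filters out, given the same extended regex. A sketch for a Unix-like shell (Git Bash, WSL, or Cygwin), with `filter_hosts_header` as a hypothetical name:

```shell
# Hypothetical sed-only version of the grep -Ev filter from the quick fix:
# the same extended regex, but deleting matching lines with sed instead.
# -E selects extended regular expressions (GNU sed also accepts -r).
filter_hosts_header() {
  sed -E '/^127\.0\.0\.1|^255\.255\.255\.255|^::|^fe80::|^ff00::|^ff02::|0\.0\.0\.0 0\.0\.0\.0$/d'
}
```

Possible usage: `curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | filter_hosts_header > sb.txt`.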

pogue commented 2 years ago

Unfortunately, I have no idea how to use PowerShell. Are you saying I should get grep instead of sed?

rautamiekka commented 2 years ago

Unfortunately, I have no idea how to use PowerShell.

Luckily for this, you don't need to learn anything new.

Are you saying I should get grep instead of sed?

Dunno how you came to that conclusion when I said the complete opposite.

You could install grep too, up to you. I was thinking that since you're on Window$ you likely wouldn't wanna install yet another app you'll only have a small use for, but then again you probably don't already know sed, so it's better to install grep too and not have to learn new stuff.

hagezi commented 2 years ago

Hi,

Based on the current hosts file, I create compressed Adblock and Unbound versions of the list every day.

AdBlock: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/3p/stevenblack.adblock.txt

Unbound: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/3p/stevenblack.unbound.conf

Greetings, Gerd
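For anyone running Unbound who wants to roll their own version, Adblock rules map onto Unbound's `local-zone` directive fairly directly. This is a hypothetical `adblock_to_unbound` sketch, not how the lists above are generated; the `always_nxdomain` zone type is an assumption:

```shell
# Hypothetical converter from ||domain^ rules to Unbound local-zone lines.
# The zone type always_nxdomain is an assumption; use whichever Unbound
# local-zone type suits you. The output goes inside a "server:" clause.
adblock_to_unbound() {
  awk '
    # Only plain ||domain^ rules are converted; comments and modifiers are skipped.
    length($0) > 3 && substr($0, 1, 2) == "||" && substr($0, length($0), 1) == "^" {
      printf "local-zone: \"%s\" always_nxdomain\n", substr($0, 3, length($0) - 3)
    }
  '
}
```

Feed it any `||domain^` list on stdin and place the result under `server:` in your Unbound configuration.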

ScriptTiger commented 2 years ago

Can someone confirm that the AdBlock list @hagezi just posted works for them? The format looks easy enough. As @pogue said, the more people that provide a solution the better, just for resiliency purposes, etc. I could easily add it to my projects, as well, as it looks like a natural fit.

Pre-generated Block Lists: https://scripttiger.github.io/alts/

Windows Batch File Conversion Scripts: https://github.com/ScriptTiger/Hosts-Conversions

Windows, Linux, and Mac Conversion Binaries Written in Go: https://github.com/ScriptTiger/Hosts-BL
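For anyone wanting to confirm that a generated list is at least well-formed, here is a small format check. It is a hypothetical `check_adblock` helper, deliberately strict: it accepts only `!` comments, blank lines, and plain `||hostname^` rules, so `#` comments or rule modifiers get flagged. It checks syntax only, not whether a particular blocker loads the list.

```shell
# Hypothetical strict format check for a DNS-style Adblock list. Accepts only
# "!" comments, blank lines and plain ||hostname^ rules; prints every other
# line as "line-number: text" and returns exit status 1 if it found any.
check_adblock() {
  awk '
    /^!/ || /^[ \t]*$/ { next }
    substr($0, 1, 2) == "||" && substr($0, length($0), 1) == "^" {
      host = substr($0, 3, length($0) - 3)
      # Dot-separated labels of letters, digits, underscore or hyphen.
      if (host ~ /^[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+$/) next
    }
    { printf "%d: %s\n", FNR, $0; bad = 1 }
    END { exit bad }
  ' "$@"
}
```

Possible usage: `curl -s https://raw.githubusercontent.com/hagezi/dns-blocklists/main/3p/stevenblack.adblock.txt | check_adblock`.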

ScriptTiger commented 2 years ago

@dnmTX, @pogue,

I just updated the website and Hosts-BL binaries with the new Adblock format. Please confirm everything works.

@StevenBlack, feel free to close this issue whenever you feel it's been satisfactorily addressed. If there are any further issues at this point, they can just be handled while the issue is closed, since these format requests are a bit out of scope anyway.

Pre-generated Block Lists: https://scripttiger.github.io/alts/

Windows, Linux, and Mac Conversion Binaries Written in Go: https://github.com/ScriptTiger/Hosts-BL/releases

If anyone needs the Windows batch file version, let me know. Otherwise, the batch files are kind of just in maintenance mode now while I focus on porting things to Go in my limited free time. After everything has been ported, I'll continue with the regular code reviews, as before, and update them with any missing functionality.