Closed pogue closed 2 years ago
Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!
The easiest way is to use AdGuard host compiler to convert a host file to Adblock format. https://github.com/AdguardTeam/HostlistCompiler
Example JSON config:
{
"name": "StevenBlack",
"sources": [
{
"source": "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts",
"type": "hosts"
}
],
"transformations": [
"Compress"
]
}
I converted the host file with the HostCompiler, here is the result:
hostlist-compiler -c /media/nas/git/adguard/resources/sb.json -o /media/nas/git/adguard/stevenblack.adblock
ℹ Starting @adguard/hostlist-compiler v1.0.12 19:57:56
ℹ Starting the compiler 19:57:56
ℹ Configuration: { 19:57:57
"name": "StevenBlack",
"sources": [
{
"source": "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts",
"type": "hosts"
}
],
"transformations": [
"Compress"
]
}
ℹ Start compiling https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts 19:57:57
ℹ Original length is 99910 19:57:57
ℹ Length after applying transformations is 99910 19:57:57
ℹ The list was compressed from 99913 to 64051 19:58:18
ℹ Final length of the list is 64057 19:58:18
ℹ Writing output to /media/nas/git/adguard/stevenblack.adblock 19:58:18
ℹ Finished compiling
RAW-Version: https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock
Not even close to ideal, it definitely needs improvement(s). @Zelo72, I would advise you to contact the developer(s) and address those bugs before recommending it here or anywhere else, for that matter. This is nonsense:
||localhost^
||localhost.localdomain^
||local^
||broadcasthost^
||ip6-localhost^
||ip6-loopback^
||ip6-localnet^
||ip6-mcastprefix^
||ip6-allnodes^
||ip6-allrouters^
||ip6-allhosts^
||0.0.0.0^
Only the exclamation point is permitted as a comment marker in AdBlock lists. Apart from the top portion, the compiler left the hashes almost everywhere instead of converting them:
! Source: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
!
# Title: StevenBlack/hosts
#
# This hosts file is a merged collection of hosts from reputable sources,
# with a dash of crowd sourcing via GitHub
#
Another example:
# Contributions by:
# Kicelo, Dominik Schuermann.
# Further changes and contributors maintained in the commit history at
# https://github.com/AdAway/adaway.github.io/commits/master
#
# Contribute:
# Create an issue at https://github.com/AdAway/adaway.github.io/issues
#
# [163.com]
||analytics.163.com^
||crash.163.com^
The empty/blank lines can also be removed, which would shrink the file even more.
Thanks, the host compiler seems to still have some bugs. Quick and dirty fix:
curl -s -L https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | grep -Ev '^127\.0\.0\.1|^255\.255\.255\.255|^::|^fe80::|^ff00::|^ff02::|0\.0\.0\.0 0\.0\.0\.0$' >/media/nas/tmp/work/sb.txt
hostlist-compiler -c /media/nas/git/adguard/resources/sb.json -o /media/nas/git/adguard/stevenblack.adblock
sed -i 's/^\#/\!/' /media/nas/git/adguard/stevenblack.adblock
sed -i '/^$/d' /media/nas/git/adguard/stevenblack.adblock
sed -i '1,9d' /media/nas/git/adguard/stevenblack.adblock #Remove hostcompiler comments
Not beautiful, but rare ;)
RAW-Version (fixed): https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock
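The two clean-up passes in the fix above (hash comments to exclamation points, blank lines removed) can also be sketched as a single stdin/stdout filter, which avoids editing the output file in place. The function name adblock_cleanup is made up for this illustration and is not part of hostlist-compiler; the `1,9d` step that strips the compiler's own header is deliberately left out, since it depends on the exact output of the compiler version in use.

```shell
#!/bin/sh
# Sketch only: adblock_cleanup is a hypothetical helper name.
# Reads a compiled list on stdin and writes the cleaned list to stdout.
adblock_cleanup() {
    # 1st expression: a leading '#' (hosts-style comment) becomes '!' (Adblock comment)
    # 2nd expression: empty lines are dropped
    sed -e 's/^#/!/' -e '/^$/d'
}

# Example call (file names are placeholders):
#   adblock_cleanup < stevenblack.adblock > stevenblack.cleaned.adblock
```

Because it only reads stdin, the filter can be tried on a few sample lines before pointing it at the real file.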
Looks better, for sure 😉 👍
Shorter version:
curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | sed '/^127./,/.0.0$/d' >/media/nas/tmp/work/sb.txt
-L is not needed, there are no redirects to the link; check with curl -v for yourself 👍
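For anyone not familiar with sed: the shorter version relies on an address range, which deletes everything from the first line matching the start pattern through the next line matching the end pattern. The sketch below shows the same idea with the dots escaped and the end pattern pinned to the "0.0.0.0 0.0.0.0" line, so it is a little stricter than the one-liner above. The function name strip_loopback_block is hypothetical, and the sketch assumes the loopback entries sit together in one block that ends with that line, as in the grep pattern earlier in the thread.

```shell
#!/bin/sh
# Sketch only: strip_loopback_block is a hypothetical helper name.
# Deletes the block that starts at the first "127." line and ends at the
# "0.0.0.0 0.0.0.0" line; everything else passes through unchanged.
strip_loopback_block() {
    sed '/^127\./,/^0\.0\.0\.0 0\.0\.0\.0$/d'
}

# Example call (output file name is a placeholder):
#   curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts \
#     | strip_loopback_block > sb.txt
```

One caveat with ranges: if a start line appears and no later line matches the end pattern, sed deletes through the end of the input, so it is worth checking the result the first time.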
Nice, thanks!
Thanks for the replies, guys. Now, is this something I can do in Windows, or am I going to have to install some kind of Linux emulator or use the Google Cloud Console? Ideally I'd like to have it run every day or so and just save it to a folder where my uBlock could pick it up.
I know they have curl for Windows but I'm not familiar with the sed command.
You can use my unofficially generated Steven Black Adblock list, it will be updated automatically every night. I have included the generation in my autoupdate script for my other filter lists.
RAW: https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock
Awesome thanks, but what happens when something happens and your system goes down or I need to get it from a different source? 😁
I'd love to have instructions on how to generate this myself. For example, here is curl for Windows. I found a sed for Windows, but it hasn't been updated in 11 years, which I guess isn't that big of a deal for a very simple utility like sed.
I don't know PowerShell, but I could make a simple batch file to perform the same actions. Although here is the same functionality explained in PS: Use PowerShell to Replace Text in Strings
Example command: get-content somefile.txt | where { $_ -match "expression"}
If anybody knows PowerShell and wants to help me figure this out, any help is welcomed!
Thanks again guys! I'll use @Zelo72's link for now, but I would like to figure out how to do it myself (unless the author of this list wants to provide it as an alternative to the HOSTS file, that would probably be the most ideal solution).
I'm trying to combine these into a source to block fake news outlets for an article I'm writing. There's also an anti-clickbait list, but I'd say there are some pretty subjective site choices on there. Pretty much anything is clickbait nowadays. But here's the list: https://assets.windscribe.com/custom_blocklists/clickbait.txt
That comes from my VPN provider, WindScribe. They also aggregate sites from this Wikipedia article but that would probably take some serious regex'ing to scrape that for just the domain names.
You can use Ubuntu Terminal on Windows 10.
Thanks, I'll look into that.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Well, it looks like @Zelo72 deleted his account, so no more automatic blocklist from him. Anyone want to step up to the plate and host it?
Can I point out that converting this hosts file into an AdBlock filter list will only protect you from content rendered through the browser. Malware or anything else that connects to sites through any mechanism other than a browser protected by an ad blocker will absolutely not be blocked, unless you also use a hosts file, which kills the name lookups.
Apologies if I've just necro'd this thread.
I understand where you're coming from. However, I don't like blocking things at the DNS/HOSTS level because that makes it inaccessible in the browser, even when it's a false positive.
Not to mention, I personally use a WIDE variety of anti-fingerprinting, anti-malware, and anti-phishing tools, and so forth, which I have listed on my page Anti Fingerprinting + Privacy & Security Browser Extensions
Sorry, I originally put the wrong URL in the above link, I have now corrected it
However, with the possible exception of some type of zero-day exploit hitting your browser (which would be very widespread and well known, and would be patched very quickly), uBlock Origin, according to the author:
uBlock Origin (uBO) is not an "ad blocker", it is a wide-spectrum blocker, which happens to be able to function as a mere "ad blocker". But it can also be used in a manner similar to NoScript (to block scripts) and/or RequestPolicy (to block all 3rd-party servers by default), using a point-and-click user interface.
So, uBO is actually prohibiting the browser itself from accessing that URL before it can cause any harm or damage to the browser.
I also wanted to add this article. Although it's from 2014, it succinctly lists my problems with using HostsMan vs uBO (although the author is using ABP):
The best way to block ads: AdBlock Plus vs. a custom hosts file (HostsMan)
I did do a search for converting hosts files into ABP files, but I only found the reverse - ABP into hosts files, unfortunately. 😔
pogue
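For a plain hosts-to-ABP conversion without any extra tooling, a line-by-line rewrite with awk is one option. This is only a sketch under a few assumptions: it is not how hostlist-compiler works, it does no compression or de-duplication, it expects blocking entries in the "0.0.0.0 domain" form used by this hosts file, and it assumes Unix line endings. The function name hosts_to_adblock is made up for the example.

```shell
#!/bin/sh
# Sketch only: hosts_to_adblock is a hypothetical helper name.
# Reads a hosts file on stdin and prints one ||domain^ rule per blocking entry.
hosts_to_adblock() {
    awk '
        # Keep only entries whose first field is 0.0.0.0, skipping the
        # "0.0.0.0 0.0.0.0" line. Comment lines, blank lines and the
        # loopback entries never match, so they are dropped.
        $1 == "0.0.0.0" && $2 != "0.0.0.0" { print "||" $2 "^" }
    '
}

# Example call (output file name is a placeholder):
#   curl -s https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts \
#     | hosts_to_adblock > stevenblack.adblock
```

Since only the second field is used, a trailing comment after the domain on the same line is ignored.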
Thanks, the host compiler seems to still have some bugs. Quick and dirty fix:
curl -s -L https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | grep -Ev '^127\.0\.0\.1|^255\.255\.255\.255|^::|^fe80::|^ff00::|^ff02::|0\.0\.0\.0 0\.0\.0\.0$' >/media/nas/tmp/work/sb.txt
hostlist-compiler -c /media/nas/git/adguard/resources/sb.json -o /media/nas/git/adguard/stevenblack.adblock
sed -i 's/^\#/\!/' /media/nas/git/adguard/stevenblack.adblock
sed -i '/^$/d' /media/nas/git/adguard/stevenblack.adblock
sed -i '1,9d' /media/nas/git/adguard/stevenblack.adblock #Remove hostcompiler comments
Not beautiful, but rare ;)
RAW-Version (fixed): https://raw.githubusercontent.com/Zelo72/adguard/main/stevenblack.adblock
Is there a Windows version of this? I can download curl for Windows, and I found sed for Windows (GnuWin32 on SourceForge).
However, I haven't tested either of these to see if they will work with the commands listed above.
Also, the Zelo72 file for the "fixed" version of the hosts file is 404.
Cygwin (x64 only), possibly Git Bash, and WSL are your best options, short of configuring a Linux VM with access to your files.
^ Cygwin's problem will be updating it manually by re-running the installer. However, if you'll only use it for this, you shouldn't need to update.
Yeah, I've used Cygwin in the past but it's been ages. But it seems like the stuff I downloaded like curl and sed are written for Windows, so I'm not sure if I need Cygwin to do the commands listed above.
Hmm, true, if you replace grep with an appropriate sed one it should, although you'll need PowerShell so you don't have to replace the single quotes + caret (in Command Prompt ^ is an escape), and possibly worry about the pipe mangling the data.
Unfortunately, I have no idea how to use PowerShell. Are you saying I should get grep instead of sed?
Unfortunately, I have no idea how to use PowerShell.
Luckily for this, you don't need to learn anything new.
Are you saying I should get grep instead of sed?
Dunno how you came to that conclusion when I said the complete opposite.
You could install grep too, up to you; I was thinking that since you're on Window$ you likely wouldn't wanna install yet another app you'll only have a small use for, but then again you probably don't already know sed, so it's just better to install grep too to not have to learn new stuff.
Hi,
based on the current host file, I create a compressed Adblock and Unbound version of the list every day.
AdBlock: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/3p/stevenblack.adblock.txt
Unbound: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/3p/stevenblack.unbound.conf
Greetings, Gerd
Can someone confirm that the AdBlock list @hagezi just posted works for them? The format looks easy enough. As @pogue said, the more people that provide a solution the better, just for resiliency purposes, etc. I could easily add it to my projects, as well, as it looks like a natural fit.
Pre-generated Block Lists: https://scripttiger.github.io/alts/
Windows Batch File Conversion Scripts: https://github.com/ScriptTiger/Hosts-Conversions
Windows, Linux, and Mac Conversion Binaries Written in Go: https://github.com/ScriptTiger/Hosts-BL
@dnmTX, @pogue,
I just updated the website and Hosts-BL binaries with the new Adblock format. Please confirm everything works.
@StevenBlack, feel free to close this issue whenever you feel it's been satisfactorily addressed. If there are any further issues at this point, they can just be handled while the issue is closed, since these format requests are a bit out of scope anyway.
Pre-generated Block Lists: https://scripttiger.github.io/alts/
Windows, Linux, and Mac Conversion Binaries Written in Go: https://github.com/ScriptTiger/Hosts-BL/releases
If anyone needs the Windows batch file version, let me know. Otherwise, the batch files are kind of just in maintenance mode now while I focus on porting things to Go in my limited free time. After everything has been ported, I'll continue with the regular code reviews, as before, and update them with any missing functionality.
I don't use my HOSTS file for blocking content because I find it too hard to unblock or whitelist on the fly. I would love it if there was either:
Would either of these be possible without too much trouble?
Thanks very much in advance, pogue