Closed tgy closed 4 years ago
Hi @tgy that depends on a lot of things...
First, how old is your computer? The size of the hosts file matters here, as every request has to run through the whole list.
Second, try installing something like uBlock Origin; it has a nice logging tool that could help you see how many queries are being made (PS: F12, the browser dev tools, can also help here).
Alternatively, try installing something like Unbound and applying @ScriptTiger's lists to it (ScriptTiger converts these lists into NXDOMAIN entries for Unbound), and then restore your hosts file.
And I'm just guessing you are on some kind of Windows platform.....
Have you tried accessing that site when not using a modified hosts file? Also, you said that it takes some seconds, is there any difference with using a non-modified hosts file?
List down the domains being accessed so they can help you.
I'm using a very recent macbook pro. It's definitely faster without the hosts file than with the hosts file. Two domains for example: leboncoin.fr, vinted.fr
I honestly don't know if you can install a DNS resolver like Unbound on a Mac, but why shouldn't you be able to? It is built on Unix BSD......
But I would definitely recommend you try, as DNS resolvers are built to handle this kind of huge "zone" file. The hosts file isn't; it was designed for a few (10-20) records in small offices...
In the meantime I'll visit the family for some good food...
You can also test my DNS servers :smiley: but I warn you, they will break sites like FB :yum:
DNS server ip addresses:
IPv4:
95.216.209.53
116.203.32.67
IPv6:
2a01:4f9:c010:410e::53
2a01:4f8:1c0c:5f61::53
This sounds a lot like another case for compression. Please refer to my earlier comments for explanation and solution:
https://github.com/StevenBlack/hosts/issues/757#issuecomment-414102325
All of the people reporting this issue thus far have all been Windows users, so I am eager to hear if this works for you so we can confirm our first Mac-related incident.
@tgy Cc: @spirillen
it loads very slowly. I believe that in the background it's trying to access some resources that are hosted on blocked domain names and it waits for a while before giving up loading them.
I've encountered this quite a bit since this February, and I am using a dedicated Unbound Linux server (4 cores at 3.2GHz, 32GiB DDR3, SSD, 1Gbps net; system usage is below 1% on queries, never spikes in that machine's System Monitor, and usually sits around 0%). When nothing in the list matches, as with the organization I manage, pages resolve quickly; otherwise it's sometimes super slow, depending on the target site from the address bar... so I would assume that it is going through all the records in the db at least once on my managed domain? I've also tried refuse instead of deny (see https://nlnetlabs.nl/documentation/unbound/unbound.conf/) with no difference. It is kind of an annoyance, but I've learned to improve my patience level atm (although that may not continue forever; still looking into alternatives). Also, no DoH is enabled in Firefox (Fx) or PaleMoon (PM), i.e. older browsers that don't support DoH... and I haven't had a chance to go through Chromium/Chrome's current flags yet. I do have DoT enabled in Unbound and also on the router. I was using 1.1.1.1, then tried 9.9.9.9 about 2 weeks ago, and have randomized between them as of Friday... no change in the "slowness to load" on some sites. Not currently running DNSSEC since it hosed my network on my server last week (probably some configuration issue that I'm not familiar with yet).
Refs:
... also been having some delay lag on sourceforge esp. when editing my projects with allura wiki. Allura wiki usually shows markdown content and then actually parses it in the editor so it looks more like it will be when previewed. Without unbound and only using 1.1.1.1 from router it's immediate rendering.
Next test period is bumping up num-threads to non-1 even though I'm the only one on this particular LAN.
$ cat unbound_srv.conf
server:
ip-freebind: yes
do-daemonize: no
verbosity: 1
do-ip4: yes
do-ip6: no
do-tcp: yes
do-udp: yes
interface:192.168.0.126
interface:127.0.0.1
num-threads: 1
outgoing-port-permit: 32768-60999
outgoing-port-avoid: 0-32767
log-time-ascii: yes
access-control: 127.0.0.1/32 allow_snoop
access-control: 127.0.0.0/8 allow
access-control: 192.168.0.0/24 allow
hide-identity: yes
hide-version: yes
minimal-responses: yes
rrset-roundrobin: yes
ssl-upstream: yes
$ cat unbound_ext.conf
forward-zone:
name: "."
forward-ssl-upstream: yes
## Cloudflare DNS
forward-addr: 1.1.1.1@853#one.one.one.one
forward-addr: 1.0.0.1@853#one.one.one.one
## Also add IBM IPv6 Quad9 over TLS
forward-addr: 9.9.9.9@853#dns.quad9.net
forward-addr: 149.112.112.112@853#dns.quad9.net
$ apt-cache madison unbound
unbound | 1.6.7-1ubuntu2.2 | http://us.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages # <-- Current
unbound | 1.6.7-1ubuntu2.1 | http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages
unbound | 1.6.7-1ubuntu2 | http://us.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
I honestly don't know if you can install a DNS resolver like Unbound on a Mac
Homebrew has an option described at https://sizeof.cat/post/unbound-on-macos/ but I started off with stubby initially before I just moved it all to a dedicated Unbound Linux server... easier to just point all DNS to that server IP imho. I currently import the rules manually during all tests and releases via ads.conf.
rum-static.pingdom.net...
tag.aticdn.net...
static.criteo.net...
www.googletagmanager.com...
You are right that the site is slow, but there is more than one reason for that; and yes, this list is one reason why, and it should be so.
As you can see in the attachment, the site is constructed to collect data, not serve data. This means they are first, second and third trying to collect a lot of data about you, and only then appear to be doing something you want.
In short: you are waiting for their failures to happen before you see the page (a badly written site).
So everything is actually working as expected :) it's their site that's broken, not your setup :yum:
PS: since you have dedicated an entire piece of hardware to a DNS resolver, I would recommend a different combination:
BSD/Recursor (which has full support for RPZ import) or Debian (Buster)/Recursor
These are not faster than Unbound, but fully operational RPZ makes the difference in the choice, from my point of view.
PPS: @martii try local-zone: "google.com" always_nxdomain in your Unbound zone; that will (if the client follows the RFC) remove all client timeouts for a blocked zone.
Sleep tight :zzz:
Wow. Give this person, AKA @spirillen, a silver star ⭐️!
Doesn't appear that Recursor does caching though. That's the primary function when I added unbound. Has to support that or at least work in tandem.
That also doesn't explain the SF and other sites. I wasn't accumulating the URLs however it's very noticeable when I go searching for an answer or spec.
Whoops missed an edit... let me reread that again.
PPS: @martii try local-zone: "google.com" always_nxdomain in your Unbound zone; that will (if the client follows the RFC) remove all client timeouts for a blocked zone.
Kills https://www.youtube.com/ when rule is placed before forward-zone. :\ Thanks for the suggestion though.
Touched and edited a new try.conf in /etc/unbound/unbound.conf.d/ and that seems to unblock the youtube.com failure. Will get back to you if things are "more normal" after a few days since I'm going to be super busy this week. EDIT Tried something different and now google.com doesn't come through. So back to original issue of YouTube being killed. :\ Oy caching false positive is my guess.
Should also clarify that:
https://www.vinted.fr/ loads immediately in Fx and Chromium.
... means that I get something on the web site... however the spinners in the browsers goes for a while i.e. does have some lagging (Same with SourceForge).
Hi @martii and other interested readers :jack_o_lantern:
About your missing caching: I have seen some articles and issues around the net about that for older Unbound releases like yours.
The current release is 1.9.3, and on more recent Ubuntu (Disco) the apt release is 1.9.0-2, which is the one I use for testing my zone files. I know Unbound is working on the RPZ protocol, probably for version 1.9.4, but I'm not sure about that, as it is an open question.
There is also another interesting issue on Unbound with DNSSEC and DNS over TLS (DoT) not serving cross-signed zones out of the box.
And as commented by @gthess here
This seems like a problem with the domain and not unbound (or stubby for the relevant issue). Unbound saw an unsigned record in a supposedly signed domain and rightfully fails validation.
Unbound and DNSSEC were doing their job in regards to this mis-configured domain; you are supposed to get a SERVFAIL when validation fails.
This is where the NTA (negative trust anchor) comes in handy, but it should never be used for anything other than ONLY TRUSTED zones, as the concept is to prevent serving hijacked stub zones. It should really only be used on your own domains, not on foreign zones over which you have no control.
You are also welcome to take a peek at my unbound.conf which is caching as expected
A comment on:
Kills https://www.youtube.com/ when rule is placed before forward-zone. :\ Thanks for the suggestion though.
Of course :imp: Local zones should only be applied to matching domains and subdomains. The given example was just an example, but it should of course have been made clearer by using the example.net domain. (You do know youtube is a bait site, right?)
Another little comment on previous replies:
The big difference between a hosts file and a DNS recursor is that the hosts file is scanned at disk level on every request, whereas the DNS recursor loads the data into memory at startup and then searches memory for every request against its various rules. That is why a DNS recursor can answer from zone files as big as ~2 million records in less than 1 msec.
I'll put up a test of this with results a bit later and add a note here :spiral_notepad: when it's served and ready :smiling_imp:
A little heads-up.. test still running....
:coffee:
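To make the "load once, then look up in memory" idea concrete, here is a minimal sketch. It is only an illustration of the principle, not how Unbound or the system resolver is actually implemented. The function name lookup_batch and the one-domain-per-line blocklist format are assumptions made for the example.

```shell
# Hypothetical helper: $1 is a blocklist file (one domain per line);
# the names to check arrive on stdin. The list is read into an awk
# array once, and every query after that is a single hash lookup
# instead of another pass over the file.
lookup_batch() {
    awk '
        NR == FNR { blocked[$1] = 1; next }   # first file: build the table
        { print $1, (($1 in blocked) ? "blocked" : "pass") }
    ' "$1" -
}
```

Usage would be something like `lookup_batch blocklist.txt < names-to-check.txt`. Note that the sketch assumes a non-empty blocklist; with an empty file the `NR == FNR` test would also swallow the queries.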
@spirillen thanks for your analysis. This is exactly what I expected was happening: failures from third party data collection before showing actual interesting content. This is exactly one of the motivations I had for setting up this hosts file: preventing shit third parties from collecting my data. Is there a way to make these fail faster?
Is there a way to make these fail faster?
Find a site that acts normally, one that is not all about collecting data about you but about providing content.
The bait sites are spending more and more resources on working on your psyche to make you disable tracking and banner protection so their sites load faster, as they build in timers before failure.
And in the near future we will unfortunately probably also see built-in DNS setups. Just try to view this thread: #1051
[#pol] It has unfortunately become common that people stick to one site and over time let them bt fk them. If people started to move over to sites (domains) that don't act like &/¤&%¤, then companies like FB, YT, G and ${0} would simply die. But they are nothing but malicious bait sites breaking every rule and netiquette, and for reasons beyond my imagination people keep visiting them......
So yes, the solution: find another site....
Is there a way to make these fail faster?
Find a site that acts normally
Unfortunately that's not always an option. Occasionally I'll do some online perusing/shopping, take https://www.microcenter.com/ for example, and it just takes forever (not nearly as long as https://www.leboncoin.fr/ though). While I'm aware they are attempting to track even with unbound it should be a lot faster than this. Still have to look at the config @spirillen posted to see and changing from bionic to non-LTS isn't going to happen (prefer a stable machine for development). For the moment I guess I'll just have to be more patient. :pouting_cat:
... and netiquette ...
Speaking of... took a while for this to come up.
Yes, I agree with you in theory @spirillen but in practice it's not always possible to avoid some websites.
Hi @Martii
changing from bionic to non-LTS isn't going to happen (prefer a stable machine for development).
I would then recommend you compile Unbound from source yourself :) a lot of improvements have been made
@tgy like which ones? I have no issue blocking sites like google, youtube, fb etc. entirely, and if a site then doesn't work... they have nothing to offer me :tongue:
But ok I'm also a .45
@spirillen
... have been clarified better by usage of
example.net
domain.
and
...
always_nxdomain
...
Took me a couple of very distracted days to figure out what you meant on this part by putting all the pieces together... when I alter the script to not use deny/refuse (plus reread the readme.md, since it has changed since February) and use always_nxdomain instead, it seems to be much more tolerable (https://www.leboncoin.fr/ is somewhat slow but not as much... same as SourceForge YAY!!!! :).
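For readers wondering what such a conversion could look like, here is a minimal sketch. It is not the script actually used in this thread. The function name hosts_to_unbound is made up for illustration, and it assumes the blocklist uses 0.0.0.0 as the sink address, so localhost-style entries on other addresses are skipped.

```shell
# Hypothetical helper: read hosts-format lines on stdin and print one
# Unbound local-zone line per blocked name.
hosts_to_unbound() {
    awk '
        { sub(/#.*/, "") }            # drop whole-line and trailing comments
        $1 == "0.0.0.0" {
            for (i = 2; i <= NF; i++)
                if ($i != "0.0.0.0")  # skip a "0.0.0.0 0.0.0.0" self entry
                    printf "local-zone: \"%s\" always_nxdomain\n", $i
        }
    '
}
```

The output could then be written to a file under /etc/unbound/unbound.conf.d/ (for example the ads.conf mentioned earlier), with Unbound restarted or reloaded afterwards.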
... compile Unbound from source yourself ...
Thought of that however that would move that server out of the stable zone. I'd rather have Ubuntu back-port it on a more frequent stable release timetable... or at the very least unbound could do a PPA that's release status only. When compiling from source dependency "hell" is something I prefer to avoid on that particular server. Development is on this machine, production is on that dedicated server plus production is "time shared" i.e. when I'm not perusing the web it does actually serve a secondary purpose for local network jobs. Waste not, want not.
So wanted to clarify that this unbound version is not totally to blame. :smile_cat: Still more work/configuration to do though.
My apologies for not making the difference between the zone types at the end of the syntax clearer.
So for other readers, let's make a little simplified "try" to correct this.
The following syntax has the following responses/actions:
local-zone: "example.com" always_nxdomain # tells the client that this domain does not exist, so there is no point in waiting longer for a reply
local-zone: "example.com" static # answers from local data if there is any, otherwise with nodata or nxdomain
local-zone: "example.com" deny # simply drops the request and forgets you ever asked anything; the client keeps waiting for a reply that never comes, until it times out
local-zone: "example.com" refuse # stops queries too, but sends a DNS rcode REFUSED error message back, and the client might ask another DNS resolver
From man:
local-zone: "example.com" ${0}
deny Do not send an answer, drop the query. If there is a match
from local data, the query is answered.
refuse
Send an error message reply, with rcode REFUSED. If there is
a match from local data, the query is answered.
static
If there is a match from local data, the query is answered.
Otherwise, the query is answered with nodata or nxdomain.
For a negative answer a SOA is included in the answer if
present as local-data for the zone apex domain.
transparent
If there is a match from local data, the query is answered.
Otherwise if the query has a different name, the query is
resolved normally. If the query is for a name given in
localdata but no such type of data is given in localdata,
then a noerror nodata answer is returned. If no local-zone
is given local-data causes a transparent zone to be created
by default.
typetransparent
If there is a match from local data, the query is answered.
If the query is for a different name, or for the same name
but for a different type, the query is resolved normally.
So, similar to transparent but types that are not listed in
local data are resolved normally, so if an A record is in the
local data that does not cause a nodata reply for AAAA
queries.
redirect
The query is answered from the local data for the zone name.
There may be no local data beneath the zone name. This
answers queries for the zone, and all subdomains of the zone
with the local data for the zone. It can be used to redirect
a domain to return a different address record to the end
user, with local-zone: "example.com." redirect and
local-data: "example.com. A 127.0.0.1" queries for
www.example.com and www.foo.example.com are redirected, so that users
with web browsers cannot access sites with suffix example.com.
inform
The query is answered normally, same as transparent. The
client IP address (@portnumber) is printed to the logfile.
The log message is: timestamp, unbound-pid, info: zonename
inform IP@port queryname type class. This option can be used
for normal resolution, but machines looking up infected names
are logged, eg. to run antivirus on them.
inform_deny
The query is dropped, like 'deny', and logged, like 'inform'.
Ie. find infected machines without answering the queries.
inform_redirect
The query is redirected, like 'redirect', and logged, like
'inform'. Ie. answer queries with fixed data and also log
the machines that ask.
always_transparent
Like transparent, but ignores local data and resolves normally.
always_refuse
Like refuse, but ignores local data and refuses the query.
always_nxdomain
Like static, but ignores local data and returns nxdomain for
the query.
noview
Breaks out of that view and moves towards the global local
zones for answer to the query. If the view first is no,
it'll resolve normally. If view first is enabled, it'll
break perform that step and check the global answers. For
when the view has view specific overrides but some zone has
to be answered from global local zone contents.
nodefault
Used to turn off default contents for AS112 zones. The other
types also turn off default contents for the zone. The 'nodefault'
option has no other effect than turning off default
contents for the given zone. Use nodefault if you use
exactly that zone, if you want to use a subzone, use transparent.
So what to choose? To my mind it's obvious that you should prefer (always_nxdomain|static), due to the fact that any other reply can lead the client to go and ask elsewhere. But if the requesting client is told "domain does not exist", it should stop waiting, go on to the next step and not take any other action to look up the domain. <- supposed workflow
But keep in mind, this is the 3rd world war, and a separate IP firewall (timeouts) would be the next step to protect yourself against these commercial attacks.
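To check which of these answers a client actually receives, one option is to read the status field in the header that dig prints. A small sketch follows; rcode_of is a made-up name, and it assumes dig's default output format arrives on stdin.

```shell
# Hypothetical helper: extract the rcode (NXDOMAIN, REFUSED, NOERROR, ...)
# from the ->>HEADER<<- line of dig's default output.
rcode_of() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}
```

For example, `dig @127.0.0.1 example.com | rcode_of` should print NXDOMAIN for a zone configured as always_nxdomain. For a deny zone the query is dropped, so dig eventually times out and the helper prints nothing.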
@Martii if you are up for a bit of deeper learning and a way to get a newer recursor, I could recommend the PowerDNS Recursor repo and mixing it with dnsdist; then you'll have some really powerful control over the queries. But it requires some serious learning and time to get this mix right... I run it on my old Lenovo T-520 laptop (~10 yo) with less than 150mb consumption....
Boy did this thread take a new direction :raised_hands:
To address the OP's original post, @tgy, if you're following all this and want to host your own Unbound, etc., can you confirm this as a possible solution? I don't believe there is an official Mac version for Unbound, but I think there are some home-brewed versions floating around. You could also host it on a separate machine running Windows/Linux/BSD attached to the same LAN as your Mac and just point to that machine as your DNS.
On the other hand, rather than focusing entirely on Unbound, I would also like to reiterate my earlier comment of this possibly being a compression issue, which is also fairly common and I think worth testing on your machine to see if you notice any performance improvement just by using a different hosts file format.
https://github.com/StevenBlack/hosts/issues/757#issuecomment-414102325
I've made a little wiki here it should be ready for a test....
PS: make comments in new thread
Thanks @ScriptTiger & @spirillen. I don't have a lot of time to go through configuring all of this right now but I'll give it a shot later and keep you guys posted.
Closing this, now.
Hi @Martii
Just re-reading this thread, as I'm still preparing the test data I promised earlier, and I noticed your lines
Doesn't appear that Recursor does caching though. That's the primary function when I added unbound. Has to support that or at least work in tandem
I can assure you PowerDNS's recursor does cache...
Packet cache hitrate: 100.00%, Average response time: 0.001 ms, CPU Usage: 0.50%
I would have attached a screen-dump but GH isn't in the mood for that
But a conf you might have forgotten could be:
root-nx-trust=yes
And you should remember the powerdns recursor is designed to be running behind dnsdist, which is the primary load-balancing and cache front-end
As I promised in this comment
I have now set up a test environment to demonstrate the difference between using a hosts file and a DNS recursor like Unbound
wc -l output/domains/ACTIVE/list
1.789.872 output/domains/ACTIVE/list
time dig +noall @127.0.0.1 -p 53 -f output/domains/ACTIVE/list
time: a Unix tool to measure the time taken for a command to complete
dig: the best tool to test DNS; it's part of bind-tools
+noall: set or clear all display flags
@: which DNS server to use for the test; @127.0.0.1 therefore means localhost
-p: which port to send the query to
-f: input file which contains the domains to test
All data is set up as always_nxdomain
local-zone: "example.org" always_nxdomain
Test stat before first run
unbound-control stats | grep total
total.num.queries=0
total.num.queries_ip_ratelimited=0
total.num.cachehits=0
total.num.cachemiss=0
total.num.prefetch=0
total.num.zero_ttl=0
total.num.recursivereplies=0
total.requestlist.avg=0
total.requestlist.max=0
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.000000
total.recursion.time.median=0
total.tcpusage=0
real 4m31,098s
user 3m0,287s
sys 2m10,670s
total.num.queries=1789872
total.num.queries_ip_ratelimited=0
total.num.cachehits=1789872
total.num.cachemiss=0
total.num.prefetch=0
total.num.zero_ttl=0
total.num.recursivereplies=0
total.requestlist.avg=0
total.requestlist.max=0
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.000000
total.recursion.time.median=0
total.tcpusage=0
Notice that total.num.queries=1789872 and total.num.cachehits=1789872 are equal
real 4m38,948s
user 3m6,641s
sys 2m14,106s
total.num.queries=3579744
total.num.queries_ip_ratelimited=0
total.num.cachehits=3579744
total.num.cachemiss=0
total.num.prefetch=0
total.num.zero_ttl=0
total.num.recursivereplies=0
total.requestlist.avg=0
total.requestlist.max=0
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.000000
total.recursion.time.median=0
total.tcpusage=0
Again, total.num.queries=3579744 and total.num.cachehits=3579744 are equal
That's good :+1:
In this test we will use dig to lookup an external domain which isn't in our blocklist.
First dig is a lookup of www.mypdns.org
time dig +noall @127.0.0.1 -p 53 www.mypdns.org
real 0m1,681s
user 0m0,016s
sys 0m0,004s
Second run
time dig +noall @127.0.0.1 -p 53 www.mypdns.org
real 0m0,025s
user 0m0,016s
sys 0m0,009s
Now let's get the cache stats
total.num.queries=3579746
total.num.cachehits=3579745
This time queries is one ahead of cachehits: the first lookup of www.mypdns.org was a cache miss, the second a hit :smiling_imp:
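The comparison of the two counters can also be done mechanically. The following is a small sketch; cache_summary is a made-up name, and it assumes the key=value format shown in the stats output above arrives on stdin.

```shell
# Hypothetical helper: summarise total queries versus cache hits from
# `unbound-control stats` style output on stdin.
cache_summary() {
    awk -F= '
        $1 == "total.num.queries"   { q = $2 }
        $1 == "total.num.cachehits" { h = $2 }
        END { printf "queries=%d cachehits=%d not_from_cache=%d\n", q, h, q - h }
    '
}
```

It could be used as `unbound-control stats | cache_summary`. A not_from_cache value of 0 means every query was answered without recursion.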
Let's do the same where the records are added to the /etc/hosts file and the local DNS is disabled
cat /etc/resolv.conf
nameserver 213.133.99.99
nameserver 213.133.100.100
nameserver 213.133.98.98
time dig +noall www.mypdns.org
real 0m1,031s
user 0m0,021s
sys 0m0,008s
time dig +noall www.mypdns.org
real 0m0,026s
user 0m0,015s
sys 0m0,012s
time while read line; do getent ahosts $line; done < output/domains/ACTIVE/list
real 9411m8,908s
user 8732m51,608s
sys 675m54,859s
Due to the time consumed by this test, there won't be a second run.
As noted later in this thread, there is in fact an issue with using dig to test hosts files; therefore I'm starting a third test, of Unbound, with the same test string as used for the hosts file.
time while read line; do getent ahosts $line; done < output/domains/ACTIVE/list
real 98m8,897s
user 48m50,247s
sys 49m41,709s
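To put the two getent runs side by side, the wall-clock figures can be converted to seconds. This is a rough sketch that reads the thread's numbers as: hosts file run = 9411m8,908s, Unbound run = 98m8,897s. The helper name to_seconds is made up, and LC_ALL=C is set so that awk prints a decimal point regardless of the system locale (the original output uses a comma).

```shell
# Hypothetical helper: convert a `time` value such as 9411m8,908s
# (comma as decimal separator) into seconds.
to_seconds() {
    printf '%s\n' "$1" |
        LC_ALL=C awk -F'[ms]' '{ sub(/,/, ".", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

hosts_run=$(to_seconds 9411m8,908s)    # getent loop against /etc/hosts
unbound_run=$(to_seconds 98m8,897s)    # getent loop against Unbound
LC_ALL=C awk -v a="$hosts_run" -v b="$unbound_run" \
    'BEGIN { printf "hosts file: %.1f days, ratio: %.1fx\n", a / 86400, a / b }'
```

On these figures the hosts file run took roughly six and a half days, close to a hundred times longer than the Unbound run.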
time wget --no-config --spider -4 --delete-after -i output/domains/ACTIVE/list
Test result:
real 7m5,683s
user 1m58,984s
sys 2m43,794s
------------------------
------------------------
real 6m53,000s
user 1m58,163s
sys 2m42,103s
As this "quick" and dirty test shows, there are several good reasons to consider switching to a DNS resolver like Unbound on Windows and Apple.
I've come across a site that stated there should be a prebuilt Unbound package; it should be possible to install it with brew install unbound
@spirillen how did you get dig to look at the host file? It's ignored - at least with my setup on Debian.
The name is in /etc/hosts:
$ grep nametest /etc/hosts
10.10.10.10 nametest.home.net nametest
and it does resolve properly:
$ ping nametest.home.net
PING nametest.home.net (10.10.10.10) 56(84) bytes of data.
^C
--- nametest.home.net ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2034ms
but dig doesn't return the hostfile address:
$ dig @127.0.0.1 nametest.home.net
; <<>> DiG 9.11.5-P4-5~bpo9+1-Debian <<>> @127.0.0.1 nametest.home.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 32302
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1432
; COOKIE: acc19344f14df5667c6fbc815dc2e1d8b99b9a6ddc305752 (good)
;; QUESTION SECTION:
;nametest.home.net. IN A
;; AUTHORITY SECTION:
home.net. 3600 IN SOA localhost. admin.home.net. 2017010100 21600 15 86400 3600
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 06 10:08:08 EST 2019
;; MSG SIZE rcvd: 125
Hi @ler762
You set the proper order in /etc/host.conf
# The "order" line is only used by old versions of the C library.
order hosts,bind
multi on
Or in /etc/nsswitch.conf, on the line starting with hosts:
hosts: files mdns4_minimal [NOTFOUND=return] dns
But it will vary depending on your setup, and Linux is very open to customization :) The above are the standard choices and the best guide I can give you.
The second best choice is a duckduckgo.com search
Maybe this could be your answer https://askubuntu.com/questions/627906/why-is-my-etc-hosts-file-not-queried-when-nslookup-tries-to-resolve-an-address
PS: The right terminology is "Network search order"
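A quick way to see the active search order is to pull the uncommented hosts: line out of nsswitch.conf. A minimal sketch follows; hosts_order is a made-up name and it expects the file contents on stdin.

```shell
# Hypothetical helper: print the lookup sources listed after "hosts:"
# in an nsswitch.conf, ignoring commented-out lines.
hosts_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }'
}
```

Run it as `hosts_order < /etc/nsswitch.conf`. Tools that resolve names through the C library, such as getent and ping, follow this order; dig talks to the DNS server directly and does not consult it.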
@spirillen I have it set in /etc/nsswitch.conf (along with multicast dns resolution disabled):
$ grep 'hosts:' /etc/nsswitch.conf
# hosts: files mdns4_minimal [NOTFOUND=return] dns
hosts: files dns
If you think about it, dig would be, at best, misleading if it got addresses from the local /etc/hosts instead of the dns server being queried, but maybe unbound does look there?
On your machine running unbound, could you try adding
10.10.10.10 nametest.home.net nametest
to your /etc/hosts and show the results from
ping nametest.home.net
dig @127.0.0.1 nametest.home.net
Thanks!
Hmm you've caught me there... switching test to use getent ahosts
Thanks for the wake-up call :smiley: :hurtrealbad:
Man for getent ahosts
When no key is provided, use sethostent(3), gethostent(3), and endhostent(3) to enumerate the hosts database. This is identical to using hosts. When one or more key arguments are provided, pass each key in succession to getaddrinfo(3) with the address family AF_UNSPEC, enumerating each socket address structure returned.
It's of course only ping, and not dig or host, that looks things up in the /etc/hosts file. The latter two both go straight to the servers listed in /etc/resolv.conf.
Taking my head out of a.. it's not a hat :tophat: - and turning off the autopilot, Sorry guys
@spirillen I'm not sure what you're trying to measure..
If it's just how much time an absurdly oversized hosts file costs (1.8 million lines!??) then why not something like
time getent hosts www.google.com. # get it in cache
time getent hosts www.google.com. # get from cache
with and without the monster host file?
nb: I'm assuming the resolver will check the hosts file first and only after not finding the answer talks to the dns server.. I'm not sure how to prove that's what actually happens since I do seem to have a resolver cache running - even tho I don't see anything like nscd on my machine. hrmmm...
https://unix.stackexchange.com/questions/387292/how-to-flush-the-dns-cache-in-debian
$ sudo systemd-resolve --flush-caches
[sudo] password for lee:
$ time getent hosts www.google.com.
172.217.9.196 www.google.com
real 0m0.025s
user 0m0.000s
sys 0m0.008s
$ time getent hosts www.google.com.
172.217.9.196 www.google.com
real 0m0.009s
user 0m0.000s
sys 0m0.004s
apparently dig bypasses the cache and getent doesn't.
& semi-related: is your monster host file available for download somewhere? I'm curious how privoxy would handle an action file that large.
Q: If it's just how much time an absurdly oversized hosts file costs (1.8 million lines!??) then why not something like
A: It's a measurement of the difference between using a hosts file and e.g. Unbound for blocking, to prove the benefit of running a local resolver for blocking over /etc/hosts.
I doubt nscd caches anything from your hosts file, as to my knowledge that wouldn't be the default behavior, since the hosts file is an "override" of the default "network lookup", but things do evolve from time to time :smile:
In your example, by contrast, you demonstrate the caching of an external (non-blocked) lookup for something that is not in a hosts file, but I'm certainly curious whether nscd would actually cache an /etc/hosts record. Please test.
This test is performed in relation to this comment and the follow thread down :smile:
even tho I don't see anything like nscd on my machine. hrmmm...
Hmm, from deep down in a very dark corner of my head, do I find a link to dnsmasq there?? Which, by the way, by design works with /etc/hosts in some obscure ways.
semi-related: is your monster host file available for download somewhere? I'm curious how privoxy would handle an action file that large
Look for a commit != e613a335 at https://gitlab.com/spirillen/world-dumbest-ultimate-hosts-blacklist and you will know I'm done uploading it :+1: However you'll find two commits,
But why are you using privoxy on a Debian? just curious....
To prove the benefit of running a local resolver for blocking over /etc/hosts.
Ahh - I get it now. (although the only proof I need is how hard it is troubleshooting /etc/hosts vs. pretty much anything else)
Look for a commit != e613a335 at https://gitlab.com/spirillen/world-dumbest-ultimate-hosts-blacklist
Which firefox chokes on :(
Gah. Your tab just crashed.
Oh well.. easy enough to grab with curl - thanks!
But why are you using privoxy on a Debian? just curious....
Privoxy is easy to troubleshoot, I'm the only one using it at home so I can go overboard and do things like
{ +block{TLDs I probably don't want} }
.ad/
.adult/
.ae/
.auction/
.bid/
... etc ...
without worrying about breaking stuff for anyone else & it's trivially easy to unblock specific sites.
And it's fast.
I had to add a line at the start of your UltimateWorldBiggestDumbestHosts.hosts:
$ head -3 monsterHostFile.action
{ +block{monsterHostFile} }
--little--princess--.tumblr.com
-allporn-.tumblr.com
and the darn thing takes a while to read in:
2019-11-06 18:51:25.700 00000acc Info: Loading actions file: .\lightswitch-hosts.action
2019-11-06 18:51:25.825 00000acc Info: Loading actions file: .\unified-hosts.action
2019-11-06 18:51:25.888 00000acc Info: Loading actions file: .\monsterHostFile.action
2019-11-06 18:51:32.105 00000acc Info: Loading actions file: .\unblock.action
but page load speed using Privoxy with/without the monsterHostFile.action is less than a second.
Without monsterHostFile.action
$ time curl --head --proxy 127.0.0.1:8118 https://github.com/StevenBlack/hosts/issues/1057
HTTP/1.1 200 Connection established
HTTP/1.1 200 OK
Server: GitHub.com
Date: Thu, 07 Nov 2019 00:08:47 GMT
Content-Type: text/html; charset=utf-8
Status: 200 OK
... snip ...
real 0m0.933s
user 0m0.031s
sys 0m0.015s
with:
$ time curl --head --proxy 127.0.0.1:8118 https://github.com/StevenBlack/hosts/issues/1057
HTTP/1.1 200 Connection established
HTTP/1.1 200 OK
Server: GitHub.com
Date: Thu, 07 Nov 2019 00:10:56 GMT
Content-Type: text/html; charset=utf-8
Status: 200 OK
... snip ...
real 0m0.975s
user 0m0.031s
sys 0m0.015s
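For anyone wanting to reproduce this kind of action file from a hosts list, here is a minimal sketch, not the script actually used above. It assumes the input is either bare host names (as in the monster file) or "IP hostname" lines (as in the StevenBlack hosts), and that a plain host name is an acceptable Privoxy URL pattern, as the head output above suggests. The function name hosts_to_action and the LOCAL_NAMES set are made up for illustration.

```python
import ipaddress

# Names that normally live in a hosts file but should not be blocked
# (hypothetical list, adjust to taste).
LOCAL_NAMES = {"localhost", "localhost.localdomain", "local",
               "broadcasthost", "ip6-localhost", "ip6-loopback"}


def _is_ip(token):
    """Return True if token parses as an IPv4/IPv6 address (scope id ignored)."""
    try:
        ipaddress.ip_address(token.split("%", 1)[0])
        return True
    except ValueError:
        return False


def hosts_to_action(hosts_lines, label="monsterHostFile"):
    """Yield the lines of a Privoxy actions file that blocks every listed host."""
    # The one header line that had to be added by hand in the thread.
    yield "{ +block{%s} }" % label
    for raw in hosts_lines:
        # Drop comments, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        # "0.0.0.0 example.com" style: discard the leading address.
        if fields and _is_ip(fields[0]):
            fields = fields[1:]
        for host in fields:
            if host not in LOCAL_NAMES and not _is_ip(host):
                yield host


# Usage (file names are placeholders):
# with open("hosts") as src, open("monsterHostFile.action", "w") as dst:
#     dst.write("\n".join(hosts_to_action(src)) + "\n")
```

The generator streams line by line, so a multi-million-line input does not have to be held in memory during conversion.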
It would be more interesting if you tested against a number of blocked vs nonblocked sites...
What is privoxy's memory and CPU footprint with(out) monsterHostFile.action? while loading and surfing?
Blocking is fast, memory usage with a 3.8M line file is, not surprisingly, large:
             without   with
Working Set  24.2 MB   530 MB
$ wc -l monsterHostFile.action
3788979 monsterHostFile.action
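As a rough back-of-the-envelope check on those numbers, the extra working set can be divided by the line count. This is only a sketch: it assumes the whole difference is due to the action file, that "Mb" here means mebibytes, and that every line is one pattern. The function name bytes_per_entry is made up for illustration.

```python
def bytes_per_entry(without_mb, with_mb, entries):
    """Estimate the memory cost per pattern from two working-set readings."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    extra_bytes = (with_mb - without_mb) * 1024 * 1024
    return extra_bytes / entries


# Figures reported above: 24.2 MB without, 530 MB with, 3788979 lines.
estimate = bytes_per_entry(24.2, 530.0, 3_788_979)
print(f"about {estimate:.0f} bytes per entry")
```

With the figures above this works out to roughly 140 bytes per line, several times the length of a typical host name, which suggests per-pattern bookkeeping rather than the raw text dominates.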
$ tail -3 monsterHostFile.action
zzzzzz.com
zzzzzzzhotel.fr
zzzzzzzzz.info
Early on I block .fr/ and .info/, so the last two should be blocked faster.
$ time curl --head --proxy 127.0.0.1:8118 https://zzzzzzzhotel.fr
HTTP/1.1 403 Request blocked by Privoxy
.. snip ..
real 0m0.188s
user 0m0.000s
sys 0m0.030s
$ time curl --head --proxy 127.0.0.1:8118 https://zzzzzzzzz.info
HTTP/1.1 403 Request blocked by Privoxy
.. snip ..
real 0m0.186s
user 0m0.000s
sys 0m0.030s
$ time curl --head --proxy 127.0.0.1:8118 https://zzzzzz.com
HTTP/1.1 403 Request blocked by Privoxy
.. snip ..
real 0m0.203s
user 0m0.000s
sys 0m0.000s
^shrug^ not much difference between blocking before the 3.8M line file or at the end. But even if there was, I'd still use privoxy. Figuring out what needs to be removed from a hosts file to un-break a site is something I don't ever want to do again.
CPU is very low - after checking if the site is allowed or blocked it's all I/O - read from the server (web site) & write to the client (browser)
Why not give it a try yourself?
https://www.privoxy.org/
Because I have tried it :stuck_out_tongue_winking_eye: and found it too heavy and slow :smiling_imp:
If you enable it for serving requests from your local LAN it often breaks (in the past when I tested it) and a memory footprint of 500 MB is way too much for any running service.
Next, as you will see in this and other threads, I'm deeply into RPZ with NXDOMAIN responses :smile: to be able to protect any network-attached devices, and breaking sites, mis-services because they think they can spy on their users, is the best damn thing I can think of. Cos users would (should) move away from such sites as they get warned about these suckers :see_no_evil: :hear_no_evil: :speak_no_evil:
@spirillen Adding your UltimateWorldBiggestDumbestHosts to my windows hosts file stopped name resolution & was a pain to undo. So I tried today with a smaller set of host names -- StevenBlack + lightswitch05
short story - The initial hit was over 3.5 minutes for windows to process the host file & one cpu was maxed out while the dnscache service working set slowly climbed to 128.6M
I don't know how often (or even if) Windows clears the cache, but every time it's cleared name resolution stops while the hosts file is processed: ($ prompt is from the cygwin shell - I have no idea how to time a command from the cmd shell)
C:\Windows\System32\drivers\etc>grep 1bg.net hosts
0.0.0.0 1bg.net
$ time curl --head https://1bg.net
curl: (6) Could not resolve host: 1bg.net
real 0m0.083s
user 0m0.000s
sys 0m0.046s
C:\Windows\System32\drivers\etc>ipconfig /flushdns
Windows IP Configuration
Successfully flushed the DNS Resolver Cache.
$ time curl --head https://1bg.net
curl: (6) Could not resolve host: 1bg.net
real 3m37.604s
user 0m0.000s
sys 0m0.015s
$ time curl --head https://1bg.net
curl: (6) Could not resolve host: 1bg.net
real 0m0.087s
user 0m0.015s
sys 0m0.015s
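For timing lookups without cygwin, a small Python helper is one option. This is a sketch under assumptions: it times the system resolver through socket.getaddrinfo rather than a full curl request, so the numbers will not match the transcript above exactly, and a name mapped to 0.0.0.0 may count as "resolved" here. The function name time_lookup and its resolver parameter are made up for illustration.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (resolved, seconds) for one name lookup through the OS resolver."""
    start = time.perf_counter()
    try:
        # Port 443 only fills the required argument; no connection is made.
        resolver(host, 443)
        resolved = True
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        resolved = False
    return resolved, time.perf_counter() - start


# Usage: run once right after "ipconfig /flushdns" and once again,
# then compare the two timings.
# print(time_lookup("1bg.net"))
# print(time_lookup("1bg.net"))
```

The resolver parameter exists so the helper can be exercised with a stub instead of hitting the network.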
re privoxy:
If you enable it for serving requests from your local LAN it often breaks (in the past when I tested it)
can we take that off-line? I'd like to know what broke & if it still happens.
and a memory footprint of 500 MB is way too much for any running service.
that's with using your UltimateWorldBiggestDumbestHosts as a privoxy action file. Privoxy is using 18.5M now & that's with all the pre-Snowden http:// filters & blockers that I haven't bothered to remove.
I'm deeply into RPZ with NXDOMAIN responses ... and breaking sites, mis-services because they think they can spy on their users, is the best damn thing I can think of.
Whatever works for you. People that are still using a hosts file on the other hand...
FWIW: I don't mind breaking sites, but if my wife can't get to things like family pics on facebook she'll just turn off wireless on her phone rather than tell me there's a problem :( She's going to be tracked & spied on outside the house anyway, so unless/until I can set up a VPN service at home for our phones, generating an RPZ zone from host files on the net (eg. StevenBlack) is, for me, more pain than gain.
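For reference, generating the RPZ records themselves is the easy part. A minimal sketch, assuming "0.0.0.0 host" or "127.0.0.1 host" input lines and the usual RPZ convention that "name CNAME ." produces an NXDOMAIN answer. It emits only the policy records; the zone's SOA/NS header and the resolver configuration are left out. The function name hosts_to_rpz and the SKIP set are made up for illustration.

```python
# Entries that appear in hosts files but must not become policy records
# (hypothetical list).
SKIP = {"localhost", "localhost.localdomain", "local",
        "broadcasthost", "0.0.0.0"}


def hosts_to_rpz(hosts_lines):
    """Yield one 'name CNAME .' (NXDOMAIN) RPZ record per unique blocked host."""
    seen = set()
    for raw in hosts_lines:
        # Drop comments, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        # Only lines that sinkhole a name are treated as block entries.
        if len(fields) < 2 or fields[0] not in ("0.0.0.0", "127.0.0.1"):
            continue
        for host in fields[1:]:
            host = host.lower().rstrip(".")
            if host in SKIP or host in seen:
                continue
            seen.add(host)
            yield f"{host} CNAME ."


# Usage (file name is a placeholder):
# with open("hosts") as src:
#     for record in hosts_to_rpz(src):
#         print(record)
```

The pain described above is more about maintaining exceptions for household members than about this conversion step.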
Wouldn't the best privacy settings in this case be to delete facebook? My life functions pretty well without that crap :yum: Last time I checked, flopbook was all about storing and tracking to get trump re-elected..
Wouldn't the best privacy settings in this case be to delete facebook?
Obviously. But it's no small task convincing all the relatives to delete facebook.
Wouldn't the best privacy settings in this case be to delete facebook? Obviously. But it's no small task convincing all the relatives to delete facebook.
unfortunately :sob: you're right
The promised test is finished and the virtual machine will be deleted.
If you come up with other ideas for test environments, I'm open to suggestions, but as said, this environment is deleted, and new tests will therefore be delayed and put at the bottom of the ToDo list
Thanks for this repo.
I have one problem and I don't know if you guys have the same and if there's a solution.
Sometimes when I go to some website (e.g. leboncoin.fr), it loads very slowly. I believe that in the background it's trying to access some resources that are hosted on blocked domain names and it waits for a while before giving up loading them. Eventually, the websites will always load (so far) but it's a bit annoying because it takes seconds before the page actually shows up.
Is there anything I can do to improve this? Like some parameter tuning in my web browser (I use qutebrowser).