StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
25.71k stars · 2.15k forks

Delay in loading web pages #411

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hi, so I tried it for 3 months, with and without, and every time I put the 39K+ lines in my hosts file I can't navigate normally. It's like every 5 minutes I get a "connection freeze" where I have to wait 30 to 60 seconds before the pages load.

I tried with lots of tabs (many different websites) and it's the same: they all stop working and sit there loading the page, then all at the same time (30 to 60 seconds later) they all load quickly.

I have a good computer (i7, 32 GB RAM, SSD) and fiber-optic internet. I tried this on vanilla Windows 7; even in a VM it does the same.

Without the big list in my hosts file, everything goes full speed, never lags, never freezes. I don't know what the issue is.

ScriptTiger commented 6 years ago

What browser are you using? Are you using a web proxy that you have configured? Have you experienced any such "lagging" outside of your browser? Judging from your system, you're probably a gamer, so do you experience lag in your multiplayer games as well? I am more inclined to believe it has to do with your browser client, or perhaps a web proxy you may be connected to, than with the size of the hosts file, as you are implying, since tiny Android devices run it fine with a fraction of your resources.

If your pages start off slow and then jet forward later, it most likely has to do with the order in which the page loads certain elements. Analytics and ad scripts loaded in the header will slow down the page load for as long as your browser insists on trying to connect to the black hole address (127.0.0.1/0.0.0.0). Once it finally gives up, it loads everything else. However, you can still hit speed bumps again if there is more black-holed content throughout the body of the page. All of these issues also apply if you have the hosts file configured on a non-caching web proxy, or perhaps even a caching one, depending on its configuration.

Have you modified your list in any way? What address are you using as your black hole?

I am completely agnostic when it comes to brands, and I keep up-to-date versions of all of the major browsers, and some not so major, just to get the best possible handle on user experience. I would personally recommend Chrome just because it supports the most HTML5 features and actually ships with HTML imports, among other HTML5 components other browsers don't include. I know many people have their own reasons for liking whatever browser they do, but the fact of the matter is that HTML imports are simply more efficient than JavaScript web components, as they don't require any extra script loads and are completely native to HTML. So until someone else actually starts reading the W3C standards, Chrome is generally going to be the most efficient choice in most circumstances.

ghost commented 6 years ago

Like I said, I tried with a non-VM, fully updated, vanilla Windows 7. My browser: Firefox (the same thing happened with Chrome). I don't use a proxy. No effect on online gaming. A ping from cmd.exe freezes too. Across all the tests I did, keeping a constant ping running seems to give me fewer freezes (cmd: `ping -t 8.8.8.8`, for example).

as for "black holes" I have no idea what this is, or what the term means (english isn't my mothertongue). my computer use a normal router, 192.168.0.1 address, no firewall inside.

I'll do some more tests, but I was hoping someone else had the same issue lol. Thanks for your comment.

ScriptTiger commented 6 years ago

You are just using a standard hosts file you downloaded here, without any edits? Just downloaded and copy+pasted? Did you copy and paste manually or use a utility like HostsMan?

Here is an example line from the hosts file: `0.0.0.0 lb.usemaxserver.de`

"0.0.0.0" is the IP address that the domain name "lb.usemaxserver.de" resolves to. In the case of blacklisting/blocking domain names, the real IP address is replaced by a "black hole" address that does not take you to the real domain name. It is commonly referred to as a black hole because once a course is charted for a blacklisted website, it's traffic cannot escape an inevitable void of nothing.

kronflux commented 6 years ago

I will note that when I first started using hosts block files such as MVPS years ago on Windows 7, I had an issue where, when I first booted Windows, I had to wait a good 5-10 minutes before my network connection would actually resolve web pages. Eventually I updated my network driver, and that solved the problem for me. I never experienced the issue again, even after clean installs of Windows 8 and Windows 10.

So you might want to check for a driver update for your network adapter.

Also, when doing your actual testing, make sure to flush the DNS cache each time you change the HOSTS file.
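If you are scripting your tests, that step is easy to fold in. A minimal sketch that just shells out to the standard Windows command (run it from an elevated console):

```python
import subprocess

def flush_dns_cache():
    """Flush the Windows DNS resolver cache, equivalent to running
    `ipconfig /flushdns` in cmd.exe."""
    subprocess.run(["ipconfig", "/flushdns"], check=True)

if __name__ == "__main__":
    flush_dns_cache()
```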

ghost commented 6 years ago

@ScriptTiger I use Notepad++ (I do some coding) and often used it to edit the hosts file, both to add IPs I want to redirect (127.0.0.1 for local PHP, etc.) and to block annoying software that always wants to phone home (I got their IPs via my firewall, then blocked them via the hosts file).

I copied the content of the file into my hosts file via Notepad++, so no weird characters were inserted, and I already had the issues then. Later I added my personal blacklist (but not my IPs/domains for local PHP; I now use VMs for that, so nothing interferes with my computer's files). (OK on the term "black hole", I didn't know they were called that.)

@kronflux I guess I'll try Windows 10... (I also wondered if it was my drivers, so I updated them, but no change.) And yes, I do a flushdns often when testing.

ScriptTiger commented 6 years ago

As stated in the readme for this repo, it's better to stick with 0.0.0.0 as your redirect IP, because your computer automatically knows it's not valid. 127.0.0.1 is actually a real IP address for your loopback, so your computer will try resolving those resources to your loopback address. Are you familiar with Chrome's developer tools (opened by pressing F12 in Chrome)? They let you watch your page load live in different debug views, and I think that would tell you a lot, like which resources are hanging, etc. Do you think you could do some research, play with that, and tell us what you find out? Your issue does seem like a serious one, but in all the years this project has been around we have never encountered this problem before with such a small hosts file (39K+ lines is very tiny compared to others). So I am sure we are all interested in finding a solution to this.
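If you want to see how your own machine treats the two addresses, here is a rough timing sketch; the exact error and delay vary by OS and by whether anything is listening locally on the port, so treat the output as illustrative only:

```python
import socket
import time

def time_connect(ip, port=80, timeout=3.0):
    """Time how long a TCP connection attempt to `ip` takes to
    succeed or fail. Many stacks reject 0.0.0.0 immediately, while
    127.0.0.1 goes to the loopback interface and the result depends
    on whether anything is listening there."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            result = "connected"
    except OSError as exc:
        result = f"failed ({exc})"
    return time.monotonic() - start, result

for ip in ("0.0.0.0", "127.0.0.1"):
    elapsed, result = time_connect(ip)
    print(f"{ip}: {result} in {elapsed:.3f}s")
```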

ScriptTiger commented 6 years ago

@user789465, did you ever get anywhere with this? Did you try debugging it with Chrome's developer tools?

ghost commented 6 years ago

@ScriptTiger Yes, I know about the loopback, no problem there. Like I said, I use Firefox much more; sometimes I use Chromium, and Chrome even less. I'm sorry I can't provide you with information right away, but I have a lot of work, and like I said, every time I have more info I'll post it here. Testing these issues seriously takes time that I don't have at the moment.

ScriptTiger commented 6 years ago

No worries, we will keep this issue open for when you have time.

bartszu commented 6 years ago

I have no idea, but on Linux this is a simple hosts file; you can deal with it any way you want, probably preferably with bash. But I thought you guys were using Py3 for it?

funilrys commented 6 years ago

@bartszu I'm not into Windows, but from the few experiences I've had with Ultimate.Hosts.Blacklist's users and other issues I've read on GitHub, I would suggest:

Hope that this helps to find where the problem is located...

Cheers and have a nice day/night.

d-grg commented 6 years ago

@user789465 Have you enabled IPv6? Try disabling it.

onmyouji commented 6 years ago

@user789465

Try disabling the "DNS Client" service; it solved my problem.
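For anyone trying this on Windows 8/10, services.msc won't let you disable the service (as noted later in this thread), so the usual workaround is flipping the service's Start value in the registry. A minimal sketch, assuming Python run as Administrator; Dnscache and Start=4 follow the standard Windows service conventions, but change this at your own risk and reboot afterwards:

```python
import winreg

# "DNS Client" runs as the Dnscache service; Start=4 marks a
# Windows service as disabled (2 = automatic, 3 = manual).
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Dnscache"

with winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
) as key:
    winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, 4)

print("Dnscache disabled; reboot for the change to take effect.")
```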

ScriptTiger commented 6 years ago

That is the third time someone has mentioned the DNS cache/client service, so there might be something to it. All the documentation I can find says that Windows looks in both the hosts file and the cache in a predetermined order (I'm assuming cache first, since we know old entries can still take precedence until the cache is flushed), but the two should still remain sources independent of each other, and the DNS Client service shouldn't be trying to consume everything in the hosts file from the get-go. However, I am starting to feel that documentation may be wrong. It seems more like when the cache service is active, Windows uses only the cache and obtains hosts file entries indirectly, through cached entries produced by the DNS Client resolver querying the hosts file. Or at least this seems to be the case for some flavors of Windows but not others.

Hosts files are actually the predecessor to modern DNS, but they had to take a back seat because they couldn't scale to the number of domain names on the modern Internet. So DNS exists because everyone already knew large files had this problem, but five digits of entries is a tiny sum for modern computers, even for modern mobile devices.

I must second @funilrys in that Windows can definitely be a strange beast at times.

stefanopini commented 6 years ago

Following up on issue #93 as well, I worked on this for a while, and I think the problem is how the Windows DNS cache/client service handles the hosts file. The DNS Client service re-parses the hosts file every time the system connects to a network (even a known Wi-Fi/Ethernet network) and whenever the DNS cache is cleared.

Under certain conditions, this process seems to take a lot of time. I found that it doesn't happen with big hosts files as such, but only with long hosts files (i.e. hosts files containing a high number of lines). Reducing the number of lines of the hosts file makes the problem disappear, i.e. the parsing process becomes much faster (~15 times faster, at least on my machine running up-to-date Windows 8.1).

I wrote some code to compress the file, which should solve the problem (see pull request #459). Let me know if it works.
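For the curious, the idea boils down to something like the sketch below (not the actual PR code, just the shape of it): collect every blocked hostname and emit them 9 to a line behind a single 0.0.0.0, since the hosts file format accepts multiple hostnames per address on one line.

```python
def compress_hosts(src_path, dst_path, address="0.0.0.0", per_line=9):
    """Rewrite a one-host-per-line hosts file so each line carries up
    to `per_line` hostnames, drastically reducing the line count the
    Windows DNS Client service has to parse."""
    hostnames = []
    with open(src_path) as src:
        for line in src:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            hostnames.extend(line.split()[1:])  # keep tokens after the IP

    with open(dst_path, "w") as dst:
        for i in range(0, len(hostnames), per_line):
            dst.write(f"{address} {' '.join(hostnames[i:i + per_line])}\n")

compress_hosts("hosts", "hosts_compressed")
```

If memory serves, this approach later landed in the repo's updateHostsFile.py behind a --compress (-c) style flag, so you may not need to roll your own.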

Atavic commented 6 years ago

The issue here seems abnormal even with a big hosts file, given the user's system. The problem can be observed with TCPView from Sysinternals, which shows which program is pinging which address. A big hosts file, easily managed by today's systems, can still cause issues on non-optimized systems.

By non-optimized I mean a system running various programs that keep pinging addresses actually blocked by the hosts file. With many blocked requests, the IP stack may choke, particularly when connections persist in a long TIME_WAIT state.

It may be an antivirus that connects every time you reach a new URL, or that logged-in page on Twitter or Google... again, TCPView can help greatly with these issues.

stefanopini commented 6 years ago

The issue does not seem related to any particular piece of software, since it only happens when the PC establishes a connection to a network (wired or wireless). Actually, I am quite sure the problem is how Windows handles the DNS cache. When connecting to a network while using a hosts file with a high number of lines, Microsoft's DNS Client service uses the CPU heavily for a while, and during this time any request is delayed. Requests are processed only once the DNS Client service stops using the CPU, i.e. (I guess) once it has finished parsing the hosts file. Using a hosts file with a lower number of lines (by putting 9 domains on each line instead of 1, for instance), the parsing time of the DNS Client service is notably reduced (as I reported, ~15 times faster in my tests).

ghost commented 6 years ago

I use the StevenBlack hosts file + hpHosts, using HostsMan to merge my hosts file, and it gets big. I'm on the latest updated Windows 10; I disabled dnscache in regedit because services.msc doesn't allow me to. The problem: every Chromium-based browser hangs the computer, because it is somehow using the hosts file I created, and I think Microsoft apps like Photos use the file too. Any solution? On Windows 7 this doesn't happen. Why?

ScriptTiger commented 6 years ago

Check this out: https://github.com/ScriptTiger/Unified-Hosts-AutoUpdate/issues/5

"Compressing" your hosts file should speed up processes accessing it and possibly getting hung on it.

ghost commented 6 years ago

I noticed that mobile.facebook.com is not being blocked by the social hosts, but I don't want to create an issue just for this, so I'm commenting here.

ScriptTiger commented 6 years ago

@natuschaos, don't be shy about creating an issue, that's what they're for :) Many active people here don't "watch" the repo due to current workloads, but they do check in from time to time and skim the issue list. So if you comment in unrelated areas, it may not be seen by anyone who can help you, and it can be quite agitating for our esteemed repo master. Having said that, I tacked your request onto my PR and it should be merged soon enough; @Sinfonietta is usually pretty good about stuff like that. https://github.com/Sinfonietta/hostfiles/pull/27

jerriep commented 5 years ago

I am on Windows 10 and I also had this issue. I followed the advice about compressing the file and configured HostsMan to rearrange the hosts file content to 9 hostnames per line.

This sorted the problem out for me.

ScriptTiger commented 5 years ago

A lot of people have issues with HostsMan because its internal HTTP(S) fetcher is poorly configured and does not follow complex redirects or other non-standard links, such as files hosted from the "raw.githubusercontent.com" domain.

terremoth commented 2 years ago

I am also having this problem when I start Windows or log in after logging out.

So I dumped the svchost.exe process that hosts the DNS Client to try to figure something out, and I could indeed see the URLs being processed (it seems all of them are):

[screenshots: memory dump of the svchost.exe (DNS Client) process showing the hosts file entries laid out one after another in memory]


On Linux these things do not happen.

MAYBE the best solution of all is to get a Raspberry Pi and install Pi-hole on it...

rautamiekka commented 2 years ago

Did you use the compression method?

terremoth commented 2 years ago

> Did you use the compression method?

No, but I don't believe this will work as a long-term solution, since every week people add more and more URLs. The main problem is how Windows deals with this, putting everything in memory to process; you can see in the image I showed that every URL inside the file sits one next to the other. It uses more than 100 MB of RAM; compressing would not work miracles.

terremoth commented 2 years ago

The "compress" options seems to only remove comments, blank lines e repeated URL's, and the Windows DNS Client looks like it does this by itself too. It does not compress with any compressing algorithm

StevenBlack commented 2 years ago

Lucas (@terremoth), our hosts files are all between 3.6 MB and 5 MB presently, depending on which flavour you're using...

ScriptTiger commented 2 years ago

Welcome, @terremoth!

> I don't believe this will work as a long-term solution, since every week people add more and more URLs.

This list focuses on utmost efficiency and aggregates lists maintained by multiple niche curators, who are constantly adding and removing items from their respective lists every week; the resultant sum of all of them is what we get here downstream. This is not just an ever-expanding list in which we simply add entries and forget about them.

> The main problem is how Windows deals with this, putting everything in memory to process; you can see in the image I showed that every URL inside the file sits one next to the other. It uses more than 100 MB of RAM; compressing would not work miracles.

You are 100% correct in your point here that all of Windows' problems lie in how it deals with things. However, the problem is not as straightforward as simply slurping all of the contents into memory at once. Hosts files are intentionally slurped into memory in one go as a design choice: even the largest hosts files should never exceed more than a handful of megabytes, and holding all of the contents in memory allows for much faster iteration over the data. Again, this assumes the file is small. Trying to slurp larger files, such as the blocklists you may see around the Internet exceeding several or even hundreds of gigabytes, becomes counterproductive, because the data then has to be dumped to a cache/paging system, wasting exorbitant resources all around (CPU, memory, and drive reads/writes) as the data is constantly shuttled back and forth between the paging system and memory.

The real problem comes into play with the combination of basic Win32 API functions for buffering the hosts file, cyclically scanning its lines of text, tokenizing them, and parsing the tokens. The Windows hosts file was never intended to store tens of thousands of entries, and the code works under the assumption of a much smaller number. This means it takes advantage of memory structures intended for just a small number of lines, in order to parse the tokens, in theory, much more quickly. When dealing with tens of thousands of lines, however, these memory structures exceed the intended design and become exorbitant and counterproductive.

Simply reducing the line count drastically helps bring some aspects of the hosts file back toward the intended design specs on a Windows machine, and it is a fast and simple way to achieve performance gains of orders of magnitude (10x, 100x, etc.), without exaggeration. This can be likened to the way precomputed rainbow tables drastically speed up password cracking by doing the computation ahead of time and then simply iterating over the hashes, rather than computing them on the fly at much greater CPU cost. "Condensing" or "compacting" the lines in a hosts file to take advantage of the 10-token maximum (the first token being the IP address, followed by up to 9 tokens representing associated domains, together forming a single memory structure) reduces the number of memory structures created, and the associated overhead of each, which quickly adds up over tens of thousands of lines.
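To make that concrete, here is roughly what the same entries look like before and after compacting (the domain names are placeholders, not entries from the actual list):

```
# one entry per line: one memory structure per line
0.0.0.0 ads.example-one.com
0.0.0.0 ads.example-two.com
0.0.0.0 ads.example-three.com

# compacted: up to 9 domain tokens behind one IP token per line
0.0.0.0 ads.example-one.com ads.example-two.com ads.example-three.com
```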

The "compress" options seems to only remove comments, blank lines e repeated URL's, and the Windows DNS Client looks like it does this by itself too. It does not compress with any compressing algorithm.

Here again is a completely valid point. As I mentioned above, I believe "compacting," "condensing," or "consolidating" may be more apt terms in this case, as opposed to "compressing," which implies latent source data requiring a decompression step before it becomes useful. Hosts files do not support compression, but they do give you some flexibility in how "compact" you make a single line of up to 9 domain token strings associated with a leading IP token string. In the case of Steven's Python script, it gives you the option to "compact", or rather more efficiently "pack", 9 domain token strings to a line, rather than associating one domain token string with one IP token string per line and massively exploding the resources later required to create and store those data structures and parse the tokens.

I hope this gives a better explanation of both the problem and the "solution," as it were. Many Windows users have found simply downloading the pre-"compressed" files to be the easier route, as Steven already puts a lot of time and effort into this repository and does not support Windows-specific "problems," ease-of-use issues, etc.

https://scripttiger.github.io/alts/

rautamiekka commented 2 years ago

Compress, compact, same effect, since only one of the two technically exists...