celzero / rethink-app

DNS over HTTPS / DNS over Tor / DNSCrypt client, WireGuard proxifier, firewall, and connection tracker for Android.
https://rethinkfirewall.com/
Apache License 2.0
2.89k stars, 147 forks

Auto update blocklists #564

Open ignoramous opened 2 years ago

ignoramous commented 2 years ago

Some users have asked that they rather not have to check for updates every few days and manually download them.

We initially disabled auto updates because auto updating things involves "phoning home"... sending requests to rethinkdns' servers, which users may not like.

Now that we've got a nice work-manager driven download, we should probably offer a way to auto-download updates as they become available. Of course, this should sit behind a setting that is disabled by default.
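A minimal sketch of the opt-in behaviour proposed here, written in Python for brevity rather than in the app's own language. The names `UpdateSettings`, `auto_download`, `needs_download`, and `fetch_remote_ts` are hypothetical, and the sketch assumes blocklist versions are comparable integer timestamps; it does not reflect the app's actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpdateSettings:
    # Hypothetical setting: off by default, so nothing phones home
    # unless the user explicitly opts in.
    auto_download: bool = False


def needs_download(
    settings: UpdateSettings,
    local_ts: int,
    fetch_remote_ts: Callable[[], int],
) -> bool:
    """Return True only if the user opted in and a newer blocklist exists.

    `fetch_remote_ts` stands in for the network request; it is never
    called while auto-download is disabled.
    """
    if not settings.auto_download:
        return False
    return fetch_remote_ts() > local_ts
```

Passing the remote check in as a callable keeps the "no request unless enabled" guarantee in one place, which is the property users worried about phoning home would care about.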

afontenot commented 2 years ago

IMO it would be useful if the app could check for and download updates only for the blocklists a user actually has enabled. The current method which seems to batch them together is inefficient, requiring large (~60 MB) downloads which could be expensive for users on metered Internet connections. Batching also (correct me if I'm wrong) relies on the maintainers of Rethink to continue updating the lists, meaning that if this project were ever abandoned for any reason, the ad blocking capability of the app would stop working over time.

I don't think bandwidth for the blocklist providers is a major concern. Even ad blockers with huge numbers of users, like uBO, download updates directly from the source.

ignoramous commented 2 years ago

Downloading individual lists (if you were to use all 190+) and searching through them efficiently (think fast) is at least 200M+ of memory, if not more. This is the reason why the lists are compressed into a single blob as they are.

Besides, the dns-blocking code that runs on the app is a golang port of whatever's running on the rdns servers (written in javascript).

> Batching also (correct me if I'm wrong) relies on the maintainers of Rethink to continue updating the lists, meaning that if this project were ever abandoned for any reason, the ad blocking capability of the app would stop working over time.

If the app is abandoned, then users have a bigger problem than just losing out on adblocking ;) That said, even the code to generate that single blob of blocklist is open source: https://github.com/serverless-dns/blocklists

> The current method which seems to batch them together is inefficient

Well, it is a trade-off, but yes, if a user has use for just one blocklist, then the whole 60M download is as inefficient as it gets. In that case, I'd rather they use server-side blocking that's available.

That said, we do intend to implement the ability to use arbitrary blocklists in the app (#293).

afontenot commented 2 years ago

> Downloading individual lists (if you were to use all 190+) and searching through them efficiently (think fast) is at least 200M+ of memory, if not more. This is the reason why the lists are compressed into a single blob as they are.

I was thinking, perhaps naively, that it wouldn't be too bad to just download the user-requested blocklists and then regenerate the radix tree on the device. This would probably (?) not be too expensive, I would guess most users are not using that many lists ("aggressive privacy" uses only 11 blocklists out of ~189 available).
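To make the idea concrete, here is a minimal sketch of merging several blocklists into one lookup structure and querying it for only the lists a user has enabled. It is a plain trie keyed on reversed domain labels, not the compact trie Rethink actually ships, and the list names, domains, and the rule that a listed domain also blocks its subdomains are all assumptions for illustration.

```python
class Node:
    __slots__ = ("children", "lists")

    def __init__(self):
        self.children = {}   # label -> Node
        self.lists = set()   # names of blocklists that contain this domain


def labels(domain):
    # "ads.example.com" -> ["com", "example", "ads"]
    return list(reversed(domain.lower().strip(".").split(".")))


def build(blocklists):
    """Build one trie from a dict of {list name: iterable of domains}."""
    root = Node()
    for name, domains in blocklists.items():
        for domain in domains:
            node = root
            for label in labels(domain):
                node = node.children.setdefault(label, Node())
            node.lists.add(name)
    return root


def blocked_by(root, domain, enabled):
    """Return the enabled lists that contain `domain` or one of its parents."""
    hits = set()
    node = root
    for label in labels(domain):
        node = node.children.get(label)
        if node is None:
            break
        hits |= node.lists & enabled
    return hits


# Hypothetical lists and domains, for illustration only.
root = build({
    "ads": ["ads.example.com"],
    "trackers": ["tracker.example.net", "example.org"],
})
print(blocked_by(root, "cdn.ads.example.com", {"ads"}))  # {'ads'}
```

A structure like this is easy to build but stores every label as a Python object, so its memory use grows quickly with list size; that is the cost a compact encoding is meant to avoid.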

More than just download size, the other problem this would solve is that some sites present fast-moving targets for ad blockers. Something I've encountered with Rethink is that a site or app will start showing ads, but no update will be available. On the desktop, this problem is almost always fixed by forcing an update in uBlock Origin.

As a case in point, if the current blocklist timestamp of 1662384683026 is accurate, Rethink hasn't pushed a new version of the blocklists since September 5, nearly 2 weeks ago! That's a remarkable amount of latency. (Of course, you could start pushing updates every day, but then the cost of that ~68 MB download would be incurred far more often.)
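The timestamp can be checked with a few lines of Python, assuming the value is a Unix epoch in milliseconds (an assumption, though it is consistent with the September 5 date given above). The function name is hypothetical.

```python
from datetime import datetime, timezone


def blocklist_timestamp_to_utc(ts_ms: int) -> datetime:
    """Interpret a blocklist version as Unix epoch milliseconds (assumed)."""
    secs, ms = divmod(ts_ms, 1000)
    # Integer arithmetic avoids float rounding in the millisecond part.
    return datetime.fromtimestamp(secs, tz=timezone.utc).replace(
        microsecond=ms * 1000
    )


published = blocklist_timestamp_to_utc(1662384683026)
print(published.isoformat())  # 2022-09-05T13:31:23.026000+00:00
```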

Side note: compressing trie.bin with zstd --ultra -22 reduces its size from 64M to 25M. The server does not seem to be automatically compressing this file, but manual compression would reduce the size of the download substantially.

ignoramous commented 2 years ago

> I was thinking, perhaps naively, that it wouldn't be too bad to just download the user-requested blocklists and then regenerate the radix tree on the device. This would probably (?) not be too expensive, I would guess most users are not using that many lists ("aggressive privacy" uses only 11 blocklists out of ~189 available).

We need 32GB servers to generate the compact trie. We could do this on-device with sufficiently large RAM (or if we come up with a clever implementation), but we're unsure if the CPU can keep up (: The code isn't optimised, and optimising it isn't low-hanging fruit, so to speak. It might take a lot of time to reach a level where the trie can be generated on-device... but we'd rather spend time implementing other (what we gauge to be) important features.

> Something I've encountered with Rethink is that a site or app will start showing ads, but no update will be available.

We have plans to do diff updates (https://github.com/serverless-dns/blocklists/issues/19), but again, it isn't a priority until after the v056 release (Android TV support).

> Rethink hasn't pushed a new version of the blocklists since September 5, nearly 2 weeks ago! That's a remarkable amount of latency.

I know. We had automated this, but then the automation broke, and we never fixed it. Now, it is a manual process run by another developer who has too many other commitments (as in, he doesn't work on Rethink full-time) and has neither got around to fixing the automation, nor has the discipline to keep publishing it every week, nor the time to implement diff updates :shrug: If it were up to me, we'd have had diff updates by now... but I don't pay him, so I can't ask him to do my bidding, exactly...

> Side note: compressing trie.bin with zstd --ultra -22 reduces its size from 64M to 25M. The server does not seem to be automatically compressing this file, but manual compression would reduce the size of the download substantially.

We don't compress it, because we'd have to uncompress it back again. We instead rely on brotli / gzip compression that the HTTP transport does for us. The compressed trie, once downloaded, is loaded in-memory as-is. Today, there's no post-processing that's required to use the trie (and there are advantages to keeping it that way) (:

ignoramous commented 2 years ago

Also: Rethink was never intended to be an ad-blocker. It is a firewall (that can also block ads). All the ad-blocking features we implement are bonuses (in our minds), and exist as part of the app simply because its users have demanded them. If left to our own devices, we'd be prioritizing so many other non-ad-blocking features we have in mind (some of which are documented on github issues)... but that's a fallacy, and we know better than to not give users what they want! :D

afontenot commented 2 years ago

Edit: if you can, it might be a good idea to move the last several comments to a new issue or a discussion, since we've drifted a bit off topic.

> We don't compress it, because we'd have to uncompress it back again. We instead rely on brotli / gzip compression that the HTTP transport does for us.

That's what I meant - I assumed this would happen, but I tried downloading the blocklists three different ways:

In all cases the file seems to be transferred uncompressed. I even checked this with Wireshark to be sure: it's sending ~64 MB over the wire. Perhaps a server config issue?

Also, the app warns the user that the download may be about 60 MB. If it was actually compressed it should be under 30, so the warning seems to be based on the uncompressed size?
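One way to reproduce this kind of check is to request the file while advertising brotli and gzip support, then look at the `Content-Encoding` response header and the number of bytes actually received. The sketch below uses only Python's standard library; the function names are hypothetical, and the download URL is left to the caller because the thread does not give it.

```python
import urllib.request
from typing import Optional, Tuple


def fetch_wire_size(url: str, timeout: float = 60.0) -> Tuple[Optional[str], int]:
    """Download `url` and return (Content-Encoding header, bytes received).

    urllib does not transparently decompress responses, so the length of
    the body is the size of the payload as it was transferred.
    """
    req = urllib.request.Request(url, headers={"Accept-Encoding": "br, gzip"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        return resp.headers.get("Content-Encoding"), len(body)


def describe_transfer(content_encoding: Optional[str], wire_bytes: int) -> str:
    """Summarise whether the server applied a content encoding."""
    encoding = (content_encoding or "").strip().lower()
    mib = wire_bytes / (1024 * 1024)
    if encoding in ("", "identity"):
        return f"uncompressed, {mib:.1f} MiB on the wire"
    return f"{encoding}-compressed, {mib:.1f} MiB on the wire"


# Usage (substitute the real blocklist download URL):
# encoding, size = fetch_wire_size(url)
# print(describe_transfer(encoding, size))
```

If the header is absent and the byte count matches the file's on-disk size, the server sent the file as-is even though the client offered to accept a compressed response.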

ignoramous commented 4 months ago

> Also, the app warns the user that the download may be about 60 MB. If it was actually compressed it should be under 30, so the warning seems to be based on the uncompressed size?

The uncompressed blocklists size is > 100 MB, so ~60 MB gzipped seems directionally correct?