AdguardTeam / AdguardBrowserExtension

AdGuard browser extension
https://adguard.com/
GNU General Public License v3.0
3k stars 321 forks source link

[Bug] Enabling "Use optimized filters" corrupts filter count #230

Closed aitte2 closed 8 years ago

aitte2 commented 8 years ago

EDIT: After discussion below, it turns out that Adguard provides custom mobile versions of all 3rd party lists too, so anyone reading this can ignore my worries about why 75% of EasyPrivacy vanishes when "Optimized filters" is enabled. It's actually grabbing Adguard's own version of those filters, and it's not throwing away any useful rules! The only legitimate bug in this thread is the fact that optimized filters are always grabbed when a subscription is toggled from Disabled to Enabled, even if optimization has been disabled.

EDIT (ameshkov): The bug cause and possible solution: https://github.com/AdguardTeam/AdguardBrowserExtension/issues/230#issuecomment-210501514

How to reproduce:

The EasyPrivacy list at https://easylist-downloads.adblockplus.org/easyprivacy.txt contains about ~11k rules if my line-count is correct.

Maybe necessary info:

Your preparation to test the bug:

Testing the bug:

It seems to me that when you enable the subscription to a list, it loads the optimized rules for that list even if the "Use optimized filters" checkbox is disabled.


Furthermore, I am not sure I trust the Optimized filter feature. For your lists (the built-in Adguard lists like "English filter"), I can see that it loads the lists from mobile.adtidy.org instead of chrome.adtidy.org, and those lists are manually written by you so that's safe. But how is it able to optimize 3rd party lists? Things like EasyPrivacy do not provide any mobile lists (as far as I can see), and you seemingly don't host alternative versions of their lists either (your API only contains 13 lists, which are your own built-in ones). So how is it able to shrink EasyPrivacy to 1/4 of the original size?

I looked at some of the source code of Adguard Extension and it seems like the "optimize" feature also deletes all generic "wide CSS rules" rules that are not domain-specific? So a rule like "##div#advertisement" would be ignored when "Use optimized filters" is enabled? Judging by the massive shrinking of the EasyPrivacy list, it also seems like it ignores all rules that block specific URLs that aren't tied to any origin domain, so a rule like "||adf.ly/ad/conv?" would be ignored too?

Can you please explain here (or the wiki) how Optimize is able to shrink 3rd party lists, and what rules we lose by doing so? It's very worrying to see a 3rd party list like EasyPrivacy lose 75% of its rules, with no insight into what the "optimization" removed.

aitte2 commented 8 years ago

Here's the source code where it seems the rule-conversion-to-content-blocker script skips all internet-wide rules:

"* @param optimize if true - "wide" rules will be ignored" https://github.com/AdguardTeam/AdguardBrowserExtension/blob/46ee9b4ae30d89b820e62cbffb7de12e1a2b1df3/Extension/browser/safari/lib/converter.js#L799

And the definition of "wide" rules is here: "// Elemhide rules (##)" (without domain restrictions). I think it may also cover rules like "||adf.ly/ad/conv?" etc, since EasyPrivacy shrunk so much. https://github.com/AdguardTeam/AdguardBrowserExtension/blob/46ee9b4ae30d89b820e62cbffb7de12e1a2b1df3/Extension/browser/safari/lib/converter.js#L691

So the bugs here seem to be:

  1. The safari converter always optimizes away "wide" rules when a list is first enabled, even if "Use optimized filters" is disabled.
  2. Enabling and disabling the "Use optimized filters" checkbox will properly re-enable the full list without skipping "wide" rules.
  3. The hidden/internal "throw away wide rules" feature seems dangerous. Things like EasyPrivacy can't work without it. If you don't block things like 3rd-party scripts at known "evil" URLs from any source domain (aka a "wide rule"), then you don't get any privacy blocking, since those rules will be skipped which means a lot of 3rd party trackers will still be loaded.

Maybe split rule optimization and mobile lists things into two clearer, safer options?

Splitting the option into two would allow users to choose optimized mobile rules for the main ad-blocking (which takes the "English filter" rules from 38k rules to 13k), but without losing internet-wide rules in 3rd party lists. I don't think we have to care about "slowness" of "wide" rules anymore, thanks to Safari's content blocker. They're compiled to binary instructions and evaluated super fast. And nobody complains when something like Adguard's own native code injects tons of generic filters, which is probably a 20x slower method of checking filters than Safari's binary method. So it seems to me like we can safely include "wide" rules as long as the total is below Safari's 50k rule limit? But to allow that, the extension will need the changes above, to let us enable wide rules in 3rd party lists, but still be able to get mobile versions of the main filters (otherwise the 50k limit is easily reached, since the non-mobile "English filter" contains 38k rules for example).

ameshkov commented 8 years ago

Hi, you're looking at the wrong place, this "optimize" parameter is used by iOS app only.

aitte2 commented 8 years ago

@ameshkov So the Safari for Desktop convert does not delete wide rules?

But then why is EasyPrivacy losing 75% of its rules when optimize is enabled?

ameshkov commented 8 years ago

One moment, I am trying to describe it in details:)

aitte2 commented 8 years ago

Thanks a lot, take as much time as you need. Answer another day if you need to relax today. ;)

ameshkov commented 8 years ago

If you use Chromium or FF extension there is an option to send us ad filters usage stats (turned off by default):

2016-04-15_1741

That's just not possible to collect such stats in Safari, that's why it is in Chromium and FF only. When this option is turned on AG extension periodically sends the following data to the server:

  1. Extension version.
  2. Browser type
  3. List of enabled ad filters.
  4. The list that consists of the following elements: The domain name of the website, The number of recent page views, The list of filtering rules and Filter ID which were activated on this website, The domain name of blocked requests is sent for URL rules The list is created based on the website's visit statistics since the last time statistical data was sent.

More information about it is here: https://adguard.com/en/filter-rules-statistics.html

According to these stats we are able to understand which rules are redundant or just rarely used. The whole optimization thing is providing filters without redundant rules.

ameshkov commented 8 years ago

Here is a source code for that usage stats service: https://github.com/AdguardTeam/AdguardBrowserExtension/blob/master/Extension/lib/filter/filters-hit.js

aitte2 commented 8 years ago

@ameshkov Yes, that's how you made your mobile lists (from mobile.adtidy.org). It seems you've misunderstood me. I didn't ask about that (but it's good info for others, so thanks).

This is about a bug in the Desktop Safari version of Adguard's extension. Try the steps in the 1st post here (you can just read the clean steps described before the horizontal divider line).

Problems:

  1. Bug: Adguard enables "Optimized" lists when a subscription is enabled, even if "Use optimized filters" is disabled. See the steps to reproduce.
  2. Adguard throws away 75% of the EasyPrivacy list when "Use optimized filters" is enabled, which is weird since there are no mobile versions of the 3rd party filter lists (at least not on their official sites). So what is it throwing away? How does it know how to delete 75% of EasyPrivacy? Do you actually host your own, optimized mobile versions (thanks to your filter tracking service) of those lists somewhere that I haven't been able to find? Because the official EasyList/EasyPrivacy websites have no mobile versions at all - and your API doesn't seem to have any mobile versions of them either either. I looked at all 13 lists at mobile.adtidy.org:

http://mobile.adtidy.org/api/1.0/getfilter.html?filterid=1&key=4DDBE80A3DA94D819A00523252FB6380 to... http://mobile.adtidy.org/api/1.0/getfilter.html?filterid=13&key=4DDBE80A3DA94D819A00523252FB6380 It just cover's Adguard's own 1st-party lists. After 13, there are no more lists. So if you're hosting an optimized, mobile-ready version of all 3rd party lists that are available in the GUI too, then that'd be a surprise to me. Where are they hosted in that case? :)

This would be good info for me and others, since it looked to me like the 3rd party lists are being destroyed somehow. Losing 75% of a list and being unable to find any host/URL for a mobile version is worrying to say the least.

ameshkov commented 8 years ago

This is about a bug in the Desktop Safari version of Adguard's extension

There is a bug indeed, we'll return to it soon:)

Where is it? :)

It is there as well: http://mobile.adtidy.org/api/1.0/getfilter.html?filterid=101&key=4DDBE80A3DA94D819A00523252FB6380 http://mobile.adtidy.org/api/1.0/getfilter.html?filterid=102&key=4DDBE80A3DA94D819A00523252FB6380

We proxify all third party filters available in the extension on our servers.

ameshkov commented 8 years ago

Here, take a look at filters meta data and their IDs: https://github.com/AdguardTeam/AdguardBrowserExtension/blob/master/Extension/filters/filters.xml

ameshkov commented 8 years ago

Do you actually host optimized mobile versions of those lists somewhere?

Not just host. This /getfilter.html does the actual optimization (and caches the optimized version). Let me tell you how it looks like on the sever side.

Roughly, there is a table like this:

filter rule text rule popularity rank
popular_rule 123123
not_popular_rule 0

/getfilter.html checks every rule from the requested filter against that table and discards redundant rules.

aitte2 commented 8 years ago

@ameshkov Thanks a lot for that information! That's very impressive of you, to provide custom, lighter versions of all 3rd party filters, and to have filter ranks. While waiting, I was actually researching even deeper and even found the file you just mentioned (filters.xml) and saw the ID (118) for EasyPrivacy and was able to get it via the mobile API. Doh. I tried to find that ID earlier but assumed that no filters existed after number 13, since the API gave an error at 14. This discussion is probably going to be cached by Google and others can find this discussion if someone else ever wonders about the mobile feature and the fact that you optimize 3rd party rules too. Wow!

The main reason I became worried was because of the bug you just saw, and then I started researching and found the "optimize away wide rules" (for mobile) feature, and worried it was deleting useful rules. Glad to hear that's not the case!

I am happy that "Use optimized filters" is working properly even for 3rd party filters. That's just one of many reasons why Adguard is the number one Safari Content Blocker, and will probably remain that way forever. Who else has such clean source code and powerful features, and custom-optimized blocklists? I love the clean interface, the powerful ability to write user rules, and the ability to click to block elements (the smaller/larger slider in there blew my mind, and even being able to preview the blocking). You guys blow my mind with your talent and your willingness to describe the technical details. After 20 years online I've never seen anything like you guys. Top notch!

Thanks for confirming that there's a bug in the Optimize filters checkbox when enabling filters at least. Then I wasn't going totally mad. ;-)

Spasibo.

ameshkov commented 8 years ago

Пожалуйста:)

Thanks for these questions too, we should definitely clarify how this optimization is done.

I'd also like to make these filters stats available for ad filters authors. I've already once shared this information with fanboy, but it was a long time ago and these days it was not as useful as it is now.

ameshkov commented 8 years ago

Now about the cause of this bug.

Disable "Use optimized filters". The rule count says 34. Correct. Enable the checkbox next to "EasyPrivacy". The rule count says 2793. This is wrong! The list is supposed to have ~11k rules in its complete, unoptimized format.

When you switch "Use optimized filters" AG resets filters versions to 0.0.0.0 to force them to be updated from the server and then it starts filters update check. The problem is that it is done for enabled filters only.

In your case EasyPrivacy was disabled at that moment, so its version has not changed.

@Mizzick what do you think we should do? I suppose we should reset the version for all filters visible in "Ad Blocker" section.

One more solution proposed by @aitte2: add an "optimized" flag to the filter meta data instead of resetting filter version. This is a bit more complicated, but on the other side such solution is more flexible. In this case we should consider it while performing update to the new version.

aitte2 commented 8 years ago

Ahh, so the bug is that the version isn't set for disabled subscriptions, so no version change happens and therefore no re-download of the proper mobile/desktop list next time you enable the list. I saw your "version change" code earlier when researching, and maybe it should be changed to using a "isOptimized" flag in the "local version info" array, and the next time you enable a disabled subscription, it checks if "Use optimized filters" is enabled, and compares to the local cache's "isOptimized" value, and updates from the server if the type of desired list has been switched. That gets rid of the "version"-trick reliance. To make it even more future proof (to support more than just "optimized yes/no"), maybe I'd make it an integer flag, like 0 = normal, 1 = optimized, 2 = some future subset type etc.


To comment on the other part of the discussion:

It's good to clarify these things. Technical users like myself definitely have a tendency to distrust "black boxes" where magical things happen without enough details. Especially when the rule count seemed to misbehave. Hehe. :-)

If you want to make the stats available for ad filter authors, that's definitely a noble goal since it's a competitive advantage for you. Those stats add a huge amount of value to Adguard on both desktop and mobile.

What happens with a brand new rule added in a new update to one of the rule lists? Does it get a temporary "include always" exception until its popularity can be measured? Or is it never included in mobile until enough desktop users have verified that it's popular?

ameshkov commented 8 years ago

a "isOptimized" flag

Hm, this could allow us to make this optimization switch per-filter in future version. Updated my previous comment.

What happens with a brand new rule added in a new update to one of the rule lists? Does it get a temporary "include always" exception until its popularity can be measured? Or is it never included in mobile until enough desktop users have verified that it's popular?

Yep, it is "include always" for 30 days.

aitte2 commented 8 years ago

30 days. That's great! It means Safari users with optimized lists won't suffer new ads all of a sudden. :-) All the "black box" mystery is vanishing and I see how well-engineered your entire system is. I am very impressed. Thanks for the open discussion. It's hugely appreciated that you took the time to clarify everything for me and others.

These are the two final questions related to subscriptions. I don’t know any other place to ask you this, and it’s something I tried finding on Google for a long time today (only found a few irrelevant forum threads), so this would also be good info in a FAQ somewhere:

ameshkov commented 8 years ago

Do we get duplicate rules or does Adguard merge all duplicates in all enabled lists?

All duplicates are eliminated: https://github.com/AdguardTeam/AdguardBrowserExtension/blob/master/Extension/lib/filter/antibanner.js#L983

Is "English filter" kept up-to-date with upstream changes to EasyList, or is it separately maintained after it was forked? Basically, I wonder if it's worth enabling the latest official EasyList too, when I already have your "English filter".

Yes, it is synchronized periodically so you can leave it alone. I even think that it will make sense to rename this filter to "EasyList + Adguard" filter or something like that just to make it clear.

ameshkov commented 8 years ago

Ehm, i meant "leave just one", english:)

aitte2 commented 8 years ago

I see, yet again it does the perfect thing (de-duplication).

About the lists: Since those lists are synced periodically, I totally agree with renaming "English filter" to "EasyList + Adguard filter", because it's not just for English sites (most of EasyList takes care of global ad providers that are used in every language), and it even confuses advanced users like me to see both lists (and I saw so many forum threads with people enabling both due to confusion). Naming it "EasyList + Adguard filter" also makes it more obvious that it's the most important filter of them all and that it should always be enabled by all users no matter where in the world they live.

Thanks a lot for all the information, Andrey. That's all. I will thank you by spreading the word about how awesome Adguard is. :-) I was on the "uBlock kool-aid" for a while but finally found a superior replacement that's both fast, powerful and incredibly easy to use. Loving every minute of Adguard and will recommend it to everyone!

Mizzick commented 8 years ago

Fixed