LennyFox / Blocklists

My personal blocklist

Just comments #2

Closed Yuki2718 closed 4 years ago

Yuki2718 commented 4 years ago

Hi, I saw your comment on MalwareTips after @rdsu-pt quoted me, but it seems the most important part was not taken up - and now I've found your GitHub account. Please don't take my comments as criticism - we're both helping others and ourselves toward better and more efficient blocking.

The most important point is prefixing the rule with | so that it is not evaluated against every request. ABP syntax assumes a wildcard before and after each rule, so a rule like http:// must be checked against all requests, and it also catches e.g. https://example.com/ads?param=http://something. Obviously this is not the purpose of your or Kees' rule; with the prefix it is no longer checked against requests that start with https, ftp, or any other interpreter-specific protocol. This is why * is not needed in the rule - but note there are some cases where an explicit * is needed other than in the middle of the body; one of them is explained below. Also, I think you misunderstand ^: it is a separator that matches certain special characters or the end of the address, so don't use it unnecessarily. Other than that, and maybe it's a typo, | in domain= is a separator, so it shouldn't be added at the end.
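To illustrate the anchoring point, here is a minimal Python sketch of how an unanchored rule differs from a |-prefixed one. It is only an approximation for illustration, not how ABP, uBO or AdGuard actually implement matching; the helper names `abp_to_regex` and `matches` are made up for this example, and only `*` and a single leading/trailing `|` are handled.

```python
import re

def abp_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified ABP-style pattern into a compiled regex.

    Only '*' and a single leading/trailing '|' are handled; '||', '^' and
    '$options' are deliberately left out of this sketch.
    """
    anchored_start = pattern.startswith("|")
    anchored_end = len(pattern) > 1 and pattern.endswith("|")
    start = 1 if anchored_start else 0
    end = len(pattern) - 1 if anchored_end else len(pattern)
    # Escape everything, then turn the escaped '*' back into a wildcard.
    regex = re.escape(pattern[start:end]).replace(r"\*", ".*")
    if anchored_start:
        regex = "^" + regex
    if anchored_end:
        regex += "$"
    return re.compile(regex)

def matches(pattern: str, url: str) -> bool:
    # re.search supplies the implicit wildcard before and after the pattern.
    return abp_to_regex(pattern).search(url) is not None

urls = [
    "http://example.org/banner.js",
    "https://example.com/ads?param=http://something",
]
for url in urls:
    print(url, "| unanchored:", matches("http://", url),
          "| anchored:", matches("|http://", url))
```

In this sketch the unanchored `http://` hits both URLs, while `|http://` only hits the one that really starts with http://.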

I glanced at your filter too - and frankly, I have to say there are some problems. First, do not use this type of rule: /.*pagead.*/ - it is taken as a regex rule, which is thousands of times slower than a normal rule. The correct format is /pagead/* (the trailing * keeps it from being parsed as a regex). Second, this rule: `google.###gb > div.gb_0f > div.gb_Xa.gb_Fg.gb_i.gb_Eg.gb_E > div.gb_Zc.gb_Fg.gb_i:first-child > div.gb_id.gb_jd.gb_7c.gb_td.gb_Dd.gb_na:last-child > div.gb_F.gb_na` It looks like randomly generated classes copied from the picker - basically, do not use first-child, last-child, nth-of-type etc. unless really needed, as they're FP-prone - and do not use rules automatically generated by the picker unless they're very simple AND there are not too many of them (automatic rules tend to be too specific, and they have significant limitations). When a rule looks like this, it's better to use extended CSS rules.

Other findings:

LennyFox commented 4 years ago

The Less_Alphabet filter was from Kees1958, but he removed it. I think it was intended to be used on specific domains. To prevent false positives I copied the limitation to the specified Google domains, so I am not going to change that. Another reason to limit the regular expressions to specified domains is the performance you mentioned. With the domain scope limitation these filters don't seem to impact browsing performance (and I am using a 2010 Intel Celeron G4600 dual core, so a low-spec CPU by today's standards).

Why is google.*###hdtb problematic? please explain.

You mentioned prefixing with | - do you mean | or ||?

Implemented all other tips (as far as I understood them)

Yuki2718 commented 4 years ago

I know him, but it's now your filter. I meant that limiting YouTube rules to the YT domain doesn't make sense, but sure, limiting googlesyndication to Google domains makes sense. Just avoid regex rules unless really needed; they're not tokenizable[1] and needlessly heavy - and you don't need any of the regex rules for YT. Filter matching for each request is done within milli- or even microseconds even on a very old computer - gorhill (uBO dev) and I also use PCs of similar spec - so don't rely on human perception when comparing performance (this is what gorhill has repeatedly claimed).

| and || have different semantics. for your rule on MT, use the former. Details are explained in the ABP link in the previous comment.

Sorry, I was not clear - I'm not good at explaining things even in my mother tongue. It's okay to add ^ to the end of a domain to match e.g. example.com/something, example.com:8080, example.com#, example.com?param=, or example.com. What I wanted to say was that it's not an end-anchor but a (kind of) partial wildcard, so ||gstatic.com/shopping^ doesn't make much sense - but it's still okay in this case as ? always follows (but then, why not make it ||gstatic.com/shopping? or ||gstatic.com/shopping)
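A small Python sketch may make the "partial wildcard" behaviour of ^ more concrete. It assumes the usual ABP definition (a separator is anything other than a letter, a digit, or one of `_ - . %`, or the end of the address); the helper name `caret_matches` and the gstatic paths beyond /shopping are made up for illustration, and `|`, `||` and `*` are not handled here.

```python
import re

# Assumed ABP-style separator: any character that is not a letter, digit,
# '_', '-', '.' or '%', or alternatively the end of the address.
SEPARATOR = r"(?:[^A-Za-z0-9_\-.%]|$)"

def caret_matches(pattern: str, url: str) -> bool:
    """Match a pattern whose only special character is '^'."""
    literal_parts = [re.escape(part) for part in pattern.split("^")]
    return re.search(SEPARATOR.join(literal_parts), url) is not None

for url in ("example.com/something", "example.com:8080", "example.com#",
            "example.com?param=", "example.com", "example.community"):
    print(url, caret_matches("example.com^", url))

# '^' is not an end anchor: anything may follow the separator character.
for url in ("https://www.gstatic.com/shopping?q=1",
            "https://www.gstatic.com/shopping/x",
            "https://www.gstatic.com/shopping-cart"):
    print(url, caret_matches("gstatic.com/shopping^", url))
```

Here `example.com^` accepts every form listed in the comment but rejects `example.community`, and `gstatic.com/shopping^` accepts `?` or `/` after "shopping" but not `-`.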

That rule hides image search etc. on Google.

[1] I don't know as much about AdGuard's algorithm as uBO's, but the fact that regex rules are slow holds for any ABP-compatible blocker. On uBO the rule is essentially the same as *pagead* except for the domain limitation (yes, good practice indeed) and needs to be checked against all requests (on YT), while /pagead/* is tokenized and processed VERY efficiently - if rules are tokenized, the number of rules doesn't matter. More details about tokenization in case you wonder, tho from different vendors.

LennyFox commented 4 years ago

Sorry I am lost "| and || have different semantics. for your rule on MT, use the former. Details are explained in the ABP link in the previous comment."

So former=first, meaning | (the hard end pipe), but for which rule should this be applied?

LennyFox commented 4 years ago

OKAY removed image search bar, thanks for your explanation and time

Yuki2718 commented 4 years ago

> Sorry I am lost "| and || have different semantics. for your rule on MT, use the former. Details are explained in the ABP link in the previous comment."
>
> So former=first, meaning | (the hard end pipe), but for which rule should this be applied?

Sorry for the poor explanation. In a nutshell, | anchors the beginning (or end) of any rule and is often used as |http:// or |https://. || is for domains, so ||example.com^ blocks not only example.com but also all subdomains such as ads.example.com.
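As a rough illustration of the || semantics, the following Python sketch checks a rule of the exact shape `||domain^` against a URL's hostname. The function name `domain_anchor_matches` is hypothetical, and real blockers handle far more cases (paths after the domain, options, and so on).

```python
from urllib.parse import urlsplit

def domain_anchor_matches(rule: str, url: str) -> bool:
    """Check a rule of the exact form '||domain^' against a URL."""
    if not (rule.startswith("||") and rule.endswith("^")):
        raise ValueError("this sketch only handles rules shaped like ||domain^")
    domain = rule[2:-1].lower()
    host = (urlsplit(url).hostname or "").lower()
    # The domain itself, or any subdomain of it, but not look-alike hosts.
    return host == domain or host.endswith("." + domain)

for url in ("https://example.com/", "https://ads.example.com/x.js",
            "https://notexample.com/", "https://other.net/?u=example.com"):
    print(url, domain_anchor_matches("||example.com^", url))
```

Only the first two URLs match in this sketch: the domain has to appear as the host (or a parent of the host), not merely somewhere in the address.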

For this: https://malwaretips.com/threads/adblocking-innovation.98862/page-5#post-873319

You're welcome!

LennyFox commented 4 years ago

Yuki, the top500 list works amazingly well. When something is missed on websites I use often, I simply use AdGuard's excellent 'point and block' element picker to hide things. Looking at the advanced features of AdGuard, they offer "contains" filtering, so they have probably put some effort into optimizing regex filtering as well.

LennyFox commented 4 years ago

@Yuki2718 Thanks for all your valuable input and explanation!

Yuki2718 commented 4 years ago

Lenny, I'm not going to discuss the topic much on MT - I basically don't comment on others' settings unless they're obviously terrible (and yours are not). I remember the days of Android 2.x, when I was satisfied with a local proxy that blocked ads with about 100 rules - even among them there was a gradient - some, such as google-analytics, got thousands of blocks per week, while 80% of the rules got only a few. Actually, uBO (& Ghostery, Brave) exploits this fact that most rules will never or rarely be used[1]. They should not even be considered unless the request actually matches. It extracts a bunch of common keywords from requests and rules, which cover most requests, allowed or blocked, caches them, and only looks for them, except on the rare occasion that a request doesn't match any of them[2]. Ofc it's oversimplified, but the essence is covered. But this also means rules for uBO must be written in tokenizable form if possible[3], but IDK about AG - in fact I noticed an obvious difference in battery life when the Base filter was disabled on an Android phone. Of note, Fanboy made a substantial overhaul of EL in late 2019, but I believe there are still many rarely used rules. OTOH people tend to subscribe to too many filters; I have seen too many such cases on AdGuardFilters - it's not only bad for performance, but also brings problems (reports often get Not Reproduced as nobody bothers to test with so many filters), and one thing they miss is that adding filters can actually cause less blocking - even high-quality filters have some loose whitelists, which is one reason I shared anti-whitelist. Probably you won't agree with all of my points, and that's okay, but I think we can at least agree to discourage the practice of subscribing to too many filters, in each place.

[1] I have discussed this on Wilders like you did on MT.

[2] This is why, on Brave, adding EasyPrivacy with its 16,000 rules added little cost, even without counting the resources saved by 1.7 times more blocking. The same goes for uBO; the only measurable difference should be a slight increase in memory usage.

[3] Actually it's the opposite: these blockers have evolved to be optimal for common filters like EasyList.
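The keyword idea described above can be sketched in a few lines of Python. This is a heavily simplified toy, not uBO's, Ghostery's or Brave's actual algorithm: the names `safe_token` and `TokenIndex` are invented here, rules are treated as plain substrings with optional leading/trailing `*`, and a keyword is only used when it is bounded by literal characters inside the rule (which is why `*pagead*` falls back to the slow path while `/pagead/*` does not).

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9%]+")

def safe_token(rule: str):
    """Return the longest token bounded by literal characters, or None."""
    best = None
    for m in TOKEN.finditer(rule):
        before = rule[m.start() - 1] if m.start() > 0 else "*"
        after = rule[m.end()] if m.end() < len(rule) else "*"
        if before == "*" or after == "*":
            continue  # the token could be part of a longer word in the URL
        if best is None or len(m.group()) > len(best):
            best = m.group()
    return best

class TokenIndex:
    def __init__(self) -> None:
        self.buckets = defaultdict(list)  # token -> rules containing it
        self.untokenized = []             # rules tried on every request

    def add(self, rule: str) -> None:
        rule = rule.lower()
        token = safe_token(rule)
        if token is None:
            self.untokenized.append(rule)
        else:
            self.buckets[token].append(rule)

    def candidates(self, url: str) -> list:
        """Rules worth evaluating for this URL; all others are skipped."""
        found = list(self.untokenized)
        for token in set(TOKEN.findall(url.lower())):
            found.extend(self.buckets.get(token, []))
        return found

    def is_blocked(self, url: str) -> bool:
        url = url.lower()
        return any(rule.strip("*") in url for rule in self.candidates(url))

index = TokenIndex()
for rule in ("/pagead/*", "*pagead*", "/adframe/*"):
    index.add(rule)
print(index.untokenized)
print(index.candidates("https://example.com/index.html"))
print(index.is_blocked("https://example.com/pagead/show.js"))
```

With this layout, a request that shares no keyword with any rule only has to be checked against the untokenized rules, so adding more tokenized rules costs memory rather than matching time.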

Yuki2718 commented 3 years ago

It's been a long time since I left MT; I hope you remember me. Today I found your post on MT https://malwaretips.com/threads/a-b-testing-allowed-in-easylist-filters.105783/ and felt like explaining things a bit, but as I closed my MT account I had no means to - then I remembered you also have a GH account.

Going straight to the point, media.trafficjunky.net/__js2/test.js is not for A/B testing. We call this kind of stuff a check or bait, because it's used to detect ad blockers. If you visit the URL you'll see it only sets page_params.holiday_promo to true, so the site can easily tell whether trafficjunky is blocked by checking the property. What sites do with the check varies: some display anti-adblock messages, others re-inject circumvention ads, but most do nothing - they only collect statistics about how many users use a blocker. I haven't investigated Pornhub, but I remember the same property was/is used on other porn sites such as yourporn for re-injection, such that you'll see circumvention ads only if your interaction with the site exceeds a threshold. These can be and were addressed by scriptlets, but addressing them with an allowlist is not bad - in fact it's better in terms of performance and wider platform support. BTW AG Base is basically EL + its own added filters; in fact @@||media.trafficjunky.net/__js2/test.js$script,domain=pornhub.com|pornhubthbh7ap3u.onion is in Base too.
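For readers unfamiliar with how such an allowlist rule is put together, here is a small Python sketch that splits the rule quoted above into its parts. `parse_rule` is a hypothetical helper written for this illustration; it splits on the first `$` and ignores many details that real parsers handle.

```python
def parse_rule(rule: str) -> dict:
    """Split a network rule into exception flag, pattern, types and domains."""
    is_exception = rule.startswith("@@")  # '@@' marks an allowlist rule
    body = rule[2:] if is_exception else rule
    # Simplification: everything after the first '$' is treated as options.
    pattern, _, option_text = body.partition("$")
    types, domains = [], []
    for option in filter(None, option_text.split(",")):
        if option.startswith("domain="):
            # Inside domain=, '|' separates domains; it is not an anchor.
            domains = [d for d in option[len("domain="):].split("|") if d]
        else:
            types.append(option)
    return {"exception": is_exception, "pattern": pattern,
            "types": types, "domains": domains}

rule = ("@@||media.trafficjunky.net/__js2/test.js"
        "$script,domain=pornhub.com|pornhubthbh7ap3u.onion")
for key, value in parse_rule(rule).items():
    print(f"{key}: {value}")
```

Read this way, the rule allows only script requests to that one bait URL, and only when they come from the two listed sites.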