AdguardTeam / FiltersRegistry

Known filters subscriptions transformed for better compatibility with AdGuard
GNU Lesser General Public License v3.0

Privacy: ClearURLs #962

Open TPS opened 5 months ago

TPS commented 5 months ago

Problem description

The ClearURLs database might be transformed into a powerful privacy-enhancing filterlist &/or userscript.

Proposed solution

The specs @ https://docs.clearurls.xyz/latest/specs/rules/ would be utterly necessary to transform this to something end-usable.

Additional information

Originally found via https://github.com/svenjacobs/leon/discussions/315#discussioncomment-9809441, where several interrelated projects are thinking of how to incorporate this database themselves.

krystian3w commented 5 months ago

Alpha/Beta:

https://github.com/DandelionSprout/adfilt/blob/master/ClearURLs%20for%20uBo/clear_urls_uboified.txt

https://raw.githubusercontent.com/DandelionSprout/adfilt/master/ClearURLs%20for%20uBo/clear_urls_uboified.txt

TPS commented 5 months ago

Definitely also see https://github.com/DandelionSprout/adfilt/discussions/163

krystian3w commented 5 months ago

Added years ago: https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/ThirdParty/filter_251_LegitimateURLShortener - https://github.com/AdguardTeam/FiltersRegistry/commit/65694c61d5fc8ea98782285edc14291c80d8c73a (https://github.com/AdguardTeam/FiltersRegistry/issues/401)

TPS commented 5 months ago

I do use LUS, but am hoping to improve coverage for these trackers.

Not identical, but now I think LUS is a derivative of ClearURLs (& probably other sources), so maybe this is duplicate in some sense? If you'd comment on the relationship between the 2, @DandelionSprout, it'd help.

iam-py-test commented 5 months ago

Conflict of interest disclaimer: I am the assistant maintainer of the Actually Legitimate URL Shortener Tool, and current maintainer of the ClearURLs for uBo list (I did not create the original ClearURLs for uBo list; credit for that goes to rustysnake)

> DandelionSprout's LUS is a derivative of ClearURLs (& probably other sources)

It is not. While a few filters have been copied from elsewhere (with credit), most have been manually added either based on user reports or tracking parameters Imre (and I) found. Thank you

TPS commented 5 months ago

@iam-py-test Thanks very much for answering. 🙇🏾‍♂️ Could you comment on how different the contents of the 2 lists are from each other?

iam-py-test commented 5 months ago

The Actually Legitimate URL Shortener, as described, is a variety of rules manually added by Imre (DandelionSprout) and me. ClearURLs for uBo uses a Python script to convert the ClearURLs rules into a filterlist for uBlock Origin and AdGuard (basically what you requested here). There are a few modifications to remove problematic rules, but largely it's just the ClearURLs rules. Thanks
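For readers wondering what such a conversion involves, here is a minimal Python sketch. It is not the actual ClearURLs for uBo script. It assumes a simplified provider shape: the `domain` key and the sample entries are hypothetical (real ClearURLs providers are matched by a URL regex rather than a domain), and it does no escaping of characters that have special meaning in filter syntax.

```python
import re

# Hypothetical, hand-written sample shaped loosely like ClearURLs providers.
# Assumption: the "domain" key does not exist in real ClearURLs data; it is
# used here only to keep the example short.
SAMPLE_PROVIDERS = {
    "example": {"domain": "example.com", "rules": ["ref_?", "tag"]},
    "globalRules": {"domain": None, "rules": ["utm_source"]},
}

# A rule made only of these characters is treated as a literal parameter name.
PLAIN_NAME = re.compile(r"[A-Za-z0-9_-]+")


def to_removeparam(rule, domain=None):
    """Turn one parameter pattern into a $removeparam filter line."""
    prefix = f"||{domain}^" if domain else ""
    if PLAIN_NAME.fullmatch(rule):
        # Literal name: the plain form of the modifier is enough.
        return f"{prefix}$removeparam={rule}"
    # Anything else is kept as a regex anchored to the parameter name.
    return f"{prefix}$removeparam=/^(?:{rule})=/"


def convert(providers):
    """Flatten all providers into a list of filter lines."""
    lines = []
    for provider in providers.values():
        for rule in provider["rules"]:
            lines.append(to_removeparam(rule, provider.get("domain")))
    return lines


for line in convert(SAMPLE_PROVIDERS):
    print(line)
```

A real converter would also have to deal with provider URL regexes, exceptions, and rules that break sites, which is where the manual modifications mentioned above come in.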

DandelionSprout commented 5 months ago

In theory, I could potentially have attempted to merge relevant entries from ClearURLs into LUS, which I can only presume would be a win-win for most parties.

TPS commented 5 months ago

@DandelionSprout 🙇🏾‍♂️ Actually, if the contents are that different, it'd make sense to keep them separate, & offer each as AG options to supplement each other & AG's other Privacy filterlists. OTOH, if the included rules overlap significantly, then it would make sense to use 1 as another source for the other, to keep down duplication.

DandelionSprout commented 5 months ago

So, I ran a comparison this morning about whether ClearURLs had any coverage that LUS didn't. I decided to test with Amazon, a high-coverage site in both lists.

LUS had well above 80 entries for Amazon (70 of them being specific entries). Only 2 entries that made sense (i.e. not ones like keywords or _encoding) were in ClearURLs but not in LUS.
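A comparison like this amounts to a per-domain set difference. Here is a rough Python sketch, assuming both lists have been reduced to simple `||domain^$removeparam=name` lines. The sample rules are invented for illustration and are not taken from either list. Rules without a domain are skipped, so real coverage would be understated.

```python
import re

# Matches lines such as "||amazon.com^$removeparam=ref_".
# The domain part is optional so that global rules parse too (domain is None).
RULE = re.compile(
    r"^(?:\|\|(?P<domain>[^/^$]+)\^?)?\$removeparam=(?P<param>[^,]+)"
)


def params_for(lines, domain):
    """Collect the parameters a list removes for one specific domain."""
    found = set()
    for line in lines:
        match = RULE.match(line.strip())
        if match and match.group("domain") == domain:
            found.add(match.group("param"))
    return found


# Invented sample data (assumption: not real entries from LUS or ClearURLs).
lus_sample = [
    "||amazon.com^$removeparam=ref_",
    "||amazon.com^$removeparam=pd_rd_r",
    "$removeparam=utm_source",
]
clearurls_sample = [
    "||amazon.com^$removeparam=ref_",
    "||amazon.com^$removeparam=qid",
    "||example.org^$removeparam=x",
]

only_in_clearurls = (
    params_for(clearurls_sample, "amazon.com")
    - params_for(lus_sample, "amazon.com")
)
print(sorted(only_in_clearurls))
```

The remaining entries still need a manual pass, since some parameters (such as the keywords and _encoding cases above) should not be removed at all.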

Although I do have conflicts of interest in the matter, I'd say that at this point ClearURLs has been obliterated in comparison. I give iam-py-test full 100% rights to make the calls on the following, with no interference from me, but I personally am getting unsure if a ClearURLs list conversion would be considered necessary nowadays. 😓

TPS commented 4 months ago

That's reasonable methodology. Possible to be more comprehensive over domain variety, like https://github.com/StevenBlack/hosts/issues/1181#issuecomment-608229213 is for TLD variety? I've a hunch that far-less-well-known sites than Amazon may have wider coverage on ClearURLs.

iam-py-test commented 4 months ago

> Possible to be more comprehensive over domain variety, like https://github.com/StevenBlack/hosts/issues/1181#issuecomment-608229213?

Given both lists have many global (applies to all websites) rules, measuring such coverage would be difficult.

krystian3w commented 4 months ago

It is definitely worth testing which exception modifiers disable global removeparam rules (AdGuard only):

> $removeparam rules can also be disabled by $document and $urlblock exception rules. But basic exception rules without modifiers do not do that. For example, @@||example.com^ will not disable $removeparam=p for requests to example.com, but @@||example.com^$urlblock will.

In that case, a userscript ("user.js") that edits URL parameters through an API will probably work better on sites where such exceptions apply.

https://adguard.com/kb/general/ad-filtering/create-own-filters/#urlblock-modifier
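To audit an existing list for such exceptions, something like the Python sketch below could flag them. It models only the behaviour stated in the documentation excerpt above ($document and $urlblock exceptions disable $removeparam, bare exceptions do not). The function name is hypothetical, and the modifier parsing is deliberately naive: it splits on the first `$` and on commas.

```python
# Modifiers that, per the quoted AdGuard documentation, make an exception
# rule also switch off $removeparam rules.
DISABLING_MODIFIERS = {"document", "urlblock"}


def disables_removeparam(rule):
    """Return True if an exception rule would also disable $removeparam."""
    if not rule.startswith("@@"):
        return False  # not an exception rule at all
    _, separator, modifiers = rule.partition("$")
    if not separator:
        return False  # bare exception without modifiers
    # Keep only modifier names, dropping any "=value" part.
    names = {m.strip().split("=", 1)[0] for m in modifiers.split(",")}
    return bool(names & DISABLING_MODIFIERS)


for candidate in ("@@||example.com^", "@@||example.com^$urlblock"):
    print(candidate, disables_removeparam(candidate))
```

Patterns that themselves contain `$` (for example regex rules) would need a proper filter parser instead of this shortcut.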