DandelionSprout / adfilt

The place where I, DandelionSprout, store my web filter lists for countless topics, including my Nordic adblock list. As simple as that, really.
Other
1.3k stars 143 forks source link

Fix domain exception filters for `$removeparam` #636

Closed andylizi closed 1 year ago

andylizi commented 1 year ago

Currently, the domain exceptions in ClearURLs rules don't seem to work correctly. For example, we have filters like this:

$removeparam=callback,domain=bilibili.com
@@$removeparam,domain=api.bilibili.com

This should disable $removeparam for api.bilibili.com. But for some reason, it doesn't work[^env]. callbcak parameters are still removed, thereby breaking the site.

The following does work as intended, however:

@@||api.bilibili.com^$removeparam

I don't have much experience with uBO filters, so I don't really understand the difference between those two. Maybe this behavior could be considered an uBO bug? Anyway, please let me know if there's a better way to do this.

[^env]: Chrome 104.0.5112.81, uBlock Origin 1.43.0, ClearURLs for uBo 30/06/2022

DandelionSprout commented 1 year ago

On paper,

Presuming that the tests took place on www.bilibili.com, the latter approach would indeed be the only one that'd work.

@iam-py-test was the guy who designed the script that the pull request is modifying, so I'll let him make the final decision on whether to merge or not. That being said, I do agree with you about changing the format to $removeparam=parameter,domain=example.org and will vote in favour of merging.

andylizi commented 1 year ago

On paper, @@$removeparam,domain=api.bilibili.com is supposed to whitelist all parameter removals when visiting a api.bilibili.com page, incl. third-party requests.

Wait, weird negation interaction aside, the behavior you described (without @@ negation) doesn't sound right to me. For example, I tried running await fetch("https://example.com/?callback=test") on www.bilibili.com, and the callback parameter is removed. I don't think this is supposed to happen……?

iam-py-test commented 1 year ago

That change seems like it makes sense (i.e. @@$removeparam,domain=myaccount.google.* -> @@||myaccount.google.*^$removeparam?) so I am going to merge this pull request Thank you @andylizi

andylizi commented 1 year ago

@iam-py-test Should we also change the non-negated version, e.g. $removeparam=source,domain=google.* to ||google.*^$removeparam=source?

Because as I previously mentioned, if we have a rule like "remove source parameters from requests to google.com", we shouldn't just remove every source parameter from every outgoing request (e.g. to unrelated.com) just because we're on google.com, which is currently happening.

         elif "/" in url_pattern:
             write_rules(
                 url_pattern,
                 rules,
                 "||{1}$removeparam=/^{0}=/",
                 "||{1}$removeparam={0}",
                 filterlist,
             )
         else:
             write_rules(
                 url_pattern,
                 rules,
-                "$removeparam=/^{0}=/,domain={1}",
-                "$removeparam={0},domain={1}",
+                "||{1}^$removeparam=/^{0}=/",
+                "||{1}^$removeparam={0}",
                 filterlist,
             )
Nojuuu commented 1 year ago

If that's the behavior, it's probably a good idea to update some of the filters to be more specific in Actually Legitimate Shortener as well, at least some of the 2-3 letter parameter ones, as they tend to usually be the ones that cause problems.

iam-py-test commented 1 year ago

If that's the behavior, it's probably a good idea to update some of the filters to be more specific in Actually Legitimate Shortener as well, at least some of the 2-3 letter parameter ones, as they tend to usually be the ones that cause problems.

Not sure what @DandelionSprout thinks about that. Seems like a good idea to me, but I will wait until he ways in.

iam-py-test commented 1 year ago

@iam-py-test Should we also change the non-negated version, e.g. $removeparam=source,domain=google.* to ||google.*^$removeparam=source?

Because as I previously mentioned, if we have a rule like "remove source parameters from requests to google.com", we shouldn't just remove every source parameter from every outgoing request (e.g. to unrelated.com) just because we're on google.com, which is currently happening.

Changed it in https://github.com/DandelionSprout/adfilt/commit/4380b12436fe4cc434e460bdbe57956e0fbe3fcc, applied in https://github.com/DandelionSprout/adfilt/pull/639. Also, I converted the exclusions, so hopefully they will still work.

Nojuuu commented 1 year ago

~Hmm, it doesn't actually look like it removes parameters from all requests, but does it for some reason when trying to fetch in console?~

edit: Nvm, I reproduced it with filters.

iam-py-test commented 1 year ago

edit: Nvm, I reproduced it with filters.

Sorry, I am confused. Is there a problem with the modification?

Nojuuu commented 1 year ago

No, everything is fine 😄

iam-py-test commented 1 year ago

No, everything is fine 😄

Ok, thanks.