AdguardTeam / CoreLibs

Core Adguard libraries
https://adguard.com/
Apache License 2.0
39 stars 7 forks source link

$domain modifier works incorrectly alongside with $removeparam modifier #1490

Closed AdamWr closed 3 years ago

AdamWr commented 3 years ago

Probably related to this issue - https://github.com/AdguardTeam/CoreLibs/issues/1421

Steps to reproduce:

  1. Disable AdGuard URL Tracking filter and add this rule:
    $removeparam=ref,domain=arcelik.com.tr|uspoloassn.com|example.org
  2. Go here - https://www.google.com/search?q=%22%3Fref%3DAffOcean%22
  3. Click on first link

Opened link looks like that:

https://tr.uspoloassn.com/kadin-koleksiyon/?utm_term=%7Baffiliate_id%7D&utm_campaign=cps&utm_content=%7Baffiliate_name%7D&utm_source=AffOcean&utm_medium=affiliate&ref=AffOcean&page=44&layout=4

$removeparam=ref,domain=arcelik.com.tr|uspoloassn.com|example.org should removes ref=AffOcean, but it doesn't work.

Similar for exception rules, for example if AdGuard URL Tracking filter is enabled and I add @@$removeparam=utm_term,domain=uspoloassn.com to user filter and open website clicking on one of the Google search result (from link from seconds step) then parameter is still removed. I have to add @@$removeparam=utm_term,domain=google.com to whitelist it. I think that it shouldn't work like this.

AdGuard for Windows 7.7 nightly 10 (build 3626, CL 1.8.141)

ngorskikh commented 3 years ago

Hmm, the problem is, when opening the page from the search results, the referrer is google.com, which doesn't match any domains in the rule. How does it work for other types of rules?

AdamWr commented 3 years ago

For example, if I use rule like this: $image,third-party,domain=arcelik.com.tr|uspoloassn.com|example.org then it seems that it's applied.

Screenshot ![image](https://user-images.githubusercontent.com/29142494/128176630-c8ec6688-3c0d-4764-91b4-f81a7cf3dd93.png)
ngorskikh commented 3 years ago

I see, the images are loaded from the page itself, so the requests for them have uspoloassn.com as the referrer, but the page is loaded from the search results, so there the referrer is google.com (and doesn't match any rules). This is highly unintuitive, but I can't think of how we can handle this :(

ngorskikh commented 3 years ago

I think, if the intention is to remove parameter ref only from URLs containing arcelik.com.tr|uspoloassn.com|example.org then the $domain is actually not the right solution in the first place: the rule should be something like

/.*\://(.*\.)?(arcelik.com\.tr|uspoloassn\.com|example\.org)((/|\:|\?).*|$)/$removeparam=ref

(or three separate rules)

AdamWr commented 3 years ago

I have just added:

||uspoloassn.com^$removeparam=ref
||arcelik.com.tr^$removeparam=ref

to AdGuard URL Tracking filter and it works fine, but there is another filter list (Legitimate URL Shortener) which contains a lot of rules with domain and removeparam modifier:

``` $removeparam=height,domain=i-viaplay.com.akamaized.net|media.discordapp.net|tvno.nu $removeparam=/^utm_/,domain=~redir.tradedoubler.com $removeparam=/^itm_/,domain=~redir.tradedoubler.com $removeparam=/^stm_/,domain=~redir.tradedoubler.com $removeparam=tag,domain=amazon.*|argos.co.uk|tv2.no $removeparam=referral,domain=tv2.no|spotify.com $removeparam=t,domain=rakuten.com|steamcdn-a.akamaihd.net|duckduckgo.com $removeparam=aid,domain=adguard.com|booking.com $removeparam=action,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=module,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=pgtype,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=section,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=sk,domain=bing.com|aliexpress.* $removeparam=referrer,domain=~f4map.com|~deviantart.com $removeparam=surface,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=fellback,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=req_id,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=variant,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=sourceid,domain=~google.* $removeparam=rid,domain=giphy.com $removeparam=/^shared/,domain=seura.fi|suomenkuvalehti.fi|kotiliesi.fi|tekniikanmaailma.fi|anna.fi|rakennusmaailma.fi $removeparam=origin,domain=yle.fi|flaticon.com|drop.com|symbolab.com|archive.org $removeparam=channel,domain=rei.com|squarespace.com $removeparam=pid,domain=play.google.com|th.bing.com ||audible.$removeparam=ref,domain=audible.*|webhallen.no $removeparam=epi,domain=fernerjacobsen.no|webhallen.com $removeparam=cid,domain=bitdefender.* $removeparam=source,domain=boxboat.com $removeparam=src,domain=3cx.com $removeparam=index,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=pool,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=region,domain=nytimes.com|nytimes3xbfgragh.onion $removeparam=cid,domain=apple.com|www.microsoft.com|telenor.no|proshop.*|rightmove.co.uk|giphy.com|nbcnews.com|voxi.co.uk|lenovo.com|rei.com|adguard.com $removeparam=ref,domain=amazon.*|deal.no|expo.io|udemy.com|reddit.com|brave.com|facebook.com|arvopaperi.fi|twitch.tv|mikrobitti.fi|tivi.fi|tekniikkatalous.fi|talouselama.fi|kauppalehti.fi|uusisuomi.fi|mediuutiset.fi|sketch.com|etsy.com|geeksforgeeks.org|hs.fi|bitwarden.com|is.fi|github.io|spiegel.de|kansanuutiset.fi|bestbuy.com|shopping.google.*|target.com|bing.com|motouutiset.fi|telsu.fi|idenati.com|blogit.fi|suomi24.fi|scandinavianphoto.fi|github.com|dell.com $removeparam=src,domain=thunderbird.net|twitter.com|udemy.com|elastix.org|bluejeans.com|melopero.com|kubii.*|reichelt.*|okdo.com|raspipc.es|tiendatec.es|thepihut.com|cytron.io|buyzero.de|welectron.com|digikey.*|adafruit.com|hinta.fi $removeparam=trk,domain=splashthat.com|aws.amazon.com $removeparam=/^hc_/,domain=facebook.com|facebookcorewwwi.onion|messenger.com $removeparam=lst,domain=facebook.com|facebookcorewwwi.onion|messenger.com $removeparam=/^pn_/,domain=facebook.com|facebookcorewwwi.onion|messenger.com $removeparam=/^zero_/,domain=facebook.com|facebookcorewwwi.onion|messenger.com $removeparam=amdata,domain=ebay.* $removeparam=epid,domain=ebay.* $removeparam=hash,domain=ebay.* $removeparam=iid,domain=ebay.* $removeparam=source,domain=googleusercontent.com|contabo.com|wired.com|tiktok.com|thedailybeast.com|itnext.io|medium.com|awd-it.*|webhint.io $removeparam=w,domain=files.wordpress.com|store.playstation.com|s-microsoft.com|store-images.microsoft.com $removeparam=h,domain=files.wordpress.com|store.playstation.com|s-microsoft.com|store-images.microsoft.com $removeparam=width,domain=i-viaplay.com.akamaized.net|media.discordapp.net|tvno.nu ```

If I understand correctly, these rules will not work if it will be opened from Google search results.

ngorskikh commented 3 years ago

Depends on the value of the $domain modifier, for example this rule:

$removeparam=sourceid,domain=~google.*

looks like the author specifically wanted to NOT remove the params when opened from the search results (the modifier reads something like this: "limit the scope of the rule to only when the referrer is not google-dot-something")

AdamWr commented 3 years ago

I understand it in a different way :) For me, it means that this parameter should not be removed only on google.*, but it should be removed on any different website, so it should be also removed on these websites which are opened from search results, and it seems that it works like that in AdGuard extension. But it works differently in AdGuard apps.

ngorskikh commented 3 years ago

Turns out there was an issue with this special case not working: https://github.com/AdguardTeam/AdguardKnowledgeBase/blob/master/01.general/02.how-to-create-your-own-ad-filters/docs.en.md#domain-modifier-matching-target-domain Fixed in core/pull-requests/2491