brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

Debouncing action to include all further URL parameters as part of the target URL #22429

Status: Open (opened by antonok-edm 2 years ago)

antonok-edm commented 2 years ago

The following URL was observed in a product ad placement on a Google search results page when searching for "home depot tracking light":

https://clickserve.dartsearch.net/link/click?lid=92700067144412497&ds_s_kwgid=58700001236285396&ds_s_inventory_feed_id=97700000000001001&ds_a_cid=75683555&ds_a_caid=378324755&ds_a_agid=26777702795&ds_a_fiid=&ds_a_lid=pla-1457212127228&ds_a_extid=&&ds_e_adid=102712441595&ds_e_matchtype=search&ds_e_device=c&ds_e_network=g&ds_e_product_group_id=1457212127228&ds_e_product_id=202501682&ds_e_product_merchant_id=8740&ds_e_product_country=US&ds_e_product_language=en&ds_e_product_channel=online&ds_e_product_store_id=&ds_url_v=2&ds_dest_url=https://8808.xg4ken.com/trk/v1?prof=404&camp=19651&kct=google&kchid=7097773753&criteriaid=pla-1457212127228&campaignid=378324755&locphy=9061268&adgroupid=26777702795&adpos=&cid=102712441595&networkType=search&kdv=c&kext=&kadtype=pla&kmc=8740&kpid=202501682&url=https://www.homedepot.com/p/Lite-Line-8-ft-White-Track-Finished-HD-TR122/202501682?g_store=&source=shoppingads&locale=en-US&pla&mtc=Shopping-BF-F_Brand-G-Multi-NA-Multi-NA-Feed-PLA_LIA-NA-NA-Catchall_PLA&cm_mmc=Shopping-BF-F_Brand-G-Multi-NA-Multi-NA-Feed-PLA_LIA-NA-NA-Catchall_PLA-71700000014585962-58700001236285396-92700067144412497&gclsrc=aw.ds&gclid=EAIaIQobChMI3c322fag9wIVAcLCBB1ibwDvEAQYAyABEgJD3vD_BwE

This URL passes through two levels of bounce tracking: first through clickserve.dartsearch.net, then via its ds_dest_url parameter to 8808.xg4ken.com, and finally via that URL's url parameter to www.homedepot.com.

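For Brave to properly debounce links like this, we need a debounce action that takes the entire rest of the URL after the relevant parameter, without any additional decoding, and redirects to it. The nested URLs are not percent-encoded, so extracting the parameter's value the usual way would cut it off at the first & and drop the destination's own parameters.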
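
A minimal Python sketch of what such an action might do (this is not Brave's debouncing code, and query_param_plus is a hypothetical name): find the first occurrence of the parameter in the query string and return everything after param= verbatim, with no percent-decoding. The http(s) check on the result is an added assumption.

```python
from typing import Optional


def query_param_plus(url: str, param: str) -> Optional[str]:
    """Hypothetical query-param+ action: return the raw remainder of `url`
    after the first `param=` key in its query string, without decoding."""
    # Only search for the key before the outer URL's fragment (if any).
    end = url.find("#")
    if end == -1:
        end = len(url)
    key = param + "="
    delim = url.find("?", 0, end)  # a '?' or '&' that precedes a key
    while delim != -1:
        if url.startswith(key, delim + 1, end):
            dest = url[delim + 1 + len(key):]  # keep everything, incl. '#...'
            # Assumption: only redirect to absolute http(s) URLs.
            return dest if dest.startswith(("http://", "https://")) else None
        delim = url.find("&", delim + 1, end)
    return None


# Shortened, hypothetical version of the dartsearch URL above.
url = ("https://clickserve.dartsearch.net/link/click?lid=1"
       "&ds_dest_url=https://8808.xg4ken.com/trk/v1?prof=404"
       "&url=https://www.homedepot.com/p/202501682?source=shoppingads&gclid=abc")
hop1 = query_param_plus(url, "ds_dest_url")
hop2 = query_param_plus(hop1, "url")
print(hop2)  # https://www.homedepot.com/p/202501682?source=shoppingads&gclid=abc
```

Applied once per redirector (ds_dest_url, then url), this peels off both bounce-tracking layers while leaving the final destination's parameters intact.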

@pes10k has suggested calling this action query-param+, or, alternatively, implementing a general regex capability that could capture this case as well as others.
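
The regex alternative could look roughly like the following Python sketch. The rule format (a pattern matched against the whole URL, whose first capture group is the destination) and the rule names are hypothetical, not an existing Brave rule syntax.

```python
import re
from typing import Optional


def debounce_with_regex(url: str, pattern: str) -> Optional[str]:
    """Hypothetical regex action: if `pattern` matches the whole URL, its
    first capture group is taken verbatim as the redirect target."""
    match = re.fullmatch(pattern, url)
    if match is None:
        return None
    dest = match.group(1)
    # Assumption: only redirect to absolute http(s) URLs.
    return dest if dest.startswith(("http://", "https://")) else None


# Hypothetical rules: skip zero or more whole "key=value&" pairs (lazily, so
# the first matching key wins), then capture everything after the key.
DARTSEARCH_RULE = (r"https://clickserve\.dartsearch\.net/link/click\?"
                   r"(?:[^&]*&)*?ds_dest_url=(.*)")
XG4KEN_RULE = r"https://8808\.xg4ken\.com/trk/v1\?(?:[^&]*&)*?url=(.*)"

url = ("https://clickserve.dartsearch.net/link/click?lid=1"
       "&ds_dest_url=https://8808.xg4ken.com/trk/v1?prof=404"
       "&url=https://www.homedepot.com/p/202501682?source=shoppingads")
hop1 = debounce_with_regex(url, DARTSEARCH_RULE)
print(debounce_with_regex(hop1, XG4KEN_RULE))
# https://www.homedepot.com/p/202501682?source=shoppingads
```

The trade-off is flexibility versus safety: one regex hook can express query-param+ and other shapes, but each rule becomes harder to review than a named action.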

antonok-edm commented 2 years ago

Another test case from @fmarier:

https://app.adjust.net.in/nxqv3i_f3j3sj?fallback=https://w2.outlook.com/l/mobile?WT.mc_id=MobileAppSignature

In this particular case, we need to redirect to everything that comes after fallback= rather than extract it as a parameter value, because the destination URL is not encoded.

For example, https://app.adjust.net.in/nxqv3i_f3j3sj?fallback=https://example.com?q=a&uid=1234 needs to redirect to https://example.com/?q=a&uid=1234, not to https://example.com/?q=a, which is what extracting the value of fallback would produce.
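
A small Python comparison of the two behaviours (helper names are hypothetical). The expected result in the comment above has a / after example.com; that comes from URL canonicalization, which this sketch does not perform.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def fallback_by_parsing(url: str) -> Optional[str]:
    # Standard query parsing splits on every '&', so uid=1234 becomes a
    # sibling of fallback instead of part of the destination.
    values = parse_qs(urlsplit(url).query).get("fallback")
    return values[0] if values else None


def fallback_as_rest_of_url(url: str) -> Optional[str]:
    # Take everything after the first fallback= key, verbatim.
    match = re.search(r"[?&]fallback=(.*)", url)
    return match.group(1) if match else None


URL = "https://app.adjust.net.in/nxqv3i_f3j3sj?fallback=https://example.com?q=a&uid=1234"
print(fallback_by_parsing(URL))      # https://example.com?q=a
print(fallback_as_rest_of_url(URL))  # https://example.com?q=a&uid=1234
```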

fmarier commented 2 years ago

Here's another one:

https://st1.zoom.us/web_client/5g6glw/html/externalLinkPage.html?ref=https://brave.com/privacy/browser/#ads

In this one, the Zoom redirector incorrectly drops the anchor (#ads), making it impossible to link to a specific section of a page from within Zoom.
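
Taking the raw remainder would also keep the anchor. A standard URL parser assigns #ads to the outer zoom.us URL's fragment, so it never appears in the parsed ref value. Taking everything after ref= verbatim keeps it. A small illustration with hypothetical function names:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

ZOOM_URL = ("https://st1.zoom.us/web_client/5g6glw/html/externalLinkPage.html"
            "?ref=https://brave.com/privacy/browser/#ads")


def ref_by_parsing(url: str) -> Optional[str]:
    # urlsplit() puts "ads" in the *outer* URL's fragment, so the parsed
    # ref value stops at the '#'.
    values = parse_qs(urlsplit(url).query).get("ref")
    return values[0] if values else None


def ref_as_rest_of_url(url: str) -> Optional[str]:
    # Everything after the first ref= key, verbatim, fragment included.
    match = re.search(r"[?&]ref=(.*)", url)
    return match.group(1) if match else None


print(ref_by_parsing(ZOOM_URL))      # https://brave.com/privacy/browser/
print(ref_as_rest_of_url(ZOOM_URL))  # https://brave.com/privacy/browser/#ads
```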

fmarier commented 2 years ago

Rules that can be re-added once this has landed:

ShivanKaul commented 2 years ago

The AMP cache URLs that we would want to debounce: https://www-theverge-com.cdn.ampproject.org/c/s/www.theverge.com/platform/amp/2018/9/20/17881766/bing-google-amp-support-mobile-news

They are all of the form: https://<cache-host>/c/s/<publisher-hosted-page-url>. Note that in this example the target URL is probably an AMP page, but the De-AMP feature would take care of redirecting the publisher-hosted AMP page to the canonical URL.
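
Because these URLs carry the destination in the path rather than the query string, they would need a path-based action. Below is a rough Python sketch. It assumes the https /c/s/ form described above and handles only *.cdn.ampproject.org, the cache host in this example; other cache hosts are not covered. debounce_amp_cache is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlsplit


def debounce_amp_cache(url: str) -> Optional[str]:
    """Hypothetical path-based action for https://<cache-host>/c/s/<url>."""
    host = urlsplit(url).netloc
    # Assumption: only the cache host seen in this example is handled.
    if not host.endswith(".cdn.ampproject.org"):
        return None
    prefix = f"https://{host}/c/s/"
    if not url.startswith(prefix):
        return None
    # Publisher host + path (+ any query/fragment), taken verbatim.
    rest = url[len(prefix):]
    return "https://" + rest if rest else None


AMP = ("https://www-theverge-com.cdn.ampproject.org/c/s/www.theverge.com/"
       "platform/amp/2018/9/20/17881766/bing-google-amp-support-mobile-news")
print(debounce_amp_cache(AMP))
# https://www.theverge.com/platform/amp/2018/9/20/17881766/bing-google-amp-support-mobile-news
```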

fmarier commented 2 years ago

Thanks for the example, @ShivanKaul. That would be another case for https://github.com/brave/brave-browser/issues/22907, since that AMP URL doesn't have a query string.

fmarier commented 2 years ago

This one is tricky because it's actually the whole query string that must be extracted (no key to look for): https://ad.doubleclick.net/ddm/clk/416136127;227553433;l?https://www.aircanada.com/ca/fr/aco/home/plan/baggage/carry-on.html?acid=em%7C284826%7C353418
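
Below is a Python sketch of a whole-query action. It assumes a rule's URL pattern (for example, one matching ad.doubleclick.net/ddm/clk/) has already selected this redirector; that matching step is left out, and the function name is hypothetical. The destination's %7C escapes pass through untouched, consistent with the no-decoding requirement.

```python
from typing import Optional


def debounce_whole_query(url: str) -> Optional[str]:
    """Hypothetical action: the entire raw query string is the destination."""
    head, sep, rest = url.partition("?")
    # No query string, or the first '?' sits inside a fragment.
    if not sep or "#" in head:
        return None
    # Assumption: only redirect to absolute http(s) URLs; no decoding is done.
    return rest if rest.startswith(("http://", "https://")) else None


DOUBLECLICK = ("https://ad.doubleclick.net/ddm/clk/416136127;227553433;l"
               "?https://www.aircanada.com/ca/fr/aco/home/plan/baggage/"
               "carry-on.html?acid=em%7C284826%7C353418")
print(debounce_whole_query(DOUBLECLICK))
# https://www.aircanada.com/ca/fr/aco/home/plan/baggage/carry-on.html?acid=em%7C284826%7C353418
```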

antonok-edm commented 1 year ago

One more: https://github.com/brave/adblock-lists/issues/1025