brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

Please consider porting functionality from the ClearURLs extension to Brave Shields. #11250

Closed. Peacock365 closed this issue 4 years ago.

Peacock365 commented 4 years ago

From your blog article "What’s Brave Done For My Privacy Lately 5: Grab Bag" I found out that Brave is now filtering tracking parameters from URLs. This is fantastic news, but I do worry a bit about your rules being incomplete / too few. There are extensions which achieve similar functionality, and ClearURLs...

https://chrome.google.com/webstore/detail/clearurls/lckanjgmijmafbedllaakclkaicjfmnk

...seems to have the most complete ruleset of them all, with 250+ rules in total. I kindly ask that you bring your own rules up to par with the rulesets supported by this extension:

https://github.com/KevinRoebert/ClearUrls/tree/master/data

This seems to be more extensive than what Brave currently does. There is also another feature I would like to see ported: ClearURLs adds a context menu entry that says "Copy clean link" rather than only "Copy link", i.e. it also applies its filtering to copied links on request. I would like to see such an entry in Brave as well.
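To make the request concrete, here is a minimal Python sketch of host-scoped rule matching in the spirit of ClearURLs: every rule whose host pattern matches the URL contributes parameter-name patterns, and matching parameters are dropped. The `RULES` table and the `clean_url` function are hypothetical illustrations, not ClearURLs' actual rule format and not Brave's code.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rules: host regex -> regexes for parameter names to strip.
# Illustrative only; this is not ClearURLs' rule data or Brave's list.
RULES = {
    r".*": [r"utm_[a-z]+", r"fbclid", r"gclid"],
    r"(^|\.)amazon\.[a-z.]+$": [r"ref_?", r"pf_rd_[a-z]+"],
}


def clean_url(url: str) -> str:
    """Return `url` with every query parameter matched by an applicable rule removed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Collect the parameter patterns from every rule whose host regex matches.
    patterns = [
        re.compile(param_re)
        for host_re, param_res in RULES.items()
        if re.search(host_re, host)
        for param_re in param_res
    ]
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(p.fullmatch(key) for p in patterns)
    ]
    # An empty query makes urlunsplit drop the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

A "Copy clean link" entry would simply run the same function on the link before placing it on the clipboard. Note that `urlencode` re-serializes the surviving parameters, so their encoding can differ slightly from the original (spaces become `+`, for instance).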

It also prevents tracking through the History API's replaceState() method, as described here:

https://developer.mozilla.org/en-US/docs/Web/API/History_API#The_replaceState()_method

Here are related commits:

It also removes Google redirection:

https://github.com/KevinRoebert/ClearUrls/blob/master/core_js/google_link_fix.js
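A Google search result link often points at a `/url` redirect endpoint that carries the real destination in a query parameter. The Python sketch below unwraps such links, assuming the destination sits in a `url` or `q` parameter; that is an observation about commonly seen link shapes, not documented Google behavior, and `unwrap_google_redirect` is a hypothetical helper, not a port of `google_link_fix.js`.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed shape of Google redirect links, e.g.
#   https://www.google.com/url?sa=t&url=https%3A%2F%2Fexample.com%2F
#   https://www.google.de/url?q=https://example.com/
GOOGLE_HOST = re.compile(r"(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$")


def unwrap_google_redirect(url: str) -> str:
    """Return the destination of a Google /url redirect link, else `url` unchanged."""
    parts = urlsplit(url)
    if parts.path != "/url" or not GOOGLE_HOST.search(parts.hostname or ""):
        return url
    query = parse_qs(parts.query)  # also percent-decodes the values
    for key in ("url", "q"):
        for target in query.get(key, []):
            # Only follow http(s) targets, never javascript: or other schemes.
            if urlsplit(target).scheme in ("http", "https"):
                return target
    return url
```

Restricting targets to `http`/`https` keeps the unwrapper from turning a crafted redirect into a `javascript:` link.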

Please consider replicating these features in Brave; I think they are fairly essential and would enhance Brave Shields. I'd also suggest adding an entry to the Brave Shields pop-up, next to "HTTPS upgrades" and "Ads and trackers blocked", called something like "Tracking elements filtered from URLs", and making this adjustable in the settings. It could also be a sub-entry under ad & tracker blocking.

Please review this request. Thank you for your attention.

Peacock365 commented 4 years ago

@pes10k @fmarier

pes10k commented 4 years ago

Hi @Peacock365, thank you for the suggestion. This is a neat extension, but it targets something a bit different from what Brave aims to support out of the box. The extension you point to tries to remove as much information as possible from URLs, while Brave only aims to remove privacy-harming (i.e. user-identifying, user-specific) parameters.

If there were a way of identifying only the values in this list that are specific to individuals, that would be extremely valuable to us, and I expect it's something we'd be interested in pulling from!

thanks again!

Peacock365 commented 4 years ago

I see, so you are only aiming to block parts of the URL when, e.g., a randomly generated ID is assigned per user? This is a good idea, and I agree with it, but I wouldn't stop there. For example, there are tracking parameters that represent a particular campaign of, e.g., an ad company; these do not contain any user-specific parts, but are still assigned to users in batches. They are unnecessary, of course, so there is no harm in removing them.
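The distinction being drawn here can be sketched in Python as two filtering tiers: one that removes only per-click identifiers (such as `fbclid` and `gclid`, whose values are generated per click), and an aggressive one that also removes shared campaign parameters (such as the `utm_*` family). The parameter sets and the `strip_params` function are illustrative assumptions, not Brave's or ClearURLs' actual lists.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative sets only; not Brave's or ClearURLs' actual lists.
# Per-click identifiers: generated per click, so they can single out one user.
CLICK_IDS = {"fbclid", "gclid", "msclkid"}
# Campaign parameters: the same value is shared by everyone a campaign reaches.
CAMPAIGN_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def strip_params(url: str, aggressive: bool = False) -> str:
    """Drop per-click IDs; with aggressive=True, drop campaign parameters too."""
    drop = (CLICK_IDS | CAMPAIGN_PARAMS) if aggressive else CLICK_IDS
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

The default tier roughly mirrors the "user-specific only" policy described above; the aggressive tier is closer to what is being asked for here.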

All I can say is that the ClearURLs extension does not break any website for me. One website where it goes to crazy high numbers is YouTube. No performance impact noticeable, though.

Also, what about the History API sniffing that this extension covers? Could you consider replicating this?

Thanks for the reply and info so far.

pes10k commented 4 years ago

> All I can say is that the ClearURLs extension does not break any website for me. One website where it goes to crazy high numbers is YouTube. No performance impact noticeable, though.

I appreciate this, but what works for 1 person doesn't necessarily hold for 10 million users ;) We're far more aggressive than other browsers in being willing to risk some breakage for better privacy, but not infinitely so :)

> Also, what about the History API sniffing that this extension covers? Could you consider replicating this?

I'm not sure I understand the benefit of doing so. Unless there is a bug, we filter out the parameters any time a URL hits navigation / the network. Can you say more about what privacy harm would be addressed here? I'm not aware of a way a site can use the History API to read cross-origin browsing behavior, but if there is something I've missed I'd be very grateful for details you can share!

Peacock365 commented 4 years ago

> I appreciate this, but what works for 1 person doesn't necessarily hold for 10 million users ;) We're far more aggressive than other browsers in being willing to risk some breakage for better privacy, but not infinitely so :)

With URL parameters, the only thing that matters is identifying them and cutting them out cleanly from the URL. Once the tracking parameter in its entirety is identified beyond doubt, the breakage caused should be (and actually is) minimal, bordering on non-existent. And yes, I have used this on tracking-heavy websites (anything Google, Facebook, Amazon, Twitter, eBay, you name it). But then, if the parameter being user-specific is where you draw the line, there is not much I can do about it. Whelp, I thought I could get rid of the extension to minimize my fingerprinting surface a bit further... Still, it might present a resource to cherry-pick from.

> Can you say more about what privacy harm would be addressed here? I'm not aware of a way a site can use the History API to read cross-origin browsing behavior, but if there is something I've missed I'd be very grateful for details you can share!

So, in the extension description it says:

> Prevents tracking injection over history API

Then, I have this:

"Function that is triggered on history changes. Injects script into page to clean links that were pushed to the history stack with the history.replaceState method. @param {state object} details The state object is a JavaScript object which is associated with the new history entry created by replaceState()"

And this:

https://developer.mozilla.org/en-US/docs/Web/API/History_API#The_replaceState()_method

So, it's probably best if you look at the code the extension author uses to prevent this. What do you make of it?

pes10k commented 4 years ago

> Whelp, I thought I could get rid of the extension to minimize my fingerprinting surface a bit further... Still, it might present a resource to cherry-pick from

Much appreciated :) That's not to say this decision won't change in the future, but that (just targeting user-specific values) is where things stand now.

> Prevents tracking injection over history API

I understand what the extension does and how it functions, but I don't understand the harm it's preventing, assuming we're already removing the privacy-harming parameters from URLs before they hit the network. I appreciate you taking the time to respond and write this up, but I still do not see how this is a way of leaking identifiers across first-party (1p) contexts in Brave.

If a site uses the History API to push URLs that include a tracking parameter into the history, the parameter will be removed the moment the URL hits the network / the URL bar. That prevents it from being sent (either as a referrer or as the requested URL). If there is a way I am not seeing that the History API could be used to get these parameters into network traffic and/or the top-level URL, though, that seems like a bug in our implementation and we'd definitely want to get it fixed ASAP.
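The argument above can be illustrated with a toy Python model, assuming filtering happens at a single navigation/network choke point: a script may write a tracking parameter into the current history entry with something like `replaceState()`, but the parameter is stripped the moment that entry is requested again. `ToyTab` and `strip_tracking` are hypothetical names, and this is a model of the reasoning, not how Brave is implemented internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative per-click identifiers; not Brave's actual list.
TRACKING_PARAMS = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Remove known tracking parameters from the query string."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


class ToyTab:
    """Toy model: scripts may rewrite history entries freely, but every
    request goes through the filter before it reaches the network."""

    def __init__(self, url: str):
        self.history = []   # history entries, as the page sees them
        self.requests = []  # URLs actually sent over the network
        self.navigate(url)

    def _fetch(self, url: str) -> str:
        clean = strip_tracking(url)  # the navigation/network choke point
        self.requests.append(clean)
        return clean

    def navigate(self, url: str) -> None:
        self.history.append(self._fetch(url))

    def replace_state(self, url: str) -> None:
        # Like history.replaceState(): rewrites the current entry, sends nothing.
        self.history[-1] = url

    def reload(self) -> None:
        self.history[-1] = self._fetch(self.history[-1])
```

In this model, a tab opened at `https://news.example/` whose script calls `replace_state("https://news.example/?fbclid=abc")` and then reloads sends only `https://news.example/` both times; the parameter the page injected never leaves the browser.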

Either way, thanks again!

Peacock365 commented 4 years ago

Ah, I see, thanks for your reply and for taking the time to review this.

Semi-OT: Brave Shields are already great. Ad and tracker blocking, within the scope of the included lists and set to "Aggressive", just works, and HTTPS Everywhere works fine. Good progress so far. But there are some things that make me want to pull my hair out, and for which I have to use extensions, although they should be included in Brave Shields IMHO:

When all of those are implemented, I'll be a happy camper. But this seems to be a long way off. :(

pes10k commented 4 years ago

> CDN blocking, like, locally inserting libraries that would otherwise be pulled from a server

We looked into this, but found that the libraries change very often since many are not explicitly versioned, so it's tricky. We're still looking into this, but I don't think there is much chance we will implement something anytime soon.

> You need some kind of heuristics, similar to what the Privacy Badger extension does, where it is self-learning, i.e. when it spots a tracking script a few times it auto-adds it to the blocklist.

I don't think we'll ever do this, since this becomes a tracking vector / history leak.
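Why self-learning blocking can become a history leak is easiest to see in a sketch. The hypothetical `TrackerLearner` below blocks a third-party domain once it has been seen on three distinct first-party sites, following the threshold Privacy Badger describes for its own behavior; everything else is an illustrative assumption, not Privacy Badger's code.

```python
from collections import defaultdict

# Threshold of three distinct sites, as Privacy Badger describes its heuristic.
BLOCK_THRESHOLD = 3


class TrackerLearner:
    """Learns to block third-party domains seen tracking on several sites."""

    def __init__(self, threshold: int = BLOCK_THRESHOLD):
        self.threshold = threshold
        # tracker domain -> set of distinct first-party sites it was seen on
        self.seen_on = defaultdict(set)

    def observe(self, first_party: str, tracker: str) -> None:
        """Record that `tracker` appeared to track the user on `first_party`."""
        self.seen_on[tracker].add(first_party)

    def is_blocked(self, tracker: str) -> bool:
        # .get() avoids creating empty entries for unknown trackers.
        return len(self.seen_on.get(tracker, ())) >= self.threshold
```

The learned state is derived entirely from the sites this user has visited, so if a page can tell whether a request to some domain was blocked, it learns something about the user's browsing elsewhere. Because that state differs from user to user, it can also serve as a fingerprinting signal.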

> Filtering tracking parameters out more aggressively but still reasonably, perhaps only whitelisting the parameters used by campaigns run within the Brave browser itself (e.g. the links behind your Sponsored Images).

Will just point to the above comments for this; I think I already answered up there :)

> The ability to add custom filter lists easily, e.g. I always have to add the "I don't care about cookies" list to uBlock Origin; I'd like to drop this and just do that in Brave (Why isn't this list bundled anywhere, by the way? The extension that uses this list seems pretty popular.)

There should be good news for you on this point very soon :)

Peacock365 commented 4 years ago

> We looked into this, but found that the libraries change very often since many are not explicitly versioned, so it's tricky. We're still looking into this, but I don't think there is much chance we will implement something anytime soon.

Hm, I guess you'd have to include a whole range of versions of each library for optimal web compatibility, bloating Brave's size. On someone's recommendation I use LocalCDN, a fork of Decentraleyes, and it simply ignores the exact version of the library a website uses and inserts the most recent one (which you'd have to keep up with, though). Breakage? None, with 1000+ locally inserted libraries and counting.
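The LocalCDN approach described here can be sketched in Python as a URL rewrite that recognizes a CDN library request, ignores the requested version, and substitutes a single bundled copy. The two CDN URL shapes, the `BUNDLED` table, and `local_replacement` are assumptions for illustration, not LocalCDN's actual code or data.

```python
import re
from typing import Optional

# Hypothetical bundle: library name -> (bundled version, local resource path).
BUNDLED = {
    "jquery": ("3.5.1", "resources/jquery/jquery.min.js"),
}

# Matches e.g. https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js
# and https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js
CDN_PATTERN = re.compile(
    r"^https://(?:ajax\.googleapis\.com|cdnjs\.cloudflare\.com)/ajax/libs/"
    r"(?P<name>[\w.-]+)/(?P<version>[\w.-]+)/(?P<file>[\w./-]+)$"
)


def local_replacement(url: str) -> Optional[str]:
    """Return a local resource path for a known CDN library URL, else None."""
    match = CDN_PATTERN.match(url)
    if match is None:
        return None  # not a recognized CDN request; let it through
    entry = BUNDLED.get(match.group("name"))
    if entry is None:
        return None  # library not bundled; fall back to the network
    # The requested version (match.group("version")) is deliberately ignored:
    # the single bundled, most recent copy is served instead.
    _bundled_version, path = entry
    return path
```

The trade-off raised above still applies: a page written against an older major version may depend on APIs that the newest release removed, which is exactly the breakage risk a test run would need to measure.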

> I don't think we'll ever do this, since this becomes a tracking vector / history leak

Wow, this triggered me so much that I removed Privacy Badger (it was not that useful to begin with, since the custom filter lists I am using are extensive).

> Will just point to the above comments for this; I think I already answered up there :)

Let's hope for BRAVERY here, then. :-D

> There should be good news for you on this point very soon :)

Thumbs up!

Peacock365 commented 4 years ago

One last idea: while I don't know how your internal testing of new features works, I'd make this suggestion:

Try to replicate the implementation of LocalCDN (which seems to be the most active extension dealing with CDN blocking these days), or alternatively Decentraleyes, within Brave Shields, with the libraries websites use upgraded to their most recent versions (no old versions included). Then run this special version of Brave on various websites, perhaps privately or even across your entire company, for, say, one month.

Record any breakage. I am sure there won't be much breakage, if any, because CDNs do not keep distributing old versions of libraries forever either. Websites are coded with the possibility of the library being upgraded (a new version being auto-served one day) in mind. Simply upgrading the library should raise the web compatibility risk only mildly, if at all.

Peacock365 commented 4 years ago

@pes10k

I hope my suggestion didn't doom CDN blocking integration in Brave forever. ;) I was merely suggesting upgrading libraries with minimal risk of breakage, instead of having to bundle an abundance of versions, which might pave the way for successful CDN blocking. Please do not give up too easily on this worthwhile facet of anti-tracking.

Anyway, I think we are mostly done here, so I'll close this issue. Keep fighting the good fight! I started out with Brave thinking it couldn't be any worse than Chrome anyway, and ended up evangelizing it left and right. :-D I hope this project succeeds and wish you all the best! I'll be sure to report back if there is anything to report.

pes10k commented 4 years ago

Thanks very much for the kind words and the suggestions, @Peacock365. Very, very much appreciated :)