brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

Group related domains together before spoofing 3rd-party referrers #3194

Closed fmarier closed 3 years ago

fmarier commented 5 years ago

In order to fix a whole class of webcompat issues (e.g. #1356), we could group related domains together when we decide whether or not a request is 3rd-party.

For example, google.com and googleapis.com both belong to Google. We're not really "leaking" the referrer if Google can see both requests anyways by virtue of being the recipient in both cases.
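To make the idea concrete, here is a minimal sketch of an entity-aware 3rd-party check. It is not Brave's implementation: the `ENTITIES` map, `entity_for` and `is_third_party` are hypothetical names, the map is hand-written for illustration, and hosts are compared by simple suffix matching rather than by a proper eTLD+1 lookup against the Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-written entity map for illustration only.
# A real list (e.g. Mozilla's entity list) would be loaded from a file.
ENTITIES = {
    "Google": {"google.com", "googleapis.com"},
    "Example": {"example.com"},
}


def entity_for(host):
    """Return the name of the entity that lists `host`, or None."""
    host = host.lower().rstrip(".")
    for name, domains in ENTITIES.items():
        for domain in domains:
            # Match the domain itself or any of its subdomains.
            if host == domain or host.endswith("." + domain):
                return name
    return None


def is_third_party(page_url, request_url):
    """Treat a request as 1st-party if it goes to the same host or to
    a host owned by the same entity as the top-level page."""
    page_host = urlsplit(page_url).hostname or ""
    request_host = urlsplit(request_url).hostname or ""
    if page_host == request_host:
        return False
    page_entity = entity_for(page_host)
    # Hosts that are not in any entity group fall back to 3rd-party.
    return page_entity is None or page_entity != entity_for(request_host)
```

With this map, a request from `https://www.google.com/` to `https://fonts.googleapis.com/css` is not considered 3rd-party, while a request from the same page to `https://example.com/x` is.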

Mozilla uses a list of entities for a similar purpose in their tracking protection. We could probably reuse that list.

@diracdeltas also suggested using it for some other 3rd-party checks in Shields.

diracdeltas commented 5 years ago

currently these are the shields features which have some awareness of 3rd-partiness:

I think for all of these it would make sense to change "1st party" to "either 1st party or in the same entity group as the top-level page"

tildelowengrimm commented 5 years ago

Would we rely on Mozilla to maintain those groups, or would we want to depend on same-site or something?

fmarier commented 5 years ago

I'm not sure what "same-site" you're referring to, but last time I checked there was no way to programmatically determine this in a safe way. That might be something that the research team could investigate.

I was thinking that we could start by using the same list as Mozilla, potentially adding to it since we block more than just the Disconnect list.

tildelowengrimm commented 5 years ago

Maintaining it manually ourselves with help from Mozilla seems like a workable plan, though it may be a bit of work to keep up-to-date.

fmarier commented 5 years ago

Since I wrote a script to parse our .dat files as part of testing https://github.com/brave/tracking-protection/pull/28, I decided to write a parser for the Mozilla entity list and compare it with the one we already generate from the Disconnect blacklist:
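A parser of this kind could be sketched as follows. This is not the script mentioned above. It assumes the entity list is a JSON document whose entries each carry a `properties` and a `resources` array, optionally nested under a top-level `entities` key, and it prints one `property: resources` line per property to mirror the diff lines quoted later in this thread. `SAMPLE` and `entity_lines` are hypothetical names, and the sample data is made up.

```python
import json

# Made-up sample in the assumed shape of the entity list.
SAMPLE = """
{
  "entities": {
    "Example Corp": {
      "properties": ["example.com", "example.org"],
      "resources": ["example.com", "example-cdn.net"]
    }
  }
}
"""


def entity_lines(raw_json):
    """Flatten the entity list into sorted 'property: res1,res2' lines."""
    data = json.loads(raw_json)
    # Accept both a nested "entities" object and a flat top-level map.
    entities = data.get("entities", data)
    lines = []
    for entry in entities.values():
        if not isinstance(entry, dict):
            continue  # skip non-entity values such as a license string
        resources = ",".join(sorted(entry.get("resources", [])))
        for prop in entry.get("properties", []):
            lines.append(f"{prop}: {resources}")
    return sorted(lines)


if __name__ == "__main__":
    for line in entity_lines(SAMPLE):
        print(line)
```

Sorting both the lines and the resources keeps the output stable, so two lists rendered this way can be compared with an ordinary text diff.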

That's almost double the number of entries. Looking at the diff, though, it's not just extra entries. There are also properties that are missing from the Mozilla entity list:

and some that are incomplete:

-yahoo.com: address.yahoo.com,adinterax.com,adrevolver.com,adserver.yahoo.com,advertising.yahoo.com,alerts.yahoo.com,analytics.yahoo.com,avatars.yahoo.com,bluelithium.com,buzz.yahoo.com,calendar.yahoo.com,dapper.net,edit.yahoo.com,interclick.com,legalredirect.yahoo.com,login.yahoo.com,mail.yahoo.com,marketingsolutions.yahoo.com,my.yahoo.com,mybloglog.com,notepad.yahoo.com,overture.com,pulse.yahoo.com,rightmedia.com,rmxads.com,rocketmail.com,secure-adserver.com,thewheelof.com,webmessenger.yahoo.com,yieldmanager.com,yieldmanager.net,yldmgrimg.net,ymail.com
+yahoo.com: adinterax.com,adrevolver.com,bluelithium.com,dapper.net,flickr.com,flurry.com,interclick.com,luminate.com,mybloglog.com,overture.com,pixazza.com,rightmedia.com,rmxads.com,rocketmail.com,secure-adserver.com,staticflickr.com,tumblr.com,yahoo.co.jp,yahoo.com,yahooapis.com,yahooapis.jp,yahoofs.com,yieldmanager.com,yieldmanager.net,yimg.com,yimg.jp,yldmgrimg.net,ymail.com,yuilibrary.com,zenfs.com
-yandex.com: adfox.yandex.ru,an.yandex.ru,awaps.yandex.ru,mc.yandex.ru,moikrug.ru,web-visor.com,yandex.ru/clck/click,yandex.ru/clck/counter,yandex.ru/cycounter,yandex.ru/portal/set/any,yandex.ru/set/s/rsya-tag-users/data
+yandex.com: api-maps.yandex.ru,moikrug.ru,web-visor.com,yandex.by,yandex.com,yandex.com.tr,yandex.ru,yandex.st,yandex.ua
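The differences between two such lines can be worked out with plain set operations. The sketch below uses the two yandex.com lines above and assumes, as the discussion that follows suggests, that the `-` line comes from the .dat file and the `+` line from the Mozilla entity list. `parse_line` and `compare` are hypothetical helper names.

```python
def parse_line(line):
    """Split '[+-]property: a,b,c' into (property, set of resources)."""
    prop, _, resources = line.lstrip("+-").partition(": ")
    return prop, {r for r in resources.split(",") if r}


def compare(old_line, new_line):
    """Return (only in old, only in new) as sorted lists."""
    _, old = parse_line(old_line)
    _, new = parse_line(new_line)
    return sorted(old - new), sorted(new - old)


dat_line = (
    "-yandex.com: adfox.yandex.ru,an.yandex.ru,awaps.yandex.ru,mc.yandex.ru,"
    "moikrug.ru,web-visor.com,yandex.ru/clck/click,yandex.ru/clck/counter,"
    "yandex.ru/cycounter,yandex.ru/portal/set/any,"
    "yandex.ru/set/s/rsya-tag-users/data"
)
mozilla_line = (
    "+yandex.com: api-maps.yandex.ru,moikrug.ru,web-visor.com,yandex.by,"
    "yandex.com,yandex.com.tr,yandex.ru,yandex.st,yandex.ua"
)

only_dat, only_mozilla = compare(dat_line, mozilla_line)

if __name__ == "__main__":
    print("only in .dat:   ", ",".join(only_dat))
    print("only in Mozilla:", ",".join(only_mozilla))
```

Note that this compares exact strings. An entry such as mc.yandex.ru counts as missing from the Mozilla line even though yandex.ru is listed there, so a matcher that also accepts subdomains would report fewer gaps.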

Looking at the Yahoo! ones, here are the resources that are present in the .dat file but not in the Mozilla entity list:

Looking at one of these, analytics.yahoo.com, it was part of the initial upload of the Disconnect list back in 2015 (Disconnect, Mozilla) but it's not clear to me why that's not part of the entity list. That same commit also added adserver.yahoo.com, also missing from the Mozilla entity list.

Looking at the properties missing from the Mozilla list, I found that ybrantdigital.com is still a tracker in the latest version of the Disconnect list and has been there since the original upload (Disconnect, Mozilla), but it got removed from the properties section in 2017 in this pull request without a comment as to why.

wiredminds.com is similarly missing from the Mozilla list but present in the .dat file. In this case, though, it has never actually been listed as a property in the Mozilla entity list, probably because it used to redirect to wiredminds.de (it's down at the moment).

Bottom line is that while there are differences between our entity list and Mozilla's, some of which make sense and some of which are harder to explain or possibly mistakes, my guess is that we would be better off using their list since it covers a lot more web properties. We could suggest fixes to them if we notice missing entries with a webcompat impact.

pes10k commented 3 years ago

Closing bc we no longer use any such determinations or lists when deciding referrer policy https://github.com/brave/brave-browser/issues/10825