GoogleChrome / ip-protection

Apache License 2.0

Clarification on first and third party definitions #50

Open skuurtje opened 2 months ago

skuurtje commented 2 months ago

Hi!

I have been following the Privacy Sandbox project closely, along with what it implies for cross-site tracking prevention. Along the way I have gotten a bit confused about which terms have which definitions, and I was hoping to get some clarification, as it's relevant for both the Privacy Sandbox and this project. Let me explain:

First and third party cookies

The Privacy Sandbox documentation states: "Cookies set by the site you visit—the one shown in the URL bar—are first-party cookies. A site you visit can embed content from other sites, for example, images, ads, and text. Cookies coming from sites other than the current site are third-party cookies." As I understand it, this refers to the SameSite attribute, which, when set to None, allows cookies to be sent cross-site. The web.dev docs similarly say: "If the cookie's registrable domain and scheme match the current top-level page, that is, what's displayed in the browser's address bar, the cookie is considered to be from the same site as the page and it's generally referred to as a first-party cookie."
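As a rough illustration of the same-site rule quoted from web.dev, here is a minimal Python sketch. The function names are made up for this example, and the registrable-domain helper is a deliberate simplification: it keeps only the last two DNS labels, whereas browsers consult the Public Suffix List, so it gives wrong answers for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def naive_registrable_domain(host: str) -> str:
    """Return the last two DNS labels of host.

    Simplifying assumption: real browsers use the Public Suffix List,
    so this is wrong for multi-label suffixes such as "co.uk".
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_same_site(cookie_url: str, top_level_url: str) -> bool:
    """True when both the scheme and the (naive) registrable domain match."""
    cookie, top = urlsplit(cookie_url), urlsplit(top_level_url)
    return (
        cookie.scheme == top.scheme
        and naive_registrable_domain(cookie.hostname or "")
        == naive_registrable_domain(top.hostname or "")
    )
```

Under these assumptions, https://shop.example.com and https://www.example.com count as the same site, while a different registrable domain or a different scheme does not.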

This to me implies that a cookie is first party if it's not accessible outside the top-level page, regardless of whether a tool like Google Analytics/GA4 JS code (a different domain) was loaded into that website and that JS bundle's code was responsible for setting this cookie. Is this correct?

First and third party tracking

I could not find a place where Google defines this itself. A Google support page about measuring YouTube ads with tracking pixels says: "Google does not accept third-party tracking pixels for YouTube measurement." So with this, can I then assume that Google sees tracking pixels not owned by the domain they are running on as third-party tracking?

First and third party data

On the Google Ads website I see the following definition of first-party data: "First-party data is information customers have consented to provide, like an email address or phone number that your business directly collects and owns."

Then, a Google support page about GA4 explains how you can enrich your GA4 data with first-party data.

So if a customer consents to being tracked (as with Consent Mode v2) by a tracking tool that never shares any customer data across domains, will this then be first-party data? For example, an analytics tool where you can see your website data, and that data is never shared with other customers of the analytics tool.

Also does this imply that the data GA4 collects running on a domain is first or third party data?

First and third party traffic

Now to the last term related to first and third party. A comment on another issue says:

IP Protection will only proxy some third-party traffic. If the requests that you use to collect this data are first-party, they will not be proxied. Note that first-party is defined as whether the request domain matches the domain in the URL bar of the browser.
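The definition in the quoted comment can be sketched as a simple host comparison. This is only an illustration: the function name is invented here, and comparing exact hostnames is an assumption, since the comment does not say whether subdomains of the URL-bar domain count as a match.

```python
from urllib.parse import urlsplit


def is_first_party_request(request_url: str, top_level_url: str) -> bool:
    """True when the request host equals the host shown in the URL bar.

    Assumption: exact hostname equality. The granularity Chrome actually
    uses (for example, registrable domain) is not stated in the thread.
    """
    return urlsplit(request_url).hostname == urlsplit(top_level_url).hostname
```

With this reading, a request from a page on shop.example.com to shop.example.com is first-party and would not be proxied, while a request from the same page to analytics.example.net is third-party.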

So just to be 100% clear here: with GA4 tracking running on a non-Google website, the data sent to Google's domain will also be counted as third-party traffic and possibly be routed through those proxies?

Website analytics tracking with no data being shared

The readme of this proposal states that:

Chrome wants to focus on behaviors that are most likely to be using IP addresses for tracking users across sites in ways that might not align with user expectations of privacy.

In the following situation:

So in this situation the IP addresses are not used for tracking users across websites, and the users need to have consented to being tracked. Would this tracking solution still be eligible to be added to the list and be proxied or not?

Thanks in advance for helping clarify these terms; this would probably help me and many others who are confused by this!

DavidSchinazi commented 2 months ago

This to me implies that a cookie is first party if it's not accessible outside the top-level page, regardless of whether a tool like Google Analytics/GA4 JS code (a different domain) was loaded into that website and that JS bundle's code was responsible for setting this cookie. Is this correct?

Cookies are tied to the domain that set them. If that domain is different from the top-level domain, then the cookie is considered third-party.

So with this, can I then assume that Google sees tracking pixels not owned by the domain they are running on as third-party tracking?

Similarly, third-party indicates that the domain is different from the top-level domain.

Also does this imply that the data GA4 collects running on a domain is first or third party data?

IP Protection doesn't have a concept of first-party vs third-party data. IP Protection operates at the HTTP layer, so it differentiates between first-party traffic and third-party traffic.

So just to be 100% clear here: with GA4 tracking running on a non-Google website, the data sent to Google's domain will also be counted as third-party traffic and possibly be routed through those proxies?

In that scenario, the GA4 domain does not match the top-level domain, so these requests to GA4 are considered third-party. That makes them eligible for proxying.
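To make the distinction between "eligible" and "actually proxied" concrete, here is a hedged Python sketch. It assumes proxying requires two things: the request is third-party, and its domain appears on a list of proxied domains. That list was not yet finalized in this thread, so the set below is a placeholder with an invented entry, and the function names are hypothetical.

```python
# HYPOTHETICAL placeholder: the real list of proxied third-party domains
# had not been published at the time of this thread.
HYPOTHETICAL_PROXIED_DOMAINS = {"analytics.example.net"}


def is_third_party(request_host: str, top_level_host: str) -> bool:
    """Third-party means the request domain differs from the top-level domain."""
    return request_host.lower() != top_level_host.lower()


def would_be_proxied(request_host: str, top_level_host: str) -> bool:
    """Assumed rule: the request is third-party AND on the placeholder list."""
    return (
        is_third_party(request_host, top_level_host)
        and request_host.lower() in HYPOTHETICAL_PROXIED_DOMAINS
    )
```

In this sketch a third-party request to a domain that is not on the list is eligible in principle but is not proxied, which matches the statement that only some third-party traffic will be proxied.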

So in this situation the IP addresses are not used for tracking users across websites, and the users need to have consented to being tracked. Would this tracking solution still be eligible to be added to the list and be proxied or not?

We're still finalizing the details of how we'll be building the list of proxied third-party domains. We'll update this repository once we have this ready.