WICG / floc

This proposal has been replaced by the Topics API.
https://github.com/patcg-individual-drafts/topics

Defining the level playing field | Google as a third party #18

Open achimschloss opened 4 years ago

achimschloss commented 4 years ago

While we have had a lot of lively discussion around the cohort mechanism, gatekeeper(s) or the lack thereof, and the technical/theoretical privacy aspects of these APIs, it seems about time to start a conversation about the legal and user-facing aspects, and ultimately about what the level playing field should look like. This is a bit of a longer post to provide an initial framing.

The formalities are certainly not the most beloved topic for engineers, but given the limited time available there is an urgency to prototype these aspects alongside the first APIs. With https://github.com/WICG/WebID we have already been in a rather in-depth discussion around this for a couple of weeks, but I would argue that we should now also start with the advertising-related APIs, since we have enough information about at least one of them.

With FloCs:

  1. being the simplest API from a functionality perspective
  2. having prototype implementations making its way into the chromium sourcecode https://chromium-review.googlesource.com/q/FloC

it would serve well to discuss these topics using FLoC as the first example. I'm referring to the GDPR in the following, since I guess we can agree it is the most advanced regulation in this regard and one where we have a lot of policy experience in the market.

Why is the legal framing important?

The FloC mechanism works by

  1. Calculating cohorts from the user's browsing history, i.e. from personal user data, which makes the processing subject to the GDPR. One therefore needs to think about the purposes and extent of that processing, the legal basis for it (as one can see in the chromium commits), and ultimately who formally controls and is responsible for it.
  2. These cohorts are used to address users with interest-based advertising and are therefore shared between parties (for example with an advertiser) that have different relationships with, and knowledge about, that user (i.e. there is a relation to other personal data processing in addition).

Given that

  1. The FloC function is not a self-contained browser feature like a password safe, where one could argue it is just a product feature used by the user on their own behalf and for their own benefit; it enables personal data processing beyond that.
  2. From a user's/DPA's perspective the responsible party will be the publisher where FloC-based ads are shown and browsing history is collected, not the browser and not the advertiser.
  3. Publishers will need answers to these questions. Even if FloC IDs might theoretically be anonymous by themselves, that does not change this observation, as it concerns the full extent of the processing, which ends with the display of an interest-based ad to the user.

The level playing field

Separate from the legal framing of the processing, publishers have a reasonable demand for 100% clarity on how these APIs are and will be entangled with other Google services, with examples like the iOS 14 changes coming up and the ongoing antitrust investigations around the globe concerning the bundling of services. For now we have a high-level alignment to establish a level playing field; with FloC we can really define it now.

Looking at the commits for the prototype, it appears that:

  1. For now, the user controls and legal grounds are bound to Google services and privacy policies; practically, the PoC is fully bound to Google services. That makes sense to me, given one could not otherwise even run the PoC with real users, but it naturally raises the concern of whether this will stay that way or be removed down the road.

    "Queries google to find out if user has enabled 'web and app activity' and 'ad personalization', and if the account type is NOT a child account."

  2. Secondly, it seems that FloC IDs are also synchronised to Google backend services, which again seems acceptable for a PoC but raises similar concerns.

    It's a service that is supposed to (as some functions are incomplete) regularly compute the floc id by sim hashing the navigation history and log it to chrome sync
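The "sim hashing" mentioned in that comment can be illustrated with a toy SimHash over a browsing history. This is only a sketch of the general technique; the hash function, bit width, and (absent) feature weighting are my assumptions, not Chrome's actual implementation:

```python
import hashlib

def simhash(features, bits=16):
    """Toy SimHash: map a set of string features (e.g. visited
    domains) to one integer whose bits are a majority vote over
    the per-feature hashes."""
    counts = [0] * bits
    for feature in features:
        # Stable per-feature hash (md5 is used only for illustration).
        h = int(hashlib.md5(feature.encode()).hexdigest(), 16)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Majority vote per bit position yields the cohort ID.
    return sum(1 << i for i in range(bits) if counts[i] > 0)

# Similar histories produce identical or Hamming-close cohort IDs.
history_a = ["news.example", "cars.example", "football.example"]
history_b = ["news.example", "cars.example", "tennis.example"]
print(f"{simhash(history_a):04x} {simhash(history_b):04x}")
```

Because the cohort ID is a lossy, many-to-one function of the history, many users share each value; whether that suffices as "anonymization" is exactly what the legal framing above has to assess.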

Looking Forward:

Once the FloC API is actually used to address users with personalised ads, these questions will need answers at the latest:

My suggestion would be to also prototype these questions, independently of the engineering aspects, to get publishers and advertisers more engaged and comfortable with these APIs and the general process.

michaelkleber commented 4 years ago

Hi Achim,

I'm happy to try to answer your questions. But I'll note that I'm an engineer, not a lawyer. So in my engineer sort of way, my response will be to make as clear as possible what happens: who is responsible for doing each thing, what the intended purpose is, and who learns what information.

I like breaking out the different stages in the FLoC mechanism, but let me split things up even finer than the cohort calculating-vs-using division that you mentioned.

  1. Cohort calculation begins with an on-device step where a web browser takes data that it already has (e.g. a browsing history) and performs a sort of anonymization calculation. The browser's goal here is to transform some personal, potentially-sensitive data into an attribute that's common to a large group of people.

  2. Some privacy properties of the flock assignments might be hard to establish entirely on-device. For example, in the original explainer we said "The browser ensures that flocks are well distributed, so that each flock represents thousands of people." That could be done using some kind of multi-browser aggregation service in which the flock values themselves are entirely anonymous. That aggregation service doesn't exist yet, so as you noted, for our proof-of-concept Chrome will rely on Google services and privacy policies for this.

  3. A particular domain's server indicates that it wishes to receive flocks from the browser, by sending an Accept-CH: Sec-CH-Flock header. The flock itself is designed to be a piece of anonymous data, but the server is asking for it to be attached to future HTTP requests to that domain, and only the party making the request knows how it intends to use that information. So that party will need to do its own analysis of any legal requirements for collecting and using the flock in combination with other data it receives.

  4. The browser may send the flock to the server in a Sec-CH-Flock header on future requests. This would of course be governed by whatever permissions and UI controls the browser has for the feature.

  5. The server that requested the flock decides whether to use the value when it performs ad targeting. Since the flock came attached to an ad request, there is plenty of opportunity for the publisher to pass along any relevant information about permissions, controls, etc.

  6. Specifically for Chrome users who choose to sync their browsing activity to a Google account, the flock (which is a function of that browsing history) may be synced as well.
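The distribution guarantee in step 2 ("each flock represents thousands of people") is essentially a k-anonymity threshold. A minimal sketch of the check such an aggregation service might perform (the threshold value and the suppression behaviour here are assumptions, not part of the proposal):

```python
from collections import Counter

def suppress_rare_flocks(assignments, k=2000):
    """Keep only flocks observed at least k times, so that every
    surviving flock value is shared by at least k browsers."""
    sizes = Counter(assignments.values())
    return {user: flock for user, flock in assignments.items()
            if sizes[flock] >= k}

users = {"u1": "f1", "u2": "f1", "u3": "f2"}
# With k=2, flock "f2" has only one member and is suppressed.
print(suppress_rare_flocks(users, k=2))  # → {'u1': 'f1', 'u2': 'f1'}
```

A real service would additionally have to run this without learning individual histories, which is why the aggregation is described as operating on anonymous flock values only.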

I hope this makes clear who the parties are and what roles they all play.
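To make the server's role in steps 3 to 5 concrete, here is a hypothetical server-side sketch of the client-hint exchange. The Sec-CH-Flock header name is taken from the discussion above; the handler and the targeting helper are illustrative assumptions, not a real FLoC endpoint:

```python
def select_ad_for_cohort(flock: str) -> str:
    # Placeholder targeting table keyed on the (hypothetical) cohort ID.
    return {"a1b2": "sports-gear-ad"}.get(flock, "untargeted-ad")

def handle_request(request_headers: dict) -> dict:
    """Sketch of a server that opts in to the flock client hint and,
    when the hint is present, uses it for ad selection."""
    response = {
        # Opt in: ask the browser to attach the hint to future requests.
        "Accept-CH": "Sec-CH-Flock",
    }
    flock = request_headers.get("Sec-CH-Flock")
    if flock is not None:
        # Only this server sees how the anonymous cohort value is
        # combined with its other data; that combination is what the
        # receiving party has to assess legally.
        response["X-Selected-Ad"] = select_ad_for_cohort(flock)
    return response

# First request: no hint yet; the server only opts in.
print(handle_request({}))
# A later request: the browser attached the hint, so the ad is targeted.
print(handle_request({"Sec-CH-Flock": "a1b2"}))
```

The design point is that the browser never sends the hint unsolicited: the server's Accept-CH opt-in is the explicit act of requesting the data.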

Regarding UI components, certainly the browser will need to include a control and information about FLoC; each browser that implements it will need to make their own decision on the details. Similarly, any consent management system will need to make some new decisions, about UI and what questions to even ask.

darobin commented 4 years ago

Thanks for starting this discussion @asr-enid. One thing I'm not certain of is whether we disagree on part of your framing or if we are looking at different aspects.

You indicate that publishers would be the responsible party. I believe that is only true in the case in which the publisher is actually processing FLoC data — so presumably they've asked for it and the browser has accepted it. But that part does not seem particularly different from the publisher requesting specific personal data and processing it for advertising purposes. No?

However, while the browser is using publisher content and user behaviour in order to establish cohorts, the publisher cannot be the responsible party. Clearly, for that processing the browser would have to be the data controller. Now, given how surprising the processing is (as well as novel, and potentially risky), I don't believe that any legal basis other than consent could apply here. And this consent can't be bundled into any previously obtained consent (even assuming it to be valid), so the browser will have to have shown some sort of dialog to consent users into FLoC processing, at least in Europe. Is this what you had in mind when you mentioned the TCF? If so, I don't think that the TCF needs to be involved for this part. It would be unlawful for the browser to even profile users into cohorts without specific informed consent, even if it doesn't share the information (and there aren't many browsers, so accountability is easier), so if you're receiving FLoC data at all then the user has to have consented.

It would be useful to reinforce that by clearly documenting in the standard that browsers are the data controller for FLoC data and therefore assume responsibility for the lawfulness of their processing. (This can be written in a legislation-agnostic manner.) This would avoid the complex mechanics of having to assert that downstream.

For the latter part, I believe that the publisher may only be (partly) responsible if there is a way for publishers to prevent their content from being used in FLoC at all. That would be a good addition (on other grounds), in which case it will be useful to see to what extent the publisher has a responsibility to the user for this.

achimschloss commented 4 years ago

Thanks for following up here; I actually hadn't dug deeper prior to starting this. To your points:

You indicate that publishers would be the responsible party. I believe that is only true in the case in which the publisher is actually processing FLoC data — so presumably they've asked for it and the browser has accepted it

Agreed in general. The publisher does not even have access to the full dataset (the browsing history), but in terms of the user's perception, a personalized ad would be shown on the publisher's site, and the publisher would at least facilitate this and allow a FloC-based ad to be shown. This might lead to a discussion around joint controllership, but that would require knowing what the exact end-to-end setup looks like.

However, while the browser is using publisher content and user behaviour in order to establish cohorts the publisher cannot be the responsible party. Clearly for that processing the browser would have to be the data controller. Now given how surprising the processing is (as well as novel, and potentially risky), I don't believe that any legal basis other than consent could apply here.

Agreed. As I noted, this is a function that goes way beyond what a user would expect a browser to do, so I don't see a point in arguing that it is part of the general service agreement, or even simply a tool that assists the user (like a password safe). Given it is processing for personalization, which is quite extensive, consent seems to be the applicable legal basis.

Is this what you had in mind when you mentioned the TCF? Because, if so, I don't think that the TCF needs to be involved for this part, no? It would be unlawful for the browser to even profile users into cohorts without specific informed consent, even if they don't share the information (and there aren't many browsers, so accountability is easier) so if the you're receiving FLoC data at all then the user has to have consented.

Depends I guess

It would be useful to reinforce that by clearly documenting in the standard that browsers are the data controller for FLoC data and therefore assume responsibility for the lawfulness of their processing

Absolutely, that was my main intention here, and also to get more clarity on the relation to other Google services.

darobin commented 4 years ago

I think we're aligned on the broad lines, particularly on the necessity for the draft to include a discussion about who is responsible for which processing. Very quick notes:

achimschloss commented 4 years ago

TCF technical documents tend to be underspecified compared with the level of precision expected in Web standards, I'm not sure how to specify accessing __tcfapi in a way that would be reliable at the tech level

I guess the documentation could be more explicit in many regards; that said, the TCF is just a generic framework in that sense and would not directly describe the use for an explicit use case like FloC. When we talk about an explicit interpretation and use by a potential controller, that would in any case lead to in-depth guidance on how the controller expects the CMP to be set up. See, for example, the guidance for Google Advertising and Google Analytics:

Publisher Guidance for Google Ad Products
Publisher Guidance for Google Analytics

The open question, before even looking into this, is in any case what Google's position would be w.r.t. a potential Chrome implementation of FloC (not a tech problem).

Given that the data can be provided to anyone, I don't see how this could be considered part of "Google Advertising"

Agreed, it is tricky, given we have a bit of an unusual situation where

  1. a cohort is calculated from personal data, with the above considerations of how this would be framed from a privacy perspective. The calculation of the cohort is most probably a separate concern from its actual use.
  2. that cohort information is leveraged for a variety of other processing purposes; in that sense it would need to be disclosed properly while establishing transparency or consent for that processing (depending on what we are talking about). Just sending it everywhere without that does not seem appropriate.