WICG / first-party-sets

https://wicg.github.io/first-party-sets/

Preference sharing sets #111

Open dmarti opened 2 years ago

dmarti commented 2 years ago

Another possible set could be a preference sharing set, similar to an affiliated set, but restricted to sharing a small amount of data.

First-party sets seem to be appropriate when the user is more likely to complain about domains failing to share information than to complain about domains sharing that information.

One area where FPS could help sites act in a way that meets user expectations would be in making preferences act as expected across sites. If a user visits comicBookPublisher.example and sets some option such as "do not sell my personal information" they would be likely to expect the setting to also be in effect on the newMovieBasedOnAComicBookCharacter.example domain, because the two sites have the same well-known characters and other trademarks, and are seen as the same "party."

Some sites also allow the user to decline substance or gambling ads. A user would be likely to expect these preferences to take effect across a set of domains that they perceive as a single "party."

A preference sharing set would be allowed to share a small number of preference bits (4-5 bits?) across domains that are clearly understood as the same party by users. The amount of shared data would have to be small enough not to work as a unique identifier. Set operators would probably not be able to share all possible preferences, but could meet the expectations of most of their users who choose the most common ones.
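As a rough illustration of how small that budget is, here is a minimal sketch that packs five preference flags into a single 5-bit value (32 possible states). The flag names, `pack`, and `unpack` are hypothetical and chosen only for this example; the proposal does not define which preferences would be included.

```python
from enum import IntFlag

MAX_BITS = 5  # assumed budget, taken from the "4-5 bits?" suggestion above


class Pref(IntFlag):
    # Hypothetical preference bits; none of these are defined by the proposal.
    DO_NOT_SELL = 1 << 0
    NO_GAMBLING_ADS = 1 << 1
    NO_SUBSTANCE_ADS = 1 << 2
    NO_PERSONALIZED_ADS = 1 << 3
    REDUCED_MOTION = 1 << 4


def pack(prefs: Pref) -> int:
    """Return the shared value, refusing anything above the bit budget."""
    value = int(prefs)
    if value >> MAX_BITS:
        raise ValueError("preference value exceeds the 5-bit budget")
    return value


def unpack(value: int) -> Pref:
    """Turn a shared value back into a set of preference flags."""
    if not 0 <= value < (1 << MAX_BITS):
        raise ValueError("shared value must fit in 5 bits")
    return Pref(value)


shared = pack(Pref.DO_NOT_SELL | Pref.NO_GAMBLING_ADS)
```

With only 32 possible values, a single snapshot of the shared value cannot distinguish more than 32 groups of users, which is the sense in which it is too small to be a unique identifier on its own.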

(I don't see this as something that would be widely adopted by large numbers of co-owned domains on their own initiative -- more as a solution that site maintainers could apply in case users ask why their preferences didn't take effect where expected.)

cc @johannhof

michael-oneill commented 2 years ago

This would be a very useful facility to improve the user experience of consent acquisition, which would require a very small number of bits. Limiting the entropy to a few bits, and maybe also the maximum expiry time, for similar use cases would be a good option for sets that are validated automatically, without necessarily requiring public submission or an institution-managed enforcement method.

johannhof commented 2 years ago

Thanks for filing this, Don! I think we should discuss the utility and privacy properties of this more but I personally think it's an interesting idea where the ability to limit this to a known group of sites through FPS (and thus ensuring the low entropy limit) is really useful.

Because of the controlled entropy this would probably have to be exposed through a new dedicated web API of sorts, which complicates the whole idea a little bit in terms of execution :)

krgovind commented 2 years ago

Interesting idea! If I remember correctly, @eligrey has asked about such a capability in the past during PrivacyCG discussions.

dmarti commented 1 year ago

Would it have to be a separate API, or could we say that the cookie value must be a single character from the set used in base32 encoding? https://www.rfc-editor.org/rfc/rfc4648#page-10
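A minimal sketch of what that restriction could look like, assuming the browser (or a site-side check) only accepts a cookie value that is exactly one character from the RFC 4648 base32 alphabet. The function names are hypothetical; no such API or cookie rule exists today.

```python
# RFC 4648 base32 alphabet: 32 symbols, so one character carries exactly 5 bits.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def encode_pref_cookie(value: int) -> str:
    """Map a 5-bit preference value (0..31) to a single base32 character."""
    if not 0 <= value < len(BASE32_ALPHABET):
        raise ValueError("value must fit in 5 bits")
    return BASE32_ALPHABET[value]


def decode_pref_cookie(cookie_value: str) -> int:
    """Reject anything that is not exactly one base32 character, then decode it."""
    if len(cookie_value) != 1 or cookie_value not in BASE32_ALPHABET:
        raise ValueError("cookie value must be a single base32 character")
    return BASE32_ALPHABET.index(cookie_value)
```

Under this scheme a value such as `"D"` decodes to 3, while `"AB"`, `"a"`, or `"1"` would be rejected, since they are either too long or outside the alphabet.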

eligrey commented 1 year ago

I do not believe there is any actual privacy benefit in sharing anything other than the Do Not Track ('opt me out of all tracking') & Global Privacy Control ('opt me out of the sale/sharing of my info') privacy signals to convey user tracking preferences.

I do think that there is other utility here though for first-party applications that simply want First-Party-Set-partitioned localStorage / indexedDB for frameworks and libraries to share state across FPS members entirely on-device without the need for network cookies.

dmarti commented 1 year ago

Global Privacy Control (GPC) is already applied across all sites, whether or not they share set membership. The problem is what happens when a user takes the time to make a preference change, such as

MatiasLFranco commented 1 year ago

+1 to Chrome creating a new subset that allows a small number of bytes to be shared, with no limit on the number of domains, within a First-Party Set. In Agea, for example, this could be used to share the user’s preferences across domains.

cfredric commented 1 year ago

Hi Don (et al.),

I've been thinking about this, and I think there are some challenges that make it difficult/impossible to do this in a privacy-preserving way. Before I go into that, my assumptions are the following:

If those assumptions are accurate, then the problem is the immediate propagation of the data to other set members. Even if only one bit is exposed, a site could easily change that bit over time and use it to convey a user identifier, given that there's no rate-limiting or random noise in that signal. So even one bit is enough to re-enable cross-site user tracking within the set - and therefore this new subset would provide a loophole to work around the numeric limit of the associated subset, and wouldn't actually be limited in any meaningful way.
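To make the concern concrete, here is a small simulation of that covert channel, assuming the shared bit propagates immediately and can be rewritten without rate limiting or noise. The names `send_id_one_bit_at_a_time` and `receive_id` are hypothetical, and the "channel" is just a Python generator standing in for the shared preference bit.

```python
from typing import Iterable, Iterator


def send_id_one_bit_at_a_time(user_id: int, width: int) -> Iterator[int]:
    """Site A: write the shared bit `width` times, most significant bit first."""
    for position in reversed(range(width)):
        yield (user_id >> position) & 1


def receive_id(observed_bits: Iterable[int]) -> int:
    """Site B: read the shared bit after each update and rebuild the identifier."""
    value = 0
    for bit in observed_bits:
        value = (value << 1) | bit
    return value


# A 32-bit identifier crosses the "one bit" channel in 32 updates.
secret_id = 0xDEADBEEF
recovered = receive_id(send_id_one_bit_at_a_time(secret_id, width=32))
```

The width of the shared value only changes how many updates are needed, not whether an identifier can be transferred, which is why the bit limit alone does not bound what is shared.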

There are some other Privacy Sandbox projects that might also be applicable here, depending on the exact use case. For example, Shared Storage could be used to address the "do not show me gambling/nsfw ads" use case. But those have strict limitations as well (precisely to address privacy concerns), which probably make them unsuited as a general-purpose solution here.

Given these privacy concerns, I think we will hold off working on this until/unless we can find a way to address them.

dmarti commented 1 year ago

@cfredric It really comes back to what First-Party Sets are really for. If there's a reason to have them at all, it's to make sure that sites that the user sees as the same "party" are able to act in a way that meets the user's expectations, including privacy expectations. "Cross-site user tracking" within a First-Party Set is probably a privacy win -- or the domains wouldn't be a valid First-Party Set.

You can try a little experiment.

Both whitecastle.com and whitecastlevegas.com are valid domains for the "party" White Castle (even though they are separate businesses at different domains). Without preferences sharing, the user ends up with more privacy risk from not having their privacy settings carry over from one domain in the "party" as expected.

(This is separate from the issue of numeric limits on associated domains. There are a lot of scams that could be pulled with small sets, especially when home page and landing pages are branded differently -- https://github.com/WICG/first-party-sets/issues/93#issuecomment-1202941930 -- so it would be a big problem to treat small sets as safe just because they're small.)

cfredric commented 1 year ago

Thanks for that example. I understand the desire to sync a user's privacy settings (or other settings) across similarly-branded sites.

However, what I'm trying to illustrate is that the requirements of that use case make it indistinguishable from allowing unlimited bits to be shared between the sites. And therefore it cannot be supported without creating a bypass for the limits/restrictions imposed by the associated subset and service subset.

dmarti commented 1 year ago

If a site is changing the identifier on its own initiative, for tracking (and not as the result of a preference change by a user) that seems like a good reason to revoke the set, and disallow that site from joining other sets.

nouchy commented 6 months ago

Hello everyone, I would like to revive this topic and submit a unique use case. As a consent management platform, we have invested heavily in the problem of consent fatigue, in other words, how to limit how often choice windows are displayed. This is easy to manage for a single site, but it becomes a problem for sites belonging to the same entity, and even more so for sites that have no connection to each other but are united in a cooperative. Today we operate a cooperative of around 5,000 sites that share choice signals, both consent and refusal to consent.

The system is currently operated via the IAB Europe TCF (and will migrate to the Global Privacy Platform). That is, the choice is currently encoded in the TC String, which is stored in a third-party euconsent-v2 cookie on the consentframework.com domain. The site https://www.consentframework.com/ allows information and control to be centralized, and consent can also be given or withdrawn on any participating site.

Since third-party cookies will disappear, this cooperative is threatened. What is planned for the moment is a site=>cooperative=>site redirection each time the user makes a choice, but this degrades the user experience. Our surveys show that users prefer this redirection to seeing the window displayed on every site, so we will implement it, but it seems to us that this use case should be taken into account. I would like to point out that there is no tracking or tracker (id) associated with this service, whose sole purpose is to share the user's choice.

johannhof commented 6 months ago

Hi @nouchy, how are you hoping that RWS can help with this? It doesn't seem like there's any way those 5000 sites could be in a joint RWS, even if they were somehow related, given the limit on "associated" domains.

nouchy commented 5 months ago

Hey @johannhof, thank you very much for your help! If RWS can't help us with this, what Sandbox API do you think would allow us to do this? There must be a solution that preserves this legitimate use case and makes it possible to limit consent fatigue.

dmarti commented 5 months ago

The kind of repeated dialogs that cause "consent fatigue" face the risk of not being able to get actual consent. Consent must be freely given, specific, informed and unambiguous -- not just, we got the user to develop the habit of clicking the green button to make the dialog go away without reading it.

W3C Privacy Principles:

An actor should not prompt a person for consent if the person is unlikely to have sufficient information to make an informed decision to consent or not. In considering whether or not a person is sufficiently informed to be asked for consent, actors should be realistic in assessing how much time and effort would be required to understand the processing for which they are asking for consent.

Making people click the same button over and over on different sites before even reading any of the site content is not really helpful. It's all the fatigue with none of the consent.

It seems like what would help with the use case that @nouchy describes is being able to get real, informed consent once instead of dubious consent once per site. If a site's membership in the cooperative is understandable to the user, and the experience of giving consent for the cooperative is designed to get real consent, then this could be a strong use case for RWS. The preferences sharing set language would have to be adapted to require that the preference is only set in a context where its effects have been clearly explained to the user. (For a simple preference like light mode/dark mode, that could be done with simple images; for something more involved like consent, it would be different.)

dmarti commented 5 months ago

@johannhof A preferences sharing set would be less limited in site count, and more limited in the amount of information shared, than an associated set.

The other "Privacy Sandbox" proposals are already using attestations to limit problematic uses -- a preferences sharing set could use an attestation stating that the site commits to use it only for cross-site sharing of a preference explicitly set by the user.

nouchy commented 5 months ago

Thanks @dmarti! Indeed, @johannhof, in this use case there should be no limit on the number of sites involved, but each site should let users revisit their choices on every page, for example via a clickable button at the bottom of the page. In terms of data limitation, it would be entirely possible to rely solely on the GPP String and/or the TC String of the IAB Tech Lab which, by design, do not allow cross-site tracking.