WICG / first-party-sets

https://wicg.github.io/first-party-sets/

Why depend on a 3rd party that a domain owner has no control over? #180

Open · dopry opened this issue 11 months ago

dopry commented 11 months ago

As a web developer and sysadmin who is considering this functionality, I would like to be able to declare my first-party set from any domain I control and set the caching and expiration rules. I don't see the point in a central repository controlled by a 3rd party and infrequently updated. The web is a distributed platform. This kind of centralization seems like a huge step in the wrong direction compared to more traditional approaches such as headers issued from a web server and verified by an SSL certificate.

@dopry Thanks for the feedback. Indeed, using a central repository was not our first attempt at designing this. See "Signed Assertion and Set Discovery instead of static lists" and "Using EV Certificate information for dynamic verification of sets" for details on why we settled on the current static list approach. There is also an idea for an auto-compiled list proposed in #128, which we are keeping in consideration for future development, but I'm not sure whether that would address your concerns.

Note: What we explored was having the set primary and members serve a manifest file that either (a) lists the sites within the set/group, in the case of the primary domain; or (b) simply points to the primary, in the case of a member domain. This is more ergonomic and efficient than headers, since:

- We expect set membership to be relatively static, and not changing within the context of the HTTP response. It's important that sets not be personalized per user, and be easily discoverable/observable for accountability, since privacy outcomes are tied to set membership.
- The size of a set can be large, so it doesn't seem prudent to specify it in a response header. Even though we enforce numeric limits on the "associated subset", there is currently no numeric limit on the ccTLD and service subsets. Additionally, any limits can only be enforced at the time the browser consumes the list, which doesn't prevent a site from sending a very large list in an HTTP header.

If you have any additional feedback on this topic, I would recommend opening a new issue, so it's not lost in this very narrowly focused PR (which is likely to get closed at some point).

Originally posted by @krgovind in https://github.com/WICG/first-party-sets/issues/122#issuecomment-1672034848
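For illustration only, here is a minimal sketch of the manifest approach mentioned in the quoted reply, written as TypeScript types; the well-known path, field names, and the consistency check below are assumptions made for this sketch, not the proposal's actual schema.

```typescript
// Sketch of the "primary lists members, members point back at the primary"
// manifest idea. Everything here (path, field names, check) is illustrative.

// Hypothetically served by the set primary,
// e.g. at https://primary.example/.well-known/first-party-set
interface PrimaryManifest {
  primary: string;           // "https://primary.example"
  associatedSites: string[]; // other sites claimed as part of the set
}

// Hypothetically served by each member site at the same well-known path
interface MemberManifest {
  primary: string;           // points back at the set primary
}

// A browser could require both files to agree before treating
// memberOrigin as part of the primary's set.
function setIsConsistent(
  primaryManifest: PrimaryManifest,
  memberManifest: MemberManifest,
  memberOrigin: string
): boolean {
  return (
    memberManifest.primary === primaryManifest.primary &&
    primaryManifest.associatedSites.includes(memberOrigin)
  );
}
```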

dopry commented 11 months ago

> We expect set membership to be relatively static, and not changing within the context of the HTTP response. It's important that sets not be personalized per user, and be easily discoverable/observable for accountability, since privacy outcomes are tied to set membership.

Why must it not be personalized? I've worked in environments where users with different roles should be operating in environments with differing levels of trust and different partners. It sounds like there is an assumption that the domain is the zone of trust, rather than the union of the domain and the user. Why shouldn't a domain operator be able to designate the boundaries of the zone of trust based on their own needs? It sounds like you are prescribing a modality that may fall short of many people's needs and forces a dependency on a third party that they may not actually trust. Again, the direction this project is headed seems to go against the distributed and autonomous architecture of the web.

> The size of a set can be large, so it doesn't seem prudent to specify it in a response header. Even though we enforce numeric limits on the "associated subset", there is currently no numeric limit on the ccTLD and service subsets. Additionally, any limits can only be enforced at the time the browser consumes the list, which doesn't prevent a site from sending a very large list in an HTTP header.

How big are we talking? I have some pretty expansive CSP headers and it doesn't seem to be an issue there. Why would it be such an issue with a first-party set? If a domain has a large list, that sounds like their choice and something under their control from an architecture perspective. There is no reason it needs to be sent with all requests. It could be limited to an OPTIONS response for a given URL.
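To make that size comparison concrete, here is a rough sketch; the header name and JSON value shape are invented for illustration (the proposal defines no such header). It shows that even a ten-member declaration is only a few hundred bytes, in the same ballpark as a long Content-Security-Policy value.

```typescript
// Hypothetical header-based declaration, used only to estimate size.
// Neither the header name nor the value format exists in the proposal.
const members = Array.from({ length: 10 }, (_, i) => `https://member${i}.example`);
const headerValue = JSON.stringify({
  primary: "https://primary.example",
  members,
});

console.log(`First-Party-Set: ${headerValue}`);
console.log(`approximate size: ${headerValue.length} bytes`); // a few hundred bytes for 10 members
```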

Ultimately, you still haven't addressed the key question. When the web, aside from its practical if not actual dependency on DNS, is designed around a distributed, autonomous architecture, why are you even considering a centralized approach as an acceptable design? It's counter to the architecture of the web and gives the owners of the aforementioned GitHub repository unconstrained authority over a massive number of 3rd parties. How do you expect that organization to manage something effectively on the scale of DNS without a similarly distributed architecture and strict, well-vetted controls over its authority, as IANA has?

There are around 350 million registered domains. How do you expect a GitHub repository staffed by a handful of people to handle that volume?

brandon-dev-aleriola commented 7 months ago

What about intranets? This seems like a power grab. This doesn't solve the problem with subdomains. Solving the subdomain issue could be simple with enough willpower: something like using a new tag, or modifying existing <head> tags such as <link> or <meta>. Something like the following:

[image: subdomain solution]

A tag-based implementation, combined with requiring the base tag and either a wildcard SSL certificate or a multi-domain/SAN certificate, might be a solution. The browser could then validate whether the subdomains are part of the same origin.
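As a sketch of that idea (the function names are hypothetical, and real certificate validation is considerably more involved), a browser-side check might look roughly like this:

```typescript
// Hypothetical check combining a tag-based declaration with the TLS
// certificate's coverage: treat a subdomain as the same party only if the
// page declared it AND the certificate (wildcard or SAN list) covers it.

// Does a single certificate name (possibly a wildcard) cover this host?
function coveredByCertificate(host: string, certNames: string[]): boolean {
  return certNames.some((name) =>
    name.startsWith("*.")
      ? host.endsWith(name.slice(1)) && // matches the wildcard suffix...
        !host.slice(0, host.length - name.length + 1).includes(".") // ...with exactly one extra label
      : host === name
  );
}

// Combine the declared list (e.g. from a <link>/<meta> tag) with the cert check.
function sameParty(declaredHosts: string[], host: string, certNames: string[]): boolean {
  return declaredHosts.includes(host) && coveredByCertificate(host, certNames);
}

// Example: a wildcard cert for *.example.com covers app.example.com
// but not deep.sub.example.com.
console.log(sameParty(["app.example.com"], "app.example.com", ["*.example.com"])); // true
```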

This whole JSON-on-GitHub nonsense is a smell. Who is the final approver of my first-party set? Why isn't it me, the author of the document? What limit is placed on the set authority? What if a domain is a competing ad platform; will the domain just be automatically denied, or do they need to pay an ad mafia fee? The whole "explainer" seems like a BS excuse to deny whomever, whenever, for whatever reason.

dmarti commented 7 months ago

The other side of this issue is that this proposal now relies on people using their public GitHub profile to review and comment on proposed sets. (The original version had an Independent Enforcement Entity, which would have been an organization with the skills and resources to review sets.)

Substantial anti-abuse and anti-harassment protections would need to be in place in order to make a GitHub user feel safe in participating in review of questionable sets: https://github.com/WICG/first-party-sets/issues/105