WICG / interventions

A place for browsers and web developers to collaborate on user agent interventions.

Discourage excessive origin counts #50

Closed mnot closed 2 years ago

mnot commented 7 years ago

It's become common practice for sites to use a large number of origins to load a page.

The HTTP Archive, for example, shows that the top 1000 sites contact about 33 domains (not origins) for a page load on average.

In practice, this can get much higher (remember, those are averages).
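The domain/origin distinction matters here: an origin is the (scheme, host, port) triple, so a single domain can serve multiple origins. A minimal sketch of counting both for a page's fetched resources, using the standard `URL` API (the URLs are hypothetical):

```javascript
// Count distinct origins vs. distinct domains for a set of fetched URLs.
// An origin is the (scheme, host, port) triple; one domain can host many.
function countContacts(urls) {
  const origins = new Set();
  const domains = new Set();
  for (const u of urls) {
    const { origin, hostname } = new URL(u);
    origins.add(origin);
    domains.add(hostname);
  }
  return { origins: origins.size, domains: domains.size };
}

countContacts([
  "https://example.com/index.html",
  "https://example.com:8443/api/data", // same domain, different port
  "http://example.com/legacy.gif",     // same domain, different scheme
  "https://cdn.example.net/lib.js",
]);
// → { origins: 4, domains: 2 }
```

So an average of 33 domains implies at least 33 origins, and possibly more.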

This has significantly detrimental effects on end users:

Recent developments that improve Web performance (e.g., async, HTTP/2 connection coalescing, 1-RTT and 0-RTT handshakes for TLS, TCP Fast Open and QUIC) have the unfortunate side effect of reducing the cost -- in end user perceived performance -- of adding new origins to a page load.

An intervention is needed to ensure that this doesn't encourage yet more origins to be included in page loads unnecessarily.

The focus here is on ensuring that adding an unreasonable number of origins to a site carries some friction, to encourage sites not to add new origins without good reason.

Note that alternative solutions that require sites to constrain the ability of third-party sites to set / retrieve cookies, etc., won't address the concerns here; sites have little incentive to constrain third parties, since they often are trading the ability to track users for some function (e.g., site analytics).

What's important here is that browsers maintain their role as an agent of the user first, by avoiding the creation of a "race to the bottom".

Proposed Interventions

There are a number of ways an intervention could help here; I'm very open to other approaches.

The suggestions below are somewhat mix-and-match; note that N might be different for each one. N should be very high to start with (e.g., 100), and periodically reconsidered for lower values.

After N origins are contacted in the page load:

  1. Log in the console, warning the developer that future mitigations may be triggered.

  2. Disable async processing for all subsequent origins.

  3. Disable cookie processing, custom request headers and non-GET methods for all subsequent origins.

  4. Refuse to contact new origins.
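As a rough sketch of how the tiered mitigations above might be expressed, one per-page-load counter could map the running origin count to the mitigations in effect. All names and threshold values here are illustrative assumptions, not any browser's actual behavior:

```javascript
// Hypothetical thresholds, one per mitigation (the proposal notes that
// N may differ for each); the numbers are illustrative only.
const THRESHOLDS = {
  "console-warning": 100,
  "no-async": 120,
  "strip-cookies-custom-headers-non-get": 150,
  "refuse-connection": 200,
};

// Track unique origins contacted during one page load and report which
// mitigations apply to each subsequent request.
function createOriginCounter() {
  const seen = new Set();
  return function mitigationsFor(url) {
    seen.add(new URL(url).origin); // origin = scheme + host + port
    return Object.entries(THRESHOLDS)
      .filter(([, n]) => seen.size > n)
      .map(([name]) => name);
  };
}
```

Under this sketch, the 101st origin would trigger only the console warning, giving developers time to react before the stricter tiers engage as thresholds are lowered over time.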

Ajedi32 commented 7 years ago

I'm not sure I really understand exactly what the motivation for this proposed change is.

Each origin can (and often does) store cookies for the user (and use other stateful mechanisms) enabling it to track the user's activity across any site that references it.

I don't really get this one. Don't most browsers include a built-in setting for blocking third party cookies? If tracking is the main concern here, wouldn't it be easier to just enable that setting rather than break a whole bunch of sites with an intervention like this?

As more origins are added, it's more likely to cause performance regressions due to congestion near the end user.

Wait, what? I'm calling citation needed on this one. While there may indeed be performance issues caused by sites loading large amounts of unnecessary content from third parties, that issue is completely separate from the number of origins being contacted. You'd have exactly the same problems if the same content was loaded directly from the website host's domain. (Except worse, since you'd no longer have the performance benefits of loading assets from third party CDNs.)

Additionally, prior to the release of HTTP/2, spreading content out across multiple domains was actually a recommended strategy for improving performance known as domain sharding. I imagine there are quite a few sites which still use this technique, since HTTP/2 adoption is still in its early stages, and it's not something we want to discourage.

The bottom line is that the number of origins a page contacts is an implementation detail which has no impact on user experience, and thus is not something which I believe requires an intervention.

mnot commented 7 years ago

Don't most browsers include a built-in setting for blocking third party cookies? If tracking is the main concern here, wouldn't it be easier to just enable that setting rather than break a whole bunch of sites with an intervention like this?

That doesn't address the overall damage to the ecosystem.

If the Web platform requires users to take positive steps to guard their privacy (which most don't know how to do), while other platforms have relatively higher protection "out of the box", that puts the Web at a long-term comparative disadvantage.

While there may indeed be performance issues caused by sites loading large amounts of unnecessary content from third parties, that issue is completely separate from the number of origins being contacted.

TCP congestion control works on a per-connection basis. More connections means more chance of a congestion event, because while their congestion controllers are uncoordinated (and indeed can't be coordinated, because their server endpoints are diverse), the responses all slam the bottleneck at about the same time.
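A back-of-the-envelope illustration of the point above, assuming a typical initial congestion window of 10 segments of 1460 bytes (these figures are illustrative; real values vary by OS and path MTU):

```javascript
// Rough worst-case initial burst that N uncoordinated TCP connections
// can inject at about the same time, vs. one multiplexed connection.
// Assumes initcwnd = 10 segments of 1460 bytes each (illustrative).
const INITCWND = 10;
const MSS = 1460;

function initialBurstBytes(connections) {
  return connections * INITCWND * MSS;
}

initialBurstBytes(1);  // 14600 bytes from one HTTP/2 connection
initialBurstBytes(33); // 481800 bytes if 33 origins each open a connection
```

A single multiplexed connection paces those same bytes through one congestion controller; 33 uncoordinated senders can overflow a shallow bottleneck queue before any of them sees a loss signal.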

spreading content out across multiple domains was actually a recommended strategy for improving performance known as domain sharding.

Domain sharding has been thoroughly debunked in the Web performance community; while there is limited value for some clients when you shard to about 2-4 hosts, because of HTTP (and TCP) HoL blocking, anything more than that is counter-productive.

The bottom line is that the number of origins a page contacts is an implementation detail which has no impact on user experience

I'd like to discuss it more before you dismiss it out of hand, please.

Ajedi32 commented 7 years ago

If the Web platform requires users to take positive steps to guard their privacy

If your concern is that third party cookies are not blocked by default in most browsers, wouldn't it be better to address that problem directly? Rather than "discourage excessive origin counts", why not propose "block third party cookies by default"?

TCP congestion control works on a per-connection basis [...] Domain sharding has been thoroughly debunked in the Web performance community

Interesting, I'd honestly never heard that before. I did find this article which explains things very well: https://insouciant.org/tech/network-congestion-and-web-browsing/

As the article points out toward the end, though, there seem to be much better ways for browser vendors to manage TCP congestion than trying to outright limit the total number of origins that a page can request resources from.

I'd like to discuss it more before you dismiss it out of hand, please.

Yeah, sorry for jumping to conclusions.

I still think this proposal is a bit of an XY problem though. It seems to me there are almost certainly better, more direct solutions to the problems you've described (user privacy, TCP congestion) than the solution you're proposing. Though I'm certainly open to arguments to the contrary.

johannhof commented 2 years ago

(As noted in https://github.com/WICG/interventions/pull/72, we intend to archive this repository and are thus triaging and resolving all open issues)

This proposal doesn't seem to have gone anywhere, and at least the privacy issues mentioned are being addressed elsewhere in more specific ways than broadly restricting websites' origin counts, which would likely result in extreme breakage.