WICG / ua-client-hints

Wouldn't it be nice if `User-Agent` was a (set of) client hints?
https://wicg.github.io/ua-client-hints/

UA brand, engine, and version checks should expire after some reasonable amount of time #54

Open erik-anderson opened 4 years ago

erik-anderson commented 4 years ago

I'll caveat this issue with an explicit acknowledgement that this idea is a bit out there.

Over the years, IE and Edge have significantly evolved their UA strings to defeat existing "this site is unsupported in your browser" UA string checks. A common theme is that we successfully change the string to map to the closest browser engine we feel we are compatible with, then add a brand-specific token for the usual use cases like market-share calculations, "you logged in with Edge" notifications, etc. This has often worked really well, but as sites update, they add new string comparisons that flag us as "not the modern browser I tested for," sometimes regardless of any actual testing in ours.

If we can agree that it's unreasonable for a site to add a UA-specific check that lives for a multi-year period because they can't possibly anticipate what interop changes will come down the road that might enable those browsers to work well with the site, I'd like to explore a mechanism to ensure brand/engine checks are only valid for a limited period, with non-trivial effort being required on the part of the site developer to keep them active.

Instead of sending strings like "Chrome" or "Firefox", what if we instead had an expiring, global UA token specific to each browser? Perhaps we would want a central clearinghouse to keep track of these tokens and avoid collisions, or maybe each browser could agree to self-publish its tokens and honor the time-limited nature.

Instead of, for example, sending a `Sec-CH-UA: "Chrome"; v="73"` header, we might send `Sec-CH-UA-Key: [non-human-readable lookup key], [GREASE value and/or other browser's value you want to be treated as equivalent to]`.

Chrome would likely send its key plus one or more GREASE values. Edge would send Chrome's key, plus our own, plus zero or more GREASE values.
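As a minimal sketch of the receiving side, assuming (purely for illustration) that the hypothetical `Sec-CH-UA-Key` header carries a comma-separated list of quoted opaque tokens:

```python
def parse_ua_keys(header_value: str) -> set[str]:
    """Split a hypothetical Sec-CH-UA-Key header into its opaque tokens.

    The header name and wire format here are assumptions drawn from this
    proposal, not part of any shipped spec.
    """
    return {tok.strip().strip('"') for tok in header_value.split(",") if tok.strip()}

# Edge might send Chrome's key, its own key, and a GREASE value
# (keys are the example values used later in this proposal):
print(parse_ua_keys('"4kf34jgd", "c9dfk10a", "not!A(real)key"'))
```

The server cannot tell which tokens are real and which are GREASE without consulting the published key list.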

The lookup keys would be documented as applying to browser builds produced in some specific range of dates (e.g. January 15th through April 15th) or versions (e.g. version 81). Keys to be used in the future would not be assigned or published too far ahead of time (e.g. no more than 6 months).

Sites that want to continue to have a UA check would need to ensure they are querying and saving off the currently active list of tokens for the browser(s) they care about. If they do not refresh their known lookup keys, their site will eventually start seeing tokens, even from Chrome, that it doesn't understand.
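The refresh requirement could be sketched roughly like this (the fetch date and the ~90-day window are illustrative assumptions based on the "January 15th through April 15th" example above):

```python
from datetime import date, timedelta

# A site caches the published key list and records when it fetched it.
FETCHED_ON = date(2020, 1, 15)     # hypothetical date the key list was fetched
KEY_VALIDITY = timedelta(days=90)  # roughly one publication window

def cache_is_stale(today: date) -> bool:
    """True once the cached keys predate the current validity window."""
    return today - FETCHED_ON > KEY_VALIDITY

# Once the window lapses, unknown tokens start appearing even from Chrome:
print(cache_is_stale(date(2020, 6, 1)))  # True: time to re-fetch the key list
```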

For example, a central list of keys might publish a table like:

| Browser | Version | Key (randomly generated and then assigned) |
| ------- | ------- | ------------------------------------------ |
| Chrome  | 82      | `4kf34jgd` |
| Chrome  | 83      | `Ioxnm46s` |
| Chrome  | 84      | `vbrj3dfj` |
| Edge    | 82      | `c9dfk10a` |
| Edge    | 83      | `akls3kd2` |
| Edge    | 84      | `fdgjdf59` |
| Safari  | 13      | `5jkfgp3o` |
| Safari  | 14      | `df902sf1` |
| Safari  | 15      | `Iwzsdj32` |
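A site-side check against such a table might look like the following sketch (the keys are the illustrative values from the table, not real assignments):

```python
# Map of published lookup keys to (browser, version), mirroring the table above.
KNOWN_KEYS = {
    "4kf34jgd": ("Chrome", 82), "Ioxnm46s": ("Chrome", 83), "vbrj3dfj": ("Chrome", 84),
    "c9dfk10a": ("Edge", 82),   "akls3kd2": ("Edge", 83),   "fdgjdf59": ("Edge", 84),
    "5jkfgp3o": ("Safari", 13), "df902sf1": ("Safari", 14), "Iwzsdj32": ("Safari", 15),
}

def identify(tokens: list[str]) -> list[tuple[str, int]]:
    """Resolve a request's tokens; unknown ones (GREASE values, unpublished
    browsers, or builds newer than the cached table) are silently ignored."""
    return [KNOWN_KEYS[t] for t in tokens if t in KNOWN_KEYS]

# An Edge 82 request carrying Chrome's key, its own key, and a GREASE value:
print(identify(["4kf34jgd", "c9dfk10a", "zz!grease"]))
```

Note that a smaller browser's real token and a GREASE value look identical to this code unless the site's table includes that browser.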

If a JS library decides to add a hardcoded UA check, it can do that by matching against one or more of the currently known key values. If the site doesn't update the library on a frequent cadence to pick up new hardcoded values, the UA check will eventually hit an unknown state.
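Such a hardcoded check could be nothing more than a frozen set, which goes stale as new keys are published (a sketch; the supported keys are the example Chrome values from the table above, and the "newer build" key is made up):

```python
# Frozen at library-release time: only the Chrome keys known back then.
HARDCODED_SUPPORTED = frozenset({"4kf34jgd", "Ioxnm46s", "vbrj3dfj"})

def is_supported(tokens: list[str]) -> bool:
    """The brittle pattern: 'supported' only if a token the library
    shipped with appears in the request."""
    return any(t in HARDCODED_SUPPORTED for t in tokens)

# A hypothetical, later-published Chrome key no longer matches,
# so even Chrome users hit the "I don't know this UA" state:
print(is_supported(["wq7xk91p"]))  # False
```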

Entering an "I don't know this UA" state may or may not break the site, but if it does, it would break in all browsers and hopefully send a strong signal that the site needs to do feature detection, or at least keep its code up-to-date as browsers evolve. Contrast that with today, where sites tend to focus only on the biggest player.

The hope is that this would increase the ongoing cost for sites to keep a browser UA check working and restrict it to truly motivated sites. I would expect major sites to automate getting new values and wrap it in a library in such a way that it wouldn't operate significantly differently than today, but other libraries and smaller sites may decide it's not worth the trouble to have a UA check that lives forever. It would also reduce the ability for sites to do something like match against only Chrome and reject other UAs—if they're not aware of the existence of some smaller, new browser, then that browser's token would be indistinguishable from a GREASE value sent by a larger browser.

othermaciej commented 4 years ago

This proposal would also make it a fair bit more complicated to pretend to be a different browser entirely. Which will probably still be necessary, as high-resource sites are likely to jump through these token hoops and would likely have an allow list of exact known tokens.

jyasskin commented 4 years ago

Nice idea, but unfortunately, I think folks would write code to periodically fetch the central clearinghouse, parse it into a map from token to browser+version, and then keep using the bad UA filtering code they have today.

erik-anderson commented 4 years ago

> This proposal would also make it a fair bit more complicated to pretend to be a different browser entirely. Which will probably still be necessary, as high-resource sites are likely to jump through these token hoops and would likely have an allow list of exact known tokens.

To make sure I understand the concern: is it that it would be harder because you, the browser wanting to spoof another browser, would need to know the current set of tokens for the browser you want to spoof?

> Nice idea, but unfortunately, I think folks would write code to periodically fetch the central clearinghouse, parse it into a map from token to browser+version, and then keep using the bad UA filtering code they have today.

Yes, a non-trivial set of sites will absolutely do that. I'm not sure that reduces the value to zero, though, since:

1. Hardcoded checks that don't move to a UA-key decoder service would have a limited shelf life. I'll admit I don't know how much this will actually move the needle.
2. Larger sites in particular sometimes present a catch-22 to the browser vendor: "we won't allow users to access our site in your browser until you have sufficient market share based on stats we collect," and "if you lie about what UA you are, you will always appear to have zero market share." A browser developer could choose to send a different, site-specific UA key token for just that site and then ask the site to review their logs/stats to prove that the browser actually has enough market share to care about.
3. The hypothetical central clearinghouse could offer an option for browser vendors to choose not to publish to the central database. A browser like Vivaldi could opt out of publishing, making it more difficult for even a major site to keep up with its tokens, while still allowing trusted partner sites known not to target the browser in negative ways to get the tokens via some more direct data source provided by the browser vendor.

amtunlimited commented 4 years ago

> Nice idea, but unfortunately, I think folks would write code to periodically fetch the central clearinghouse, parse it into a map from token to browser+version, and then keep using the bad UA filtering code they have today.

I think what would happen is similar to the "device detection" services available today. Someone would have the idea to maintain a database of these values and offer up an API to translate key to browser+version, and suddenly there's a third party collecting a bunch of fingerprinting info tied to a bunch of first parties, bypassing any third-party blocks on the data that are in place.

yoavweiss commented 4 years ago

Thanks for the proposal! It's definitely an interesting thought experiment.

At the same time, I think it would make a few use cases harder: market-share analytics and login notifications would require constant updates and maintenance.

And as @othermaciej mentioned, it would make it harder for browsers to emulate another browser's UA data (to avoid compatibility issues).