WICG / ua-client-hints

Wouldn't it be nice if `User-Agent` was a (set of) client hints?
https://wicg.github.io/ua-client-hints/

User-Agent Client Hints #252

Closed designkidtoronto closed 3 years ago

designkidtoronto commented 3 years ago

Draft Community Group Report, 30 June 2021

This version: https://wicg.github.io/ua-client-hints/

Editors: Mike Taylor (Google LLC) Yoav Weiss (Google LLC)

Former Editor: Mike West (Google LLC)


Copyright © 2021 the Contributors to the User-Agent Client Hints Specification, published by the Web Platform Incubator Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.

Abstract

This document defines a set of Client Hints that aim to provide developers with the ability to perform agent-based content negotiation when necessary, while avoiding the historical baggage and passive fingerprinting surface exposed by the venerable User-Agent header.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

Today, user agents generally identify themselves to servers by sending a User-Agent HTTP request header field along with each request (defined in Section 5.5.3 of [RFC7231]). Ideally, this header would give servers the ability to perform content negotiation, sending down exactly those bits that best represent the requested resource in a given user agent, optimizing both bandwidth and user experience. In practice, however, this header’s value exposes far more information about the user’s device than seems appropriate as a default, on the one hand, and intentionally obscures the true user agent in order to bypass misguided server-side heuristics, on the other.

For example, a recent version of Chrome on iOS identifies itself as:

User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/69.0.3497.105 Mobile/15E148 Safari/605.1

While a recent version of Edge identifies itself as:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.2704.79 Safari/537.36 Edge/18.014

There’s quite a bit of information packed into those strings (along with a fair number of lies). Version numbers, platform details, model information, etc. are all broadcast along with every request, and form the basis for fingerprinting schemes of all sorts. Individual vendors have taken stabs at altering their user agent strings, and have run into a few categories of feedback from developers that have stymied historical approaches:

Brand and version information (e.g. "Chrome 69") allows websites to work around known bugs in specific releases that aren’t otherwise detectable. For example, implementations of Content Security Policy have varied wildly between vendors, and it’s difficult to know what policy to send in an HTTP response without knowing what browser is responsible for its parsing and execution.

Developers will often negotiate what content to send based on the user agent and platform. Some application frameworks, for instance, will style an application on iOS differently from the same application on Android in order to match each platform’s aesthetic and design patterns.

Similarly to the first point above, OS revisions and architecture can be responsible for specific bugs which can be worked around in a website’s code, and are narrowly useful for things like selecting appropriate executables for download (32 vs 64 bit, ARM vs Intel, etc.).

Sophisticated developers use model/make to tailor their sites to the capabilities of the device (e.g. [FacebookYearClass]) and to pinpoint performance bugs and regressions which sometimes are specific to model/make.

This document proposes a mechanism which might allow user agents to be a bit more aggressive about removing entropy from the User-Agent string generally, by giving servers that really need some specific details about the client the ability to opt into receiving them. It introduces a number of new Client Hints ([RFC8942]) that can provide the client’s branding and version information, the underlying operating system’s branding and major version, as well as details about the underlying device. Rather than broadcasting this data to everyone, all the time, user agents can make reasonable decisions about how to respond to given sites' requests for more granular data, reducing the passive fingerprinting surface area exposed to the network (see Best Practice 1 in [FINGERPRINTING-GUIDANCE]).

1.1. Examples

A user navigates to https://example.com/ for the first time using the latest version of the "Examplary Browser". Their user agent sends the following headers along with the HTTP request:

Sec-CH-UA: "Examplary Browser"; v="73", ";Not?A.Brand"; v="27"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Platform: "Windows"

The server is interested in rendering content consistent with the user’s underlying platform version, and asks for a little more information by sending an Accept-CH header (Section 2.2.1 of [RFC8942]) along with the initial response:

Accept-CH: Sec-CH-UA-Platform-Version

In response, the user agent includes the platform version information in the next request:

Sec-CH-UA: "Examplary Browser"; v="73", ";Not?A.Brand"; v="27"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Platform: "Windows"
Sec-CH-UA-Platform-Version: "14.0.0"

1.2. Use Cases

This section attempts to document the current uses for the User-Agent string, and how similar functionality could be enabled using User-Agent Client Hints (UA-CH).

1.2.1. Differential serving

1.2.1.1. Based on browser features

This use case enables services like polyfill.io to serve custom-tailored polyfills to their users, without bloating the experience of modern browser users. Similarly, when serving JavaScript to users, one can avoid transpilation (which can result in bloat and inefficient code) for browsers that support the latest ES features that were used. Finally, when serving images, some browsers don’t update their Accept request headers, while in other cases the MIME type is not descriptive enough to distinguish between different variants of the same format (e.g., WebP). In those cases, knowing the browser and its version can be critical to serving the right image variant.

For that use case to work, the server needs to be aware of the browser and its meaningful version, and map that to a list of available features. That enables it to know which polyfill or code variant to serve.

Services that wish to do that using UA-CH will need to inspect the Sec-CH-UA header, which is sent by default on every request, and modify their response based on it.
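A server following this approach first has to turn the Sec-CH-UA value into (brand, version) pairs. The following Python sketch is a hand-rolled parser for only the subset of RFC 8941 list syntax this header uses (string members with string-valued parameters); it is not a complete Structured Field Values parser, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# RFC 8941 parameter keys: lcalpha or "*" first, then lcalpha / DIGIT / "_" / "-" / "." / "*".
_KEY = re.compile(r"[a-z*][a-z0-9_\-.*]*")


def _parse_string(s: str, i: int) -> tuple[str, int]:
    """Parse an sf-string starting at s[i]; return (value, index after the closing quote)."""
    if i >= len(s) or s[i] != '"':
        raise ValueError(f"expected string at offset {i}")
    out, i = [], i + 1
    while i < len(s):
        c = s[i]
        if c == "\\":
            # RFC 8941 only permits the escapes \" and \\ inside strings.
            if i + 1 >= len(s) or s[i + 1] not in '"\\':
                raise ValueError("invalid escape")
            out.append(s[i + 1])
            i += 2
        elif c == '"':
            return "".join(out), i + 1
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated string")


def parse_sec_ch_ua(value: str) -> list[tuple[str, str | None]]:
    """Return (brand, version) pairs; version is None when there is no "v" parameter."""
    brands: list[tuple[str, str | None]] = []
    s, i = value.strip(" \t"), 0
    while i < len(s):
        brand, i = _parse_string(s, i)
        version = None
        # Parameters: ";" *SP key [ "=" value ]; only string values are handled here.
        while i < len(s) and s[i] == ";":
            i += 1
            while i < len(s) and s[i] == " ":
                i += 1
            m = _KEY.match(s, i)
            if not m:
                raise ValueError(f"bad parameter key at offset {i}")
            key, i = m.group(), m.end()
            if i < len(s) and s[i] == "=":
                param_value, i = _parse_string(s, i + 1)
                if key == "v":
                    version = param_value
        brands.append((brand, version))
        # List members are separated by OWS "," OWS.
        while i < len(s) and s[i] in " \t":
            i += 1
        if i == len(s):
            break
        if s[i] != ",":
            raise ValueError(f"expected ',' at offset {i}")
        i += 1
        while i < len(s) and s[i] in " \t":
            i += 1
        if i == len(s):
            raise ValueError("trailing comma")
    return brands
```

A production server would more likely rely on a full RFC 8941 implementation; this sketch rejects, for instance, integer or token parameter values.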

1.2.1.2. Browser bug workaround

Some browser versions have well-known bugs which require content to work around them. Triggering those bugs can result in browser crashes, content breakage, and other issues, and those bugs are by definition not something that can be feature detected. Therefore, content needs to avoid them altogether for affected browser versions. For that use case, servers need to be aware of the browser and its meaningful version, be aware of browser bugs that impact them, and apply workarounds if the current browser version is impacted.

Services that wish to do that using UA-CH will need to inspect the Sec-CH-UA header, sent by default on every request, and use it to modify their response.

1.2.2. Market share analytics

A browser’s market share can be extremely important. Having visibility into a browser’s usage can encourage developers to test in that particular browser, ensuring fewer compatibility issues for its users. On top of that, a browser’s market share can have a direct impact on the browser vendors' business goals, ensuring future development of the browser. For market share analytics to work, the server needs to be aware of the browser and its meaningful version, in order to be able to register them and find their relative market shares.

Sites that wish to provide market share analytics using UA-CH will need to inspect the Sec-CH-UA header, which is sent by default on every request, and keep a record of it.

By design, looking at individual entries in the brands list makes it hard to distinguish between a less-popular browser’s truthful brand name and a more-popular browser’s arbitrary GREASE. Since the less-popular browser may include several popular brand names for compatibility purposes, its users will likely be bucketed as using the more-popular one if this approach is taken, leading to distorted views of usage share that favour already-popular browsers and with less-popular browsers possibly never gaining any visibility.

Hence, for analytics purposes, it is better to treat the brands list as a unit, and compare it to known lists of brands sent by the various (browser, version) pairs that are to be distinguished. This will necessitate regular updates to the list of known lists of brands when new browser versions are released or new browsers become popular, or else everything will get bucketed as an unknown browser. However, as this doesn’t break sites for users, failing closed for unknown browsers is acceptable in this context.

Such a list of known lists of brands could be maintained centrally and used by many sites (as, e.g., browser feature support is maintained by caniuse and MDN, and consumed by many webmasters).

The specification recommends that browsers fix the brands list they send per version to make counting usage shares simpler (and also to help with caching), so the known lists of brands can be a simple list mapping from a set of brands to a (browser, version) pair.
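To treat the brands list as a unit, a site can compare the whole list, ignoring order, against a table of known lists. A minimal Python sketch, assuming the brands have already been parsed into (brand, version) pairs; the table contents and names are hypothetical:

```python
from __future__ import annotations

# Hypothetical table: each key is the *entire* brands list that a given
# (browser, version) pair is known to send. A real table would be maintained
# centrally and updated as new browser versions ship.
KNOWN_BRAND_LISTS: dict[frozenset[tuple[str, str]], tuple[str, str]] = {
    frozenset({("Examplary Browser", "73"), (";Not?A.Brand", "27")}): ("Examplary Browser", "73"),
}


def classify_brands(brands: list[tuple[str, str]]) -> tuple[str, str | None]:
    """Bucket a brands list as a whole; unknown lists fail closed to ("unknown", None)."""
    # A frozenset ignores the (possibly randomized) order of the list.
    return KNOWN_BRAND_LISTS.get(frozenset(brands), ("unknown", None))
```

Because the lookup key is the whole set, a less-popular browser that also lists a popular brand for compatibility does not match the popular browser's entry; it lands in "unknown" until the table learns its exact list.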

1.2.3. Content adaptation

Content adaptation is ensuring that users get content that’s tailored to their needs. There are many dimensions to content adaptation beyond the UA string: viewport dimensions, device memory, user preferences, and more. This sub-section covers content adaptation needs that rely on information that is part of the current User-Agent string.

1.2.3.1. Browser based adaptation

Some sites choose to serve slightly different content to different browsers. The reasons for that vary. Some reasons are legitimate (e.g. wanting to serve different experiences to different browsers due to their feature support). Other reasons are slightly less legitimate (e.g. warning users that the site’s developers haven’t tested in their browser). And then there are reasons which are outright wrong (e.g. willingness to block certain browsers' users from accessing the site). As browsers, we want to enable the first kind, while discouraging the last.

1.2.3.2. Mobile specific site

Many site owners serve different content on mobile and desktop sites. While responsive web design has made it possible to serve multiple form factors using a single code base, there are still cases where a mobile-specific version is better adapted to the device. For those cases, serving mobile-specific sites to users on mobile devices can be helpful. For that to work, the server needs to know, at HTML serving time, whether the user is on a mobile device or not.

Sites that wish to serve mobile-specific sites using UA-CH can do so using the Sec-CH-UA-Mobile header, which is sent by default on every request.

1.2.3.3. Low-powered devices

Some sites serve different content to low-powered devices that cannot deal with CPU-intensive tasks, large videos and images, etc. Such content adaptation typically uses the device model information that’s integrated in the current User-Agent string, relying on server-side databases to convert device models into memory, CPU power, and other categories on which they want to split their content. If the dimension on which the split is made is memory, the Device-Memory Client Hint can be used to make that distinction. Otherwise, with UA-CH, sites can still retrieve the device model by opting in to the Sec-CH-UA-Model hint.

Neither of these hints is sent by default, so using them requires some extra work.

Top-level origins will need to send Accept-CH: Device-Memory, Sec-CH-UA-Model headers with their responses to opt in to receiving those hints. In cases where they absolutely need to perform that adaptation on every navigation request, a redirect would be required when the hints are not present in a browser that supports them. Alternatively, they might use Critical-CH to have the client handle the additional request/response round trip.

Third-party origins that need to perform such adaptation would need delegation from the top-level origin. The top-level origin would need to opt-in using Accept-CH, as well as add Permissions-Policy headers that delegate those hints to the third-party origin.
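The top-level opt-in flow might look like the following Python sketch of a request handler's decision logic. It assumes the server treats both hints as critical, uses a hypothetical 1 GiB Device-Memory threshold and a hypothetical set of low-end models, and only strips the quotes from Sec-CH-UA-Model rather than fully parsing it. Vary is added because the response now depends on the hint values.

```python
from __future__ import annotations

# Hints this hypothetical top-level origin opts in to.
HINTS = ("Device-Memory", "Sec-CH-UA-Model")

# Hypothetical server-side set of device models treated as low-powered.
LOW_END_MODELS = {"Example Phone 1"}


def _unquote(raw: str | None) -> str | None:
    """Return the body of a simple quoted sf-string; escapes are not handled."""
    if raw is None or len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        return None
    return raw[1:-1]


def adapt(request_headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Choose "lite" or "full" content, plus the response headers that request the hints."""
    req = {name.lower(): value.strip() for name, value in request_headers.items()}
    hint_list = ", ".join(HINTS)
    response = {
        "Accept-CH": hint_list,    # opt in to receiving the hints on later requests
        "Critical-CH": hint_list,  # ask a supporting client to retry with the hints
        "Vary": hint_list,         # the response differs by hint value, so caches must key on it
    }
    memory = req.get("device-memory")
    if memory is not None:
        try:
            # Hypothetical threshold: under 1 GiB is treated as low-powered.
            return ("lite" if float(memory) < 1 else "full"), response
        except ValueError:
            pass  # malformed value: fall back to the model check below
    if _unquote(req.get("sec-ch-ua-model")) in LOW_END_MODELS:
        return "lite", response
    return "full", response
```

A third-party origin would additionally need the top-level origin's Permissions-Policy delegation before the client attaches these hints to its requests; that part is not modeled here.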

1.2.3.4. OS specific styles

Some sites may wish to tailor their interfaces to match the user’s OS. While progressive enhancement is likely to be a better path here (e.g. through the application of different button styles using script), there may be cases where folks would wish to deliver tailored inline styles based on the platform and platform version. Those cases are very similar to the case discussed above (in "Low-powered devices"), only with the Sec-CH-UA-Platform and Sec-CH-UA-Platform-Version hints.

1.2.3.5. OS integration

Similarly, some sites would want to change links to OS specific ones (e.g. Android intent links). While, again, progressive enhancement can be used to modify those links using script, rather than bake them into the HTML, some sites may prefer server-side adaptation. Again, like the "OS specific styles" case, they’d need to use the Sec-CH-UA-Platform and Sec-CH-UA-Platform-Version hints to do so.

1.2.3.6. Browser and OS specific experiments

Some servers may wish to limit their multi-variant experimentation to specific browsers, specific platforms, or specific versions of any of these. For experiments that are limited to browser and version, those sites can use the Sec-CH-UA values sent by default on requests. If they require the platform and its version, they could use the default Sec-CH-UA-Platform hint, but would have to request the Sec-CH-UA-Platform-Version hint or use client-side scripts to control the experimentation.

1.2.4. User login notification

Many sites, especially security-sensitive ones, like to notify their users when a login from a new device happens. That enables users to be aware of those logins, and take action in case a login was not performed by them or on their behalf. For those notifications to be meaningful, sites need to recognize and communicate the commercial brand of the browser to the user. These messages often also include the platform and its version, so that the user knows which device is in question.

Since such messaging doesn’t require any server-side adaptation, it’s better for this case to use the userAgentData.getHighEntropyValues() method in order to retrieve the required information.

1.2.5. Download of appropriate binary executables

Some sites are used to download binary executables of native applications, and need to be able to offer the right binary to the user by default. The right binary executable for the current user depends on a few factors: their operating system, its version, its bitness, and their CPU architecture. To tackle that use case, download sites can opt in to receiving the Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version, Sec-CH-UA-Arch, and Sec-CH-UA-Bitness hints (or query them through the API), in order to ensure the right binary is offered to the user by default.
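A Python sketch that picks a default installer from the platform, architecture, and bitness hints. The file names and lookup table are hypothetical, platform version is ignored for brevity, and header values are only unquoted rather than fully parsed:

```python
from __future__ import annotations

# Hypothetical catalogue of installers, keyed by (platform, architecture, bitness).
INSTALLERS = {
    ("Windows", "x86", "64"): "example-setup-x64.exe",
    ("Windows", "x86", "32"): "example-setup-x86.exe",
    ("Windows", "arm", "64"): "example-setup-arm64.exe",
    ("macOS", "arm", "64"): "example-arm64.dmg",
    ("macOS", "x86", "64"): "example-x64.dmg",
}


def _unquote(raw: str | None) -> str:
    """Return the body of a simple sf-string ('"x86"' -> 'x86'), or '' if absent or malformed."""
    if raw and len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    return ""


def pick_installer(headers: dict[str, str]) -> str | None:
    """Choose a default download; None means "show the full list of downloads instead"."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    key = (
        _unquote(h.get("sec-ch-ua-platform")),
        _unquote(h.get("sec-ch-ua-arch")),
        _unquote(h.get("sec-ch-ua-bitness")),
    )
    # Empty or fictitious values, which user agents may send, simply miss the table.
    return INSTALLERS.get(key)
```

Falling back to the full list, rather than guessing, keeps the page usable when the user agent withholds architecture or bitness.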

1.2.6. Conversion modeling

Some machine learning models use various details from the User-Agent string in order to estimate various things about users of those user agents. Similar modeling would still be possible, but will require explicit opt-in to collect the required bits of information.

1.2.7. Vulnerability filtering

In some environments, proxy servers may be used to verify that the different users accessing information are not doing so from obsolete devices that are potentially vulnerable to security issues. While the browser and version information available from Sec-CH-UA can provide some information, the browser and OS full versions are often useful for that kind of analysis. Such proxies would have to add a redirect step, or use one of the two Client Hints reliability mechanisms to opt in to receiving the browser full version and the platform version, in order to continue to get access to those hints.

1.2.8. Logs and debugging

Many services log the User-Agent string today and use it in various ways when analyzing past traffic or when trying to debug errors related to their service. Those services will have to use the lower-entropy values available through Sec-CH-UA for logging purposes, or opt in to receiving higher-entropy hints. The latter doesn’t seem like something services should do just for forensic purposes. On the other hand, when specific issues are encountered, it may make sense for those services to opt in to receiving more details about the user agent, or to use the userAgentData.getHighEntropyValues() API for that purpose.

1.2.9. Fingerprinting

User fingerprinting is the practice of gathering multiple bits of user information from multiple sources and intersecting them to create a unique signature of the user, which would enable them to be recognized later on, even if they clear state from their browsers (e.g. by deleting cookies).

For those cases, the origin needs to gather as much entropy as possible, so it is likely to collect all the hints.

1.2.9.1. Spam filtering and bot detection

This is a case of fingerprinting that is not user-hostile, and therefore one we would like to preserve. With UA-CH this will be initially enabled by active collection of the various hints. We hope that alternative methods or APIs will exist to address the spam filtering and bot detection use cases in the future, as browsers may decide to intervene on behalf of their users by limiting the collection of user-identifying entropy (e.g., the Privacy Budget proposal).

1.2.9.2. Persistent user tracking

This is a case of fingerprinting that this proposal explicitly tries to make harder. Like the case of "spam filtering", it would still be feasible to actively collect all the hints about the user as bits of entropy. Unlike that case, this is something that proposals such as the Privacy Budget aim to prevent, without providing any alternative mechanisms for persistent user tracking.

1.2.9.3. Blocking known bots and crawlers

Currently, the User-Agent string is often used as a brute-force way to block known bots and crawlers. There’s a concern that having "normal" traffic expose less entropy by default will also make it easier for bots to hide in the crowd. While there’s some truth to that, it is not enough reason to make the crowd more personally identifiable. As with the spam filtering case, there’s hope that alternative methods will be able to replace User-Agent string matching for this use case.

2. Infrastructure

This specification depends on Client Hints Infrastructure, HTTP Client Hints, and the Infra Standard. [CLIENT-HINTS-INFRASTRUCTURE] [RFC8942] [INFRA]

Some of the terms used in this specification are defined in Structured Field Values for HTTP. [RFC8941]

3. User Agent Hints

The following sections define a number of HTTP request header fields that expose details about a given user agent, which servers can opt into receiving via the Client Hints infrastructure defined in [RFC8942]. The definitions below assume that each user agent has defined a number of properties for itself:

brand - The user agent's commercial name (e.g., "cURL", "Edge", "The World’s Best Web Browser")

significant version - The user agent's marketing version, which includes distinguishable web-exposed features (e.g., "72", "3", or "12.1")

full version - The user agent's build version (e.g., "72.0.3245.12", "3.14159", or "297.70E04154A")

platform brand - The user agent's operating system’s commercial name. (e.g., "Windows", "iOS", or "AmazingOS")

platform version - The user agent's operating system’s version. (e.g., "NT 6.0", "15", or "17G")

platform architecture - The user agent's underlying CPU architecture (e.g., "arm", or "x86")

platform bitness - The user agent's underlying CPU architecture bitness (e.g., "32" or "64")

model - The user agent's device model (e.g., "", or "Pixel 2 XL")

mobileness - A boolean indicating if the user agent's device is a mobile device. (e.g., ?0 or ?1)

User agents SHOULD keep these strings short and to the point, but servers MUST accept arbitrary values for each, as they are all values constructed at the user agent's whim.

User agents MUST map higher-entropy platform architecture values to the following buckets:

x86 CPU architectures => "x86"

ARM CPU architectures => "arm"

Other CPU architectures could be mapped into one of these values in case that makes sense, or be mapped to the empty string.
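The bucketing rule above can be sketched in Python as a lookup table. The raw names on the left are examples of what operating systems commonly report (e.g. `uname -m` on Linux, or PROCESSOR_ARCHITECTURE on Windows); the table is illustrative rather than exhaustive, and this sketch maps every other architecture to the empty string:

```python
# Illustrative mapping from lower-level architecture names to the two buckets.
_BUCKETS = {
    "x86_64": "x86", "amd64": "x86", "i386": "x86", "i686": "x86", "x86": "x86",
    "aarch64": "arm", "arm64": "arm", "armv7l": "arm", "armv8l": "arm", "arm": "arm",
}


def bucket_architecture(raw: str) -> str:
    """Map a high-entropy architecture name to "x86", "arm", or "" for anything else."""
    return _BUCKETS.get(raw.strip().lower(), "")
```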

User agents SHOULD return the empty string or a fictitious value for platform architecture or platform bitness unless the user’s platform is one where both of the following conditions apply:

Binary download of executables is likely.

Different CPU architectures are likely to require different binary executable resources, and different binary executable resources are likely to be available.

User Agents MUST return the empty string for model if mobileness is false. User Agents MUST return the empty string for model even if mobileness is true, except on platforms where the model is typically exposed.

User agents MAY return the empty string or a fictitious value for full version, platform architecture, platform bitness or model, for privacy, compatibility, or other reasons.
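On the user agent side, these reporting rules reduce to a small decision function. A Python sketch, where the per-platform facts are hypothetical inputs that a user agent would have to decide for itself:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformFacts:
    """Hypothetical per-platform judgements a user agent might record."""
    binary_downloads_likely: bool
    per_arch_binaries_likely: bool
    model_typically_exposed: bool


def reported_values(arch: str, bitness: str, model: str, mobile: bool,
                    facts: PlatformFacts) -> dict[str, str]:
    """Apply the SHOULD/MUST rules above; anything withheld becomes the empty string."""
    # Architecture and bitness only when *both* download conditions hold.
    expose_arch = facts.binary_downloads_likely and facts.per_arch_binaries_likely
    return {
        "architecture": arch if expose_arch else "",
        "bitness": bitness if expose_arch else "",
        # Model only on mobile devices, and only where the platform usually exposes it.
        "model": model if mobile and facts.model_typically_exposed else "",
    }
```

The MAY clause still applies on top of this: a user agent could further replace any non-empty value with the empty string or a fictitious value.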

3.1. The 'Sec-CH-UA' Header Field

The Sec-CH-UA request header field gives a server information about a user agent's branding and version. It is a Structured Header whose value MUST be a list [RFC8941]. The list’s items MUST be strings. The value of each item SHOULD include a "v" parameter, indicating the user agent's version.

The header’s ABNF is:

Sec-CH-UA = sf-list

To return the Sec-CH-UA value for a request, user agents MUST:

Let list be a list, initially empty.

For each brandVersion in brands:

Let parameter be a dictionary, initially empty.

Set parameter["param_name"] to "v".

Set parameter["param_value"] to brandVersion’s version.

Let pair be a tuple consisting of brandVersion’s brand and parameter.

Append pair to list.

Return the output of running serializing a list with list as input.
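The serialization steps above can be sketched in Python as follows. The helper follows RFC 8941's rules for serializing strings (printable ASCII only, with double quotes and backslashes escaped) and joins list members with a comma and a space; the function names are hypothetical:

```python
from __future__ import annotations


def _serialize_sf_string(value: str) -> str:
    """Serialize an RFC 8941 sf-string, escaping double quotes and backslashes."""
    out = ['"']
    for ch in value:
        if not (0x20 <= ord(ch) <= 0x7E):
            raise ValueError(f"character {ch!r} cannot appear in an sf-string")
        if ch in '"\\':
            out.append("\\")
        out.append(ch)
    out.append('"')
    return "".join(out)


def serialize_sec_ch_ua(brands: list[tuple[str, str]]) -> str:
    """Build the Sec-CH-UA value from (brand, significant version) pairs, in list order."""
    members = []
    for brand, version in brands:
        # Each member is the brand string with a single "v" parameter holding the version.
        members.append(f"{_serialize_sf_string(brand)};v={_serialize_sf_string(version)}")
    # RFC 8941 list serialization separates members with ", ".
    return ", ".join(members)
```

The output uses the canonical RFC 8941 form, with no space after the semicolon, so it differs cosmetically from the example in § 1.1; both forms parse to the same value.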

Note: Unlike most Client Hints, since it’s included in the low entropy hint table, the Sec-CH-UA header will be sent by default, whether or not the server opted into receiving the header via an Accept-CH header (although it can still be controlled by its policy controlled client hints feature). It is considered low entropy because it includes only the user agent's branding information and the significant version number (both of which are fairly clearly sniffable by "examining the structure of other headers and by testing for the availability and semantics of the features introduced or modified between releases of a particular browser" [Janc2014]).

3.2. The 'Sec-CH-UA-Arch' Header Field

The Sec-CH-UA-Arch request header field gives a server information about the architecture of the platform on which a given user agent is executing. It is a Structured Header whose value MUST be a string [RFC8941].

The header’s ABNF is:

Sec-CH-UA-Arch = sf-string

3.3. The 'Sec-CH-UA-Bitness' Header Field

The Sec-CH-UA-Bitness request header field gives a server information about the bitness of the architecture of the platform on which a given user agent is executing. It is a Structured Header whose value MUST be a string [RFC8941].

The header’s ABNF is:

Sec-CH-UA-Bitness = sf-string

3.4. The 'Sec-CH-UA-Full-Version' Header Field

The Sec-CH-UA-Full-Version request header field gives a server information about the user agent’s full version. It is a Structured Header whose value MUST be a string [RFC8941].

The header’s ABNF is:

Sec-CH-UA-Full-Version = sf-string

3.5. The 'Sec-CH-UA-Mobile' Header Field

The Sec-CH-UA-Mobile request header field gives a server information about whether or not a user agent prefers a "mobile" user experience. It is a Structured Header whose value MUST be a boolean [RFC8941].

The header’s ABNF is:

Sec-CH-UA-Mobile = sf-boolean

Note: Like Sec-CH-UA above, since it’s included in the low entropy hint table, the Sec-CH-UA-Mobile header will be sent by default, whether or not the server opted into receiving the header via an Accept-CH header (although it can still be controlled by its policy controlled client hints feature). It is considered low entropy because it is a single bit of information directly controllable by the user.

3.6. The 'Sec-CH-UA-Model' Header Field

The Sec-CH-UA-Model request header field gives a server information about the device on which a given user agent is executing. It is a Structured Header whose value MUST be a string [RFC8941].

The header’s ABNF is:

Sec-CH-UA-Model = sf-string

3.7. The 'Sec-CH-UA-Platform' Header Field

The Sec-CH-UA-Platform request header field gives a server information about the platform on which a given user agent is executing. It is a Structured Header whose value MUST be a string [RFC8941]. Its value SHOULD match one of the following common platform values: "Android", "Chrome OS", "iOS", "Linux", "macOS", "Windows", or "Unknown".

The header’s ABNF is:

Sec-CH-UA-Platform = sf-string

Note: Like Sec-CH-UA above, since it’s included in the low entropy hint table, the Sec-CH-UA-Platform header will be sent by default, whether or not the server opted into receiving the header via an Accept-CH header (although it can still be controlled by its policy controlled client hints feature).
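Most of the single-value hints above carry one sf-string or sf-boolean item. A minimal Python sketch for reading them, assuming the items carry no parameters (which RFC 8941 would otherwise allow) and skipping the printable-ASCII check; the function names are hypothetical:

```python
def parse_sf_boolean(raw: str) -> bool:
    """Parse an sf-boolean such as Sec-CH-UA-Mobile ("?0" or "?1")."""
    value = raw.strip(" ")
    if value == "?1":
        return True
    if value == "?0":
        return False
    raise ValueError(f"not an sf-boolean: {raw!r}")


def parse_sf_string_item(raw: str) -> str:
    """Parse a bare sf-string item such as Sec-CH-UA-Platform ('"Windows"')."""
    value = raw.strip(" ")
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError(f"not an sf-string: {raw!r}")
    out, i = [], 1
    while i < len(value) - 1:
        ch = value[i]
        if ch == "\\":
            # Only \" and \\ are valid escapes; anything else is a parse error.
            if i + 1 >= len(value) - 1 or value[i + 1] not in '"\\':
                raise ValueError("invalid escape")
            out.append(value[i + 1])
            i += 2
        elif ch == '"':
            raise ValueError("unescaped quote inside string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```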

3.8. The 'Sec-CH-UA-Platform-Version' Header Field

The Sec-CH-UA-Platform-Version request header field gives a server information about the platform version on which a given user agent is executing. It is a Structured Header whose value MUST be a string [RFC8941].

User agents SHOULD return a version with a platform-specific format that allows differentiating significant platform versions by running these steps:

Let platformVersionComponentList be a list.

Use per-platform logic:

If the platform is Android:

Let platformReturnedVersionString be the result of querying the OS’s android.os.Build.VERSION.RELEASE string.

Let platformVersionComponentList be the result of running parse a platform-returned version string with platformReturnedVersionString.

If the platform is iOS:

Let platformReturnedVersionString be the result of querying the UIDevice returned by currentDevice and reading its systemVersion.

Let platformVersionComponentList be the result of running parse a platform-returned version string with platformReturnedVersionString.

If the platform is Linux:

Let platformReturnedVersionString be the result of querying the release string in the utsname struct returned by the uname API.

Let platformVersionComponentList be the result of running parse a platform-returned version string with platformReturnedVersionString.

If the platform is macOS:

Let platformReturnedVersionString be the result of querying the NSProcessInfo returned by processInfo and reading its operatingSystemVersion.

Append the majorVersion, minorVersion, and patchVersion components (in that order) to platformVersionComponentList.

If the platform is Windows:

If available (i.e., on Windows 10 or higher), let platformReturnedVersionString be the result of querying the Windows.Foundation.UniversalApiContract integer version and converting it to a string. Otherwise, let platformReturnedVersionString be "0".

Let platformVersionComponentList be the result of running parse a platform-returned version string with platformReturnedVersionString.

Otherwise:

Append one to three version parts, in a format likely to lead to interoperability with other browsers running on that platform, to platformVersionComponentList.

While platformVersionComponentList’s length is less than 3, append "0" to platformVersionComponentList.

Return the result of the concatenation of platformVersionComponentList with a U+002E FULL STOP (.) separator.

The parse a platform-returned version string algorithm, given a string input, runs these steps:

Let platformVersionComponentList be a list and index be 0.

Let platformVersionUnprocessedTokenList be the list returned by strictly splitting input on the U+002E FULL STOP character (.).

While index is less than 3:

If index is less than the length of platformVersionUnprocessedTokenList:

If platformVersionUnprocessedTokenList[index] is an unsigned integer, convert it to a string and append it to platformVersionComponentList.

Otherwise, append "0" to platformVersionComponentList.

Otherwise, if index is greater than or equal to the length of platformVersionUnprocessedTokenList:

Append "0" to platformVersionComponentList.

Increment index by 1.

Return platformVersionComponentList.
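The two version algorithms above can be sketched in Python, taking the platform-returned string as input rather than querying the OS. This assumes that "is an unsigned integer" means the token is one or more ASCII digits, and that converting it to a string normalizes leading zeros; the function names are hypothetical:

```python
from __future__ import annotations


def parse_platform_returned_version_string(version: str) -> list[str]:
    """Return exactly three numeric components; non-numeric or missing ones become "0"."""
    tokens = version.split(".")  # strict split: empty tokens are kept
    components: list[str] = []
    for index in range(3):
        if index < len(tokens):
            token = tokens[index]
            if token and token.isascii() and token.isdigit():
                components.append(str(int(token)))
            else:
                components.append("0")
        else:
            components.append("0")
    return components


def platform_version(components: list[str]) -> str:
    """Pad to three components with "0" and join them with U+002E FULL STOP."""
    padded = list(components)
    while len(padded) < 3:
        padded.append("0")
    return ".".join(padded)
```

For example, a Linux release string such as "5.15.0-76-generic" yields "5.15.0", because the third token is not purely numeric, and a Windows UniversalApiContract version of "14" yields "14.0.0".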

The header’s ABNF is:

Sec-CH-UA-Platform-Version = sf-string

Note: These client hints can be requested with the following set of client hints tokens: Sec-CH-UA, Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version

4. Interface

    dictionary NavigatorUABrandVersion {
      DOMString brand;
      DOMString version;
    };

    dictionary UADataValues {
      sequence<NavigatorUABrandVersion> brands;
      boolean mobile;
      DOMString platform;
      DOMString architecture;
      DOMString bitness;
      DOMString model;
      DOMString platformVersion;
      DOMString uaFullVersion;
    };

    dictionary UALowEntropyJSON {
      sequence<NavigatorUABrandVersion> brands;
      boolean mobile;
      DOMString platform;
    };

    [Exposed=(Window,Worker)]
    interface NavigatorUAData {
      readonly attribute FrozenArray<NavigatorUABrandVersion> brands;
      readonly attribute boolean mobile;
      readonly attribute DOMString platform;
      Promise<UADataValues> getHighEntropyValues(sequence<DOMString> hints);
      UALowEntropyJSON toJSON();
    };

    interface mixin NavigatorUA {
      [SecureContext] readonly attribute NavigatorUAData userAgentData;
    };

    Navigator includes NavigatorUA;
    WorkerNavigator includes NavigatorUA;

Note: The high-entropy portions of the user agent information are retrieved through a Promise, in order to give user agents the opportunity to gate their exposure behind potentially time-consuming checks (e.g. by asking the user for their permission).

4.1. Processing model

4.1.1. WindowOrWorkerGlobalScope

Each user agent has an associated brands, which is a list created by running create brands.

Every WindowOrWorkerGlobalScope object has an associated brands frozen array, which is a FrozenArray. It is initially the result of creating a frozen array from the user agent's brands.

4.1.2. Create brands

When asked to run the create brands algorithm, the user agent MUST run the following steps:

1. Let list be a list.
2. Collect pairs of brand and significant version which represent the user agent or equivalence classes.
3. For each pair:
    1. Let dict be a new NavigatorUABrandVersion dictionary, with brand as brand and significant version as version.
    2. Append dict to list.
4. The user agent SHOULD execute the following steps:
    1. Append one additional item to list containing a NavigatorUABrandVersion dictionary, initialized with arbitrary brand and arbitrary version combinations.
    2. Randomize the order of the items in list.

    Note: One approach to minimize caching variance when generating these random components could be to determine them at build time, and keep them identical throughout the lifetime of the user agent's significant version.

    Note: See § 6.2 GREASE-like UA Brand Lists for more details on when and why these randomization steps might be appropriate.

5. Return list.

An equivalence class represents a group of browsers believed to be compatible with each other. A shared rendering engine may form an equivalence class, for example.
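
The create brands steps can be sketched in TypeScript as follows. This is not any browser's implementation: the arbitrary (GREASE) entry is passed in rather than generated, and the shuffle is driven by a seeded PRNG (mulberry32, a choice made for this sketch, not something the spec prescribes) so that, in line with the note on caching variance, the same seed, for instance one derived from the significant version, always yields the same order. `createBrands` and `mulberry32` are hypothetical names.

```typescript
interface NavigatorUABrandVersion {
  brand: string;
  version: string;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
// Seeding it from the significant version keeps the order stable for that version.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper following the create brands steps.
function createBrands(
  pairs: ReadonlyArray<readonly [string, string]>,
  arbitrary: NavigatorUABrandVersion,
  random: () => number,
): NavigatorUABrandVersion[] {
  // Steps 1-3: one dictionary per (brand, significant version) pair.
  const list: NavigatorUABrandVersion[] = pairs.map(([brand, version]) => ({ brand, version }));
  // Step 4.1 (SHOULD): append one arbitrary (GREASE) entry.
  list.push({ ...arbitrary });
  // Step 4.2 (SHOULD): randomize the order with a Fisher-Yates shuffle.
  for (let i = list.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = list[i] as NavigatorUABrandVersion;
    list[i] = list[j] as NavigatorUABrandVersion;
    list[j] = tmp;
  }
  // Step 5.
  return list;
}
```

For example, `createBrands([["Chrome", "73"], ["Chromium", "73"]], { brand: "(Not;Browser", version: "12" }, mulberry32(73))` returns the three entries in an order that is stable for seed 73 but otherwise carries no meaning.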

4.1.3. Create arbitrary brand and version values

To create an arbitrary brand, the user agent MUST run these steps:

1. Let arbitraryBrand be a string composed of ASCII alpha. arbitraryBrand MUST contain one or more 0x20 (SP) bytes and be no longer than twenty ASCII bytes.
2. Let arbitraryBrandList be the result of splitting arbitraryBrand on ASCII whitespace.
3. Let greaseyStack be a stack.
4. Let greaseyChars be the list of ASCII bytes « 0x20 (SP), 0x28 (left parenthesis), 0x29 (right parenthesis), 0x2D (-), 0x2E (.), 0x2F (/), 0x3A (:), 0x3B (;), 0x3D (=), 0x3F (?), 0x5F (_) ».
5. For each item of arbitraryBrandList, push a randomly selected item from greaseyChars onto greaseyStack.
6. Let greaseyBrandList be a list and index be 0.
7. While greaseyStack is not empty:
    1. Let item be the result of popping from greaseyStack.
    2. Append item to greaseyBrandList.
    3. Append arbitraryBrandList[index] to greaseyBrandList.
    4. Increment index by 1.
8. Return the result of stripping leading and trailing ASCII whitespace from the concatenation of greaseyBrandList (with no separator).

Note: Structured Headers allows for escaped 0x22 (`\"`) and 0x5C (`\\`) inside a string, but these are known not to be web-compatible.
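
A minimal sketch of create an arbitrary brand, assuming the caller supplies the ASCII-alpha seed string (for example "Not A Brand") and a source of random numbers; the function name `createArbitraryBrand` is hypothetical, and the step numbers in the comments refer to the list above.

```typescript
// greaseyChars from step 4.
const GREASEY_CHARS: readonly string[] = [" ", "(", ")", "-", ".", "/", ":", ";", "=", "?", "_"];
const ASCII_WS_RUN = /[\t\n\f\r ]+/;
const ASCII_WS_EDGES = /^[\t\n\f\r ]+|[\t\n\f\r ]+$/g;

// Hypothetical helper following the create an arbitrary brand steps.
function createArbitraryBrand(arbitraryBrand: string, random: () => number = Math.random): string {
  // Step 1: ASCII alpha plus at least one SP, no longer than twenty bytes.
  if (!/^[A-Za-z ]+$/.test(arbitraryBrand) || arbitraryBrand.indexOf(" ") === -1 || arbitraryBrand.length > 20) {
    throw new RangeError("arbitraryBrand must be ASCII alpha with at least one space and at most 20 bytes");
  }
  // Step 2: split on ASCII whitespace, dropping empty tokens.
  const arbitraryBrandList = arbitraryBrand.split(ASCII_WS_RUN).filter((t) => t.length > 0);
  // Steps 3 and 5: push one randomly chosen GREASE character per word.
  const greaseyStack: string[] = [];
  for (let k = 0; k < arbitraryBrandList.length; k++) {
    greaseyStack.push(GREASEY_CHARS[Math.floor(random() * GREASEY_CHARS.length)] as string);
  }
  // Steps 6-7: pop characters and interleave them with the words.
  const greaseyBrandList: string[] = [];
  let index = 0;
  while (greaseyStack.length > 0) {
    greaseyBrandList.push(greaseyStack.pop() as string);
    greaseyBrandList.push(arbitraryBrandList[index] as string);
    index += 1;
  }
  // Step 8: concatenate without a separator, then strip leading/trailing ASCII whitespace.
  return greaseyBrandList.join("").replace(ASCII_WS_EDGES, "");
}
```

With "Not A Brand" as input, one possible result is "(Not A;Brand": one character from greaseyChars precedes each word, and a leading space, if one is drawn, is removed by the final strip.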

To create an arbitrary version, return a string that MUST match the format of the user agent's significant version, but MUST NOT match the value.

Note: User agents may decide to send arbitrarily low versions to ensure proper version checking, and should vary them over time.
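
One way to satisfy both constraints, assuming the significant version is a plain decimal major version such as "73", is to draw a different number in a fixed range. The range [0, 99] and the name `createArbitraryVersion` are assumptions of this sketch; the spec only requires that the format match and the value differ, and a user agent following the note above might prefer lower values.

```typescript
// Hypothetical helper; assumes significantVersion is a non-empty run of ASCII digits (e.g. "73").
function createArbitraryVersion(significantVersion: string, random: () => number = Math.random): string {
  if (!/^[0-9]+$/.test(significantVersion)) {
    throw new RangeError("this sketch only handles all-digit significant versions");
  }
  const actual = parseInt(significantVersion, 10);
  // Draw from [0, 98], then shift values at or above the real version up by one,
  // so the result stays digits-only and never equals the real version.
  let candidate = Math.floor(random() * 99);
  if (candidate >= actual) {
    candidate += 1;
  }
  return String(candidate);
}
```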

4.1.4. Getters

On getting, the brands attribute MUST return this's relevant global object's brands frozen array.

On getting, the mobile attribute must return the user agent's mobileness.

On getting, the platform attribute must return the user agent's platform brand.

4.1.5. getHighEntropyValues method

The getHighEntropyValues(hints) method MUST run these steps:

1. Let p be a new promise created in the current realm.
2. If the user agent decides one or more values in hints should not be returned, then reject p with a "NotAllowedError" DOMException and return p.

    We can improve upon when and why a UA decides to refuse a hint once Issue #151 is resolved.

3. Otherwise, run the following steps in parallel:
    1. Let uaData be a new UADataValues.
    2. Set uaData["brands"] to this's relevant global object's brands frozen array.
    3. Set uaData["mobile"] to the user agent's mobileness.
    4. Set uaData["platform"] to the user agent's platform brand.
    5. If hints contains "architecture", set uaData["architecture"] to the user agent's platform architecture.
    6. If hints contains "bitness", set uaData["bitness"] to the user agent's platform bitness.
    7. If hints contains "model", set uaData["model"] to the user agent's model.
    8. If hints contains "platformVersion", set uaData["platformVersion"] to the user agent's platform version.
    9. If hints contains "uaFullVersion", set uaData["uaFullVersion"] to the user agent's full version.
    10. Queue a task on the permission task source to resolve p with uaData.
4. Return p.
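
The steps above can be sketched as a plain function over a hypothetical `UserAgentState` snapshot and a caller-supplied refusal policy; `getHighEntropyValuesSketch` is likewise a hypothetical name. Two simplifications are assumed: it rejects with an `Error` whose `name` is "NotAllowedError" instead of a real `DOMException`, and it resolves directly instead of running in parallel and queuing a task on the permission task source (callers still observe the result asynchronously through the promise).

```typescript
interface NavigatorUABrandVersion {
  brand: string;
  version: string;
}

interface UADataValues {
  brands?: NavigatorUABrandVersion[];
  mobile?: boolean;
  platform?: string;
  architecture?: string;
  bitness?: string;
  model?: string;
  platformVersion?: string;
  uaFullVersion?: string;
}

// Hypothetical snapshot of the values a user agent knows about itself.
interface UserAgentState {
  brands: readonly NavigatorUABrandVersion[];
  mobile: boolean;
  platform: string;
  architecture: string;
  bitness: string;
  model: string;
  platformVersion: string;
  fullVersion: string;
}

function getHighEntropyValuesSketch(
  state: UserAgentState,
  hints: readonly string[],
  shouldRefuse: (hints: readonly string[]) => boolean,
): Promise<UADataValues> {
  // Step 2: refuse the whole call if any requested value should be withheld.
  if (shouldRefuse(hints)) {
    const err = new Error("high-entropy hints refused");
    err.name = "NotAllowedError"; // stand-in for a DOMException
    return Promise.reject(err);
  }
  const wants = (hint: string): boolean => hints.indexOf(hint) !== -1;
  // Steps 3.1-3.4: the low-entropy members are always present.
  const uaData: UADataValues = {
    brands: [...state.brands],
    mobile: state.mobile,
    platform: state.platform,
  };
  // Steps 3.5-3.9: only the requested high-entropy members are filled in.
  if (wants("architecture")) uaData.architecture = state.architecture;
  if (wants("bitness")) uaData.bitness = state.bitness;
  if (wants("model")) uaData.model = state.model;
  if (wants("platformVersion")) uaData.platformVersion = state.platformVersion;
  if (wants("uaFullVersion")) uaData.uaFullVersion = state.fullVersion;
  // Steps 3.10 and 4: resolve p (a real user agent queues a task to do this).
  return Promise.resolve(uaData);
}
```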

4.1.6. toJSON method

The toJSON() method MUST run these steps:

1. Let uaLowEntropyData be a new UALowEntropyJSON.
2. Set uaLowEntropyData["brands"] to this's relevant global object's brands frozen array.
3. Set uaLowEntropyData["mobile"] to the user agent's mobileness.
4. Set uaLowEntropyData["platform"] to the user agent's platform brand.
5. Return uaLowEntropyData.
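
Because `JSON.stringify` calls `toJSON()`, serializing the object exposes only the low-entropy fields. The class below is a hypothetical stand-in for `NavigatorUAData` that illustrates this; its `architecture` field represents high-entropy state that the real interface does not expose as an attribute and that `toJSON()` deliberately leaves out.

```typescript
interface NavigatorUABrandVersion {
  brand: string;
  version: string;
}

interface UALowEntropyJSON {
  brands: NavigatorUABrandVersion[];
  mobile: boolean;
  platform: string;
}

// Hypothetical stand-in for NavigatorUAData, used only to show toJSON().
class NavigatorUADataSketch {
  readonly brands: readonly NavigatorUABrandVersion[];
  readonly mobile: boolean;
  readonly platform: string;
  // Stands in for high-entropy state; the real interface has no such attribute.
  readonly architecture: string;

  constructor(brands: readonly NavigatorUABrandVersion[], mobile: boolean, platform: string, architecture: string) {
    this.brands = brands;
    this.mobile = mobile;
    this.platform = platform;
    this.architecture = architecture;
  }

  // Mirrors the toJSON steps: copy brands, mobile and platform, and nothing else.
  toJSON(): UALowEntropyJSON {
    return {
      brands: this.brands.map((b) => ({ ...b })),
      mobile: this.mobile,
      platform: this.platform,
    };
  }
}
```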

5. Security and Privacy Considerations

5.1. Secure Transport

Client Hints will not be delivered to non-secure endpoints (see the secure transport requirements in Section 2.2.1 of [RFC8941]). This means that user agent information will not be leaked over plaintext channels, reducing the opportunity for network attackers to build a profile of a given agent's behavior over time.

5.2. Delegation

Client Hints will be delegated from top-level pages via Permissions Policy. This reduces the likelihood that user agent information will be delivered along with subresource requests, which reduces the potential for passive fingerprinting.

That delegation is defined as part of append client hints to request.

5.3. Fingerprinting

The primary goal of User Agent Client Hints is to reduce the default entropy available to a server for passive fingerprinting. However, some or all hints can still be requested and used for active fingerprinting by first parties or delegated third parties. As noted in § 5.4 Access Restrictions, user agents should consider policies that restrict or reduce access for parties known to actively fingerprint their users.

5.4. Access Restrictions

The Client Hints defined above reveal quite a bit of information about the user agent and the device on which it runs. User agents ought to exercise judgement before granting access to this information, and MAY impose restrictions beyond the secure transport and delegation requirements noted above. For instance, user agents could choose to reveal platform architecture or platform bitness only on requests for files they intend to download, giving the server the opportunity to serve the right binary. Likewise, they could offer users control over the values revealed to servers, or gate access on explicit user interaction via a permission prompt or a settings interface.

6. Implementation Considerations

6.1. The 'User-Agent' Header

User agents SHOULD deprecate usage of the User-Agent header by reducing its information granularity or removing the header entirely, in favor of the Client Hints model described in this document. The header, however, is likely to be impossible to remove entirely in the near term, as existing sites' content negotiation code will continue to require its presence (see [Rossi2015] for a recent example of a new browser's struggles in this area).

One advisable approach might be for each user agent to freeze the value of its User-Agent header, preserving backwards compatibility by keeping the crufty declarations of "like Gecko" and "AppleWebKit/537.36" on into eternity. This can ratchet over time: first freezing the version number, then shifting platform and model information to something reasonably generic, in order to reduce the fingerprint the header provides.

6.2. GREASE-like UA Brand Lists

History has shown us that there are real incentives for user agents to lie about their branding in order to thread the needle of sites' sniffing scripts, and prevent their users from being blocked by UA-based allow/block lists.

Resetting expectations may help to prevent abuse of the brands list in the short term, but probably won’t help in the long run. The world of network protocols introduced the notion of GREASE [I-D.ietf-tls-grease]. We could borrow from that concept to tackle this problem.

User agents' brands containing more than a single entry could encourage standardized processing of the brands list. By randomly including additional, intentionally incorrect, comma-separated entries with arbitrary ordering, they would reduce the chance that we ossify on a few required strings.

Let’s examine a few examples:

To prevent sites from excluding unknown browsers via their allow lists, Chrome could send a UA set that includes a non-existent browser, and which varies once in a while.

"Chrome"; v="73", "(Not;Browser"; v="12"

In order to enable equivalence classes based on Chromium versions, Chrome could add the rendering engine and its version to that.

"Chrome"; v="73", "(Not;Browser"; v="12", "Chromium"; v="73"

In order to encourage sites to rely on equivalence classes based on Chromium versions rather than exact UA sniffing, Chrome might remove itself from the set entirely.

"(Not;Browser"; v="12", "Chromium"; v="73"

Browsers based on Chromium may use a similar UA string, but use their own brand as part of the set, enabling sites to count them.

"Chrome"; v="73", "Xwebs mega"; v="60", "Chromium"; v="73", "(Not;Browser"; v="12"

User agents MUST include more than a single value in brands, where one of these values is an arbitrary value.

The value order in brands MUST change over time to prevent receivers of the header from relying on certain values being in certain locations in the list.

When choosing GREASE strategies, user agents SHOULD keep caching variance and analytics use cases in mind and minimize variance among identical user agent versions.

Note: One approach to minimize variance for caching and analytics could be to determine the GREASE parts of the UA set at build time, and keep them identical throughout the lifetime of the user agent's significant version.
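
On the wire, the brands list is carried in Sec-CH-UA as a Structured Field list of strings, each with a `v` parameter. The sketch below assumes RFC 8941's serialization rules for sf-string (printable ASCII only, with `"` and `\` backslash-escaped) and for parameters; the function names are hypothetical. RFC 8941 serialization emits no space after `;`, whereas the examples above include one, which parsers accept. Because the order MUST change over time, the receiver-side helper looks a brand up by name rather than by position.

```typescript
interface NavigatorUABrandVersion {
  brand: string;
  version: string;
}

// Serializes a value as an RFC 8941 sf-string: printable ASCII only,
// with DQUOTE and backslash escaped.
function serializeSfString(value: string): string {
  let out = '"';
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (code < 0x20 || code > 0x7e) {
      throw new RangeError(`not serializable as an sf-string: ${JSON.stringify(value)}`);
    }
    const ch = value.charAt(i);
    if (ch === '"' || ch === "\\") {
      out += "\\";
    }
    out += ch;
  }
  return out + '"';
}

// Hypothetical helper: each brand becomes an sf-string item with a "v" parameter,
// and list members are joined with ", " as in RFC 8941 list serialization.
function serializeSecChUa(brands: readonly NavigatorUABrandVersion[]): string {
  return brands.map((b) => `${serializeSfString(b.brand)};v=${serializeSfString(b.version)}`).join(", ");
}

// Receiver side: find a brand by name, never by position.
function versionForBrand(brands: readonly NavigatorUABrandVersion[], name: string): string | undefined {
  for (const b of brands) {
    if (b.brand === name) {
      return b.version;
    }
  }
  return undefined;
}
```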

6.3. The 'Sec-CH-' Prefix

Restricting user-land JavaScript code from influencing and modifying UA-CH headers has various security-related advantages. At the same time, there don't seem to be any legitimate use cases that require such user-land rewriting.

Based on discussions with the TAG, it seems reasonable to forbid write access to these headers from JavaScript (e.g. through fetch or Service Workers), and to mark them as browser-controlled client hints so that they can be documented and included in requests without triggering CORS preflights.

Therefore, request headers defined in this specification include a Sec-CH- prefix.
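
The reason the prefix works is that the Fetch Standard treats any request header whose name, lowercased, starts with `sec-` or `proxy-` as a forbidden request-header, so script cannot set it. The sketch below covers only that prefix rule (Fetch also forbids a fixed list of other names, omitted here); both function names are hypothetical.

```typescript
// Prefix rule from the Fetch Standard's "forbidden request-header" definition.
function hasForbiddenPrefix(headerName: string): boolean {
  // Header names are ASCII tokens, so toLowerCase() acts as byte-lowercasing here.
  const lower = headerName.toLowerCase();
  return lower.startsWith("sec-") || lower.startsWith("proxy-");
}

// Hypothetical helper: drops prefix-forbidden headers, similar to how a Headers
// object with the "request" guard silently ignores forbidden request-headers.
function dropScriptForbidden(headers: Record<string, string>): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    if (!hasForbiddenPrefix(name)) {
      kept[name] = headers[name] as string;
    }
  }
  return kept;
}
```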

7. IANA Considerations

This document intends to define the Sec-CH-UA, Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform, and Sec-CH-UA-Platform-Version HTTP request header fields, and register them in the permanent message header field registry ([RFC3864]).

It also intends to deprecate usage of the User-Agent header field.

7.1. 'Sec-CH-UA' Header Field

Header field name: Sec-CH-UA

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.1 The 'Sec-CH-UA' Header Field)

7.2. 'Sec-CH-UA-Arch' Header Field

Header field name: Sec-CH-UA-Arch

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.2 The 'Sec-CH-UA-Arch' Header Field)

7.3. 'Sec-CH-UA-Bitness' Header Field

Header field name: Sec-CH-UA-Bitness

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.3 The 'Sec-CH-UA-Bitness' Header Field)

7.4. 'Sec-CH-UA-Full-Version' Header Field

Header field name: Sec-CH-UA-Full-Version

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.4 The 'Sec-CH-UA-Full-Version' Header Field)

7.5. 'Sec-CH-UA-Mobile' Header Field

Header field name: Sec-CH-UA-Mobile

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.5 The 'Sec-CH-UA-Mobile' Header Field)

7.6. 'Sec-CH-UA-Model' Header Field

Header field name: Sec-CH-UA-Model

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.6 The 'Sec-CH-UA-Model' Header Field)

7.7. 'Sec-CH-UA-Platform' Header Field

Header field name: Sec-CH-UA-Platform

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.7 The 'Sec-CH-UA-Platform' Header Field)

7.8. 'Sec-CH-UA-Platform-Version' Header Field

Header field name: Sec-CH-UA-Platform-Version

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document: this specification (§ 3.8 The 'Sec-CH-UA-Platform-Version' Header Field)

7.9. 'User-Agent' Header Field

Header field name: User-Agent

Applicable protocol: http

Status: deprecated

Author/Change controller: IETF

Specification document: this specification (§ 6.1 The 'User-Agent' Header), and Section 5.5.3 of [RFC7231]

8. Acknowledgments

Thanks to Aaron Tagliaboschi, ArkUmbra, Erik Anderson, jasonwee, Luke Williams, Mike West, and Toru Kobayashi for valuable feedback and contributions to this specification.

References

Normative References

[CLIENT-HINTS-INFRASTRUCTURE] Client Hints Infrastructure. cg-draft. URL: https://wicg.github.io/client-hints-infrastructure/

[HTML] Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/

[INFRA] Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/

[PERMISSIONS] Mounir Lamouri; Marcos Caceres; Jeffrey Yasskin. Permissions. URL: https://w3c.github.io/permissions/

[RFC8941] M. Nottingham; P-H. Kamp. Structured Field Values for HTTP. February 2021. Proposed Standard. URL: https://datatracker.ietf.org/doc/html/rfc8941

[RFC8942] I. Grigorik; Y. Weiss. HTTP Client Hints. February 2021. Experimental. URL: https://datatracker.ietf.org/doc/html/rfc8942

[WebIDL] Boris Zbarsky. Web IDL. URL: https://heycam.github.io/webidl/

Informative References

[FacebookYearClass] Chris Marra; Daniel Weaver. Year class: A classification system for Android. URL: https://engineering.fb.com/android/year-class-a-classification-system-for-android/

[FINGERPRINTING-GUIDANCE] Nick Doty. Mitigating Browser Fingerprinting in Web Specifications. 28 March 2019. NOTE. URL: https://www.w3.org/TR/fingerprinting-guidance/

[I-D.ietf-tls-grease] David Benjamin. Applying GREASE to TLS Extensibility. ID. URL: https://tools.ietf.org/html/draft-ietf-tls-grease

[Janc2014] Artur Janc; Michal Zalewski. Technical analysis of client identification mechanisms. URL: https://dev.chromium.org/Home/chromium-security/client-identification-mechanisms#TOC-Browser-level-fingerprints

[RFC3864] G. Klyne; M. Nottingham; J. Mogul. Registration Procedures for Message Header Fields. September 2004. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc3864

[RFC7231] R. Fielding, Ed.; J. Reschke, Ed. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. June 2014. Proposed Standard. URL: https://httpwg.org/specs/rfc7231.html

[Rossi2015] The Microsoft Edge Rendering Engine that makes the Web just work. URL: https://channel9.msdn.com/Events/WebPlatformSummit/2015/The-Microsoft-Edge-Rendering-Engine-that-makes-the-Web-just-work#time=9m45s

IDL Index

`dictionary NavigatorUABrandVersion { DOMString brand; DOMString version; };`

`dictionary UADataValues { sequence<NavigatorUABrandVersion> brands; boolean mobile; DOMString platform; DOMString architecture; DOMString bitness; DOMString model; DOMString platformVersion; DOMString uaFullVersion; };`

`dictionary UALowEntropyJSON { sequence<NavigatorUABrandVersion> brands; boolean mobile; DOMString platform; };`

`[Exposed=(Window,Worker)] interface NavigatorUAData { readonly attribute FrozenArray<NavigatorUABrandVersion> brands; readonly attribute boolean mobile; readonly attribute DOMString platform; Promise<UADataValues> getHighEntropyValues(sequence<DOMString> hints); UALowEntropyJSON toJSON(); };`

`interface mixin NavigatorUA { [SecureContext] readonly attribute NavigatorUAData userAgentData; };`

`Navigator includes NavigatorUA; WorkerNavigator includes NavigatorUA;`

Issues Index

miketaylr commented 3 years ago

Thanks for the spec backup.