Sec-Ch-UA sequence should vary frequently

Background:

Recently one of Vivaldi's volunteer testers reported an issue with https://www.job-room.ch/ returning "Bad Request" HTTP errors in our Chromium 94-based builds (Vivaldi 4.3). It does not occur in our newer Chromium 96-based builds, but it will occur again for embedders in Chromium 100 (unless the site is fixed, or the Chromium side trigger is changed).

The root cause of this problem is likely a header parser error at the website.

The reason the error triggers is that in our 94-based builds the "sec-ch-ua" header value is

";Not A Brand";v="99", "Chromium";v="94"

In our 96-based builds the string is

" Not A;Brand";v="99", "Chromium";v="96"

The site is reacting to the leading ";" character in ";Not A Brand", and does so for any string with a leading semicolon that is placed first in the header, most likely because it does not process quoted strings correctly.
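To illustrate the suspected failure mode, here is a minimal Python sketch. The site's actual parser is unknown, so `naive_parse` is a purely hypothetical parser that splits on ";" without honoring quotes, and `quote_aware_parse` is a simplified quote-aware alternative (not a complete Structured Fields parser). All function names are made up for this example.

```python
def split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators inside double-quoted strings."""
    parts, current = [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def naive_parse(header):
    """Hypothetical buggy parser: cuts each item at the first ';', quoted or not."""
    brands = []
    for item in header.split(","):
        name, _, _params = item.strip().partition(";")
        brands.append(name.strip().strip('"'))
    return brands


def quote_aware_parse(header):
    """Simplified parser that only treats ',' and ';' outside quotes as separators."""
    brands = []
    for item in split_outside_quotes(header, ","):
        name = split_outside_quotes(item.strip(), ";")[0].strip()
        brands.append(name[1:-1])  # drop the surrounding double quotes
    return brands
```

For the Chromium 94-based header above, the naive version yields an empty first brand name, while the quote-aware version recovers `;Not A Brand`. Whether an empty name is what actually triggers the "Bad Request" on the site is a guess.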

We are currently going to switch the sequence of the values to work around the issue.

Further, for a browser that specifies a brand, Chromium's current (Chromium 96) implementation would never send a Sec-Ch-UA header with ";Not A Brand" first in the header; it would always be last. For that variation, the brand (e.g. "Google Chrome" or "Edge") or "Chromium" would be the first entry.

Additionally, the variation does not change inside a given Chromium version, and the non-branded sequence will always be the one shown above for Chromium's Extended Stable versions.

Further Chromium specific discussion can be found in https://crbug.com/1266618

Discussion:

What happened here is that a web site incorrectly processed the header somehow, and did not respond with a valid response because it treated the read value as an error.

This error was deployed on the site at least two weeks ago, when the tester reported it.

A fundamental reason why this problem on the site has lasted as long as it has is that 1) (Chromium-based) browsers are varying the header too infrequently (every 4 or 8 weeks), and 2) the major branded Chromium-based browsers would never trigger the issue.

In order to bring such issues to light quickly, my suggestion is that clients must change the sequence of values non-deterministically (that is, don't use coarse data like the date or version as a seed) at least every time they start, maybe as frequently as every request.

Additionally, the client must be able to produce absolutely ALL combinations of their brand values and sequences. The current problem in Chromium is partly that it uses the same order sequence both to select the character variation in "Not A Brand" and to order the branded values, resulting in 6 different headers, not the 36 that might be possible with another way of randomization. For non-branded, Extended Stable Chromium, there are only 3 different headers, and the value sequence is static.
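The difference between coupled and independent selection can be sketched in Python as follows. This is an illustration under stated assumptions, not Chromium's actual code: the three placeholder entries, the count of 6 character variations (taken from the 36 = 6 x 6 figure above), and all function names are hypothetical.

```python
import itertools
import secrets

# Placeholder entries: the GREASE brand, "Chromium", and a vendor brand.
BRANDS = ["GREASE", "Chromium", "ExampleBrand"]
ORDERS = list(itertools.permutations(range(len(BRANDS))))  # 3! = 6 orderings
GREASE_VARIANTS = 6  # assumed number of character variations


def coupled_choices():
    """One shared index selects both the ordering and the GREASE variant."""
    return {(ORDERS[i], i % GREASE_VARIANTS) for i in range(len(ORDERS))}


def independent_choices():
    """The ordering and the GREASE variant are selected independently."""
    return set(itertools.product(ORDERS, range(GREASE_VARIANTS)))


def pick_random_choice():
    """Pick one (ordering, variant) pair using OS-provided randomness,
    so the result is not derived from the date or the version number."""
    return secrets.choice(ORDERS), secrets.randbelow(GREASE_VARIANTS)


print(len(coupled_choices()))      # 6
print(len(independent_choices()))  # 36
```

Calling `pick_random_choice()` once per start, or once per request, would let every one of the 36 combinations appear in the wild.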

I would also suggest considering, as a possible extra requirement, that NOT sending the client's brand be one of the variations the randomization can produce, mostly as an attempt to prevent sites from discriminating against clients that don't send a brand. (Although I suspect some persistent developers will store that info in a cookie session somewhere and still discriminate.)

Another argument favoring randomization at every restart, or more frequently, is that once you start adding more combinations of characters or values to the entries or the header, picking one variation per major release (which is what Chromium currently does) will result in any given variant header showing up only once every few years, or even every few dozen years.
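As a rough back-of-the-envelope check of that claim, the following Python sketch assumes a client that steps through every variant exactly once, one per release, before repeating, and a 52-week year. The function name and the figure of 720 variants are made up for illustration.

```python
def years_to_cycle(variants, weeks_per_release):
    """Years needed to send every variant once, one variant per release."""
    return variants * weeks_per_release / 52


# 36 variants, stepping once per release:
print(round(years_to_cycle(36, 4), 1))   # 4-week release cycle
print(round(years_to_cycle(36, 8), 1))   # 8-week Extended Stable cycle
# A hypothetical larger variant space of 720 combinations:
print(round(years_to_cycle(720, 4), 1))
```

With 36 variants this comes to almost 3 years on a 4-week cycle and over 5 years on an 8-week cycle; with 720 variants it exceeds 55 years.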

If there is an error on the site that triggers on certain values, certain characters in values, or a certain ordering of values, then such variation among many clients should ensure that a sufficient number of users encounter the problem that it will (hopefully) be fixed quickly, quite possibly before the bug is deployed to production (especially if every request has a different combination of values).

Regarding adding a brand for Vivaldi: It is unlikely that Vivaldi will add its brand name to this header. We have had so many issues with major sites, e.g. Google, Microsoft, Facebook, Netflix, using the UA string to send us bad responses, that we no longer send Vivaldi in the UA string normally, and only send it to trusted sites. I think it is very likely that sites are going to abuse the Brands values in this header for the same purpose as they abused the UA string, and that it once again will mostly affect the smaller browsers. In fact, given the more predictable format, I suspect it will be even more abused than the UA string has been.

Special handling of Known Problematic Sites (as was suggested by https://github.com/WICG/ua-client-hints/issues/52#issuecomment-599117400 ) will require massive effort to gather and analyze information, most likely by the smaller browser vendors with limited resources (NOT the major browsers; history shows that websites will go to great lengths to support them), as those who, like myself, worked on the Opera Presto engine know only too well.

In Opera Presto we had a massive list of exceptions and special auto-updated JS and CSS code to handle broken sites, and before we changed the UA string policy in Vivaldi, we were starting to work up a major list of such special cases, too, and spent extensive (and expensive) time analyzing broken sites. See https://vivaldi.com/blog/user-agent-changes/ for more background.

And Known Sites is just one aspect of the potential problems. Broken server software and widely used (and seldom updated) frameworks, which are used by thousands, if not millions, of websites, also cause significant problems, and their usage is frequently difficult to determine reliably.

Frequent randomization (since it seems to be required) of this header should at least limit the number of websites that break on specific syntaxes.

Thanks for the issue @YngveNPettersen.

Currently the spec gives SHOULD level guidance in https://wicg.github.io/ua-client-hints/#create-ua-list-section

> Randomize the order of the items in list.

It doesn't require any frequency of randomization, but the rationale for per-version variance relates to caching. If a site is going to Vary on Sec-CH-UA, and it changes per request, that seems like twice as many cache misses in Vivaldi (or more, if a browser sends its marketing brand in the brand list) - but the spec still allows for that.

Given that a UA that randomizes more or less frequently is still conforming, I'm not sure what changes to the spec might be desired here.

As mentioned, the main reason I bring up the point is that, based on our recent experience, the current shuffling implemented in Chromium both happens too infrequently (4 or 8 weeks apart), and the shuffling is not good enough (only 16% of possible permutations in branded clients, and never uses at least one sequence similar to what unbranded clients had to use), to be useful in uncovering breakage like the one we encountered.

As for caching, my thought would be that the caching should be keyed based on the actual value(s) used in the header, rather than the whole header. That should result in the same response being returned even if the sequence is different.
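A minimal Python sketch of such an order-independent key might look like this. It is an application-level illustration of the idea only, not how any particular cache behaves; the function name `brand_cache_key` is hypothetical, and the regular expression handles only the simple `"brand";v="version"` shape seen in the headers above.

```python
import re

# Matches "brand";v="version" pairs; quoted strings may contain escaped characters.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"')


def brand_cache_key(header):
    """Return a key that depends on the brand/version pairs but not their order."""
    return tuple(sorted(_PAIR.findall(header)))


a = brand_cache_key('";Not A Brand";v="99", "Chromium";v="94"')
b = brand_cache_key('"Chromium";v="94", ";Not A Brand";v="99"')
print(a == b)  # True
```

Two requests whose headers differ only in the ordering of the entries would then map to the same cached response.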

Of course, that would not work in the case of forced removal of the brand from the string, but that is, admittedly, a somewhat extreme suggestion. My point about that, though, is to force websites not to discriminate against unbranded clients, which is something I think will happen (it is just a question of time) unless they can't be sure whether they would break the experience of users of "approved" clients.

> As mentioned, the main reason I bring up the point is that, based on our recent experience, the current shuffling implemented in Chromium both happens too infrequently (4 or 8 weeks apart), and the shuffling is not good enough (only 16% of possible permutations in branded clients, and never uses at least one sequence similar to what unbranded clients had to use), to be useful in uncovering breakage like the one we encountered.

Not that I'm disagreeing, but are you suggesting that a more specific shuffling algorithm be specified? Anything to do with Chromium's implementation is outside the scope of this spec and this bug tracker.

> As for caching, my thought would be that the caching should be keyed based on the actual value(s) used in the header, rather than the whole header. That should result in the same response being returned even if the sequence is different.

This goes against both what HTTP and JS caching specs call for, which is the full and exact value of headers. Also, how would this work with the "fake" browser brands, if these were to constantly change as well? (Probably ignoring/dropping unknown brands, which is somewhat the point.)

I am not sure a specific algorithm needs to be specified, but I think that maybe, if an implementation does perform shuffling, the spec should specify the minimum frequency of the shuffling (which I think should be at least each restart, and possibly more frequent), as well as that the shuffling must choose between all possible permutations of the variations (Chromium, as an example of IMO badly implemented shuffling, does not do that for branded builds, only for unbranded non-extended support builds).

As for caching, maybe shuffling (at least of the more aggressive kind) and caching are mutually exclusive for this header?

The main issue, as I said above, is that monthly (or less frequent) reshuffling of the header means that website incompatibilities can take a very long time to detect (if they are detected at all, as for branded clients in this case). The result is that the smaller browsers are at a significant disadvantage, since the web site operator is likely to say "Use another browser. Yours is not supported", unless they see that the issue also breaks every browser, including the "supported" ones, and that it is causing users to leave their site.

> The main issue, as I said above, is that monthly (or less frequent) reshuffling of the header means that website incompatibilities can take a very long time to detect (if they are detected at all, as for branded clients in this case). The result is that the smaller browsers are at a significant disadvantage, since the web site operator is likely to say "Use another browser. Yours is not supported", unless they see that the issue also breaks every browser, including the "supported" ones, and that it is causing users to leave their site.

Thanks for the feedback, this is a good point.

WICG / ua-client-hints

Sec-Ch-UA sequence should vary frequently #274