WICG / client-hints-infrastructure

Specification for the Client Hints infrastructure - privacy preserving proactive content negotiation
https://wicg.github.io/client-hints-infrastructure

Primary brand name for Sec-CH-UA and Sec-CH-UA-Full-Version-List #158

Open nicjansma opened 1 year ago

nicjansma commented 1 year ago

Hi everyone,

At Akamai, we utilize the User-Agent header for many reasons, ranging from making content delivery decisions to parsing it for reporting and analytics purposes.

As we are transitioning our code and infrastructure to utilize Client Hints instead, one challenge we're having is with interpreting the Sec-CH-UA and Sec-CH-UA-Full-Version-List headers. They contain a GREASEd list of many potential "brand names", without any indication of which one would be the "primary brand" (if having to choose just one).

For example, here are those two headers from my current Chrome:

Sec-CH-UA: "Google Chrome";v="113", "Chromium";v="113", "Not-A.Brand";v="24"
Sec-CH-UA-Full-Version-List: "Google Chrome";v="113.0.5672.63", "Chromium";v="113.0.5672.63", "Not-A.Brand";v="24.0.0.0"

These headers are of course GREASEd so there are multiple brands (and not-a-brands) listed:

Our challenge for reporting and analytics is to understand which of these is the "primary" brand. In other words, if the browser itself wanted to be identified as one particular thing (in reports, charts, for marketing purposes, etc.), what would it be? In the above case, for Chrome, I would assume that would be Google Chrome.

There have been a few prior discussions about this topic:

And the guidance here:

https://wicg.github.io/ua-client-hints/#marketshare-analytics-use-case

Which mentions:

This will necessitate regular updates to the list of known lists of brands when new browser versions are released or new browsers become popular, or else everything will get bucketed as an unknown browser… Such a list of known lists of brands could be maintained centrally…

Our primary use-cases for picking "one primary brand" are:

Given the headers' format today, we will likely need to maintain a list of "priority" (or "ignored") brands, and evaluate the headers against our list to determine which brand we want to report as the "most important" one. This list would need to be maintained and updated over time, likely with manual human intervention to review possible other "real" browsers that are new and should be reported on. We'd prefer to not have to maintain such a list :) And I assume there are many other companies and products that will be facing this challenge as well.
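To make the "priority list" approach concrete, here is a minimal sketch of what such an evaluation might look like. The names parseBrands, pickPrimaryBrand and BRAND_PRIORITY are hypothetical, the list contents are only an illustration, and the regex is a simplified stand-in for a real Structured Fields parser.

```javascript
// Hypothetical helper: a deliberately simple parser for Sec-CH-UA-style values.
// It is not a full Structured Fields parser and does not unescape characters.
function parseBrands(headerValue) {
  const brands = [];
  const entry = /"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"/g;
  for (const match of headerValue.matchAll(entry)) {
    brands.push({ brand: match[1], version: match[2] });
  }
  return brands;
}

// Assumed, hand-maintained list: earlier entries win over later ones.
const BRAND_PRIORITY = ["Microsoft Edge", "Google Chrome", "Chromium"];

// Returns the highest-priority known brand, or null if none is recognized.
function pickPrimaryBrand(brands, priority = BRAND_PRIORITY) {
  for (const name of priority) {
    const hit = brands.find((b) => b.brand === name);
    if (hit) return hit;
  }
  return null; // nothing recognized: would be bucketed as "unknown"
}
```

Anything not on the list falls through to null, which is exactly the maintenance burden described above: a new browser stays "unknown" until someone adds it.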

Would there be a way, in the header, to indicate the "primary" brand for that UA? It would still send the others for site-compat (e.g. claiming to be "Google Chrome"), engine-compat ("Chromium") and GREASEing purposes ("Not;ABrand"), but couldn't one be indicated as "primary"?

An example for how the Chrome header would change to indicate ;primary:

Sec-CH-UA: "Google Chrome";v="113";primary, "Chromium";v="113", "Not-A.Brand";v="24"

And for other browsers, such as Edge:

// Edge today
Sec-CH-UA: "Not/A)Brand";v="99", "Microsoft Edge";v="115", "Chromium";v="115"

// Edge after
Sec-CH-UA: "Not/A)Brand";v="99", "Microsoft Edge";v="115";primary, "Chromium";v="115"

Vivaldi's current header is a result of https://github.com/WICG/ua-client-hints/issues/293, and doesn't even report Vivaldi today. I think with a ;primary attribute Vivaldi might consider reporting something like this instead:

// Vivaldi today
Sec-CH-UA: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"

// Vivaldi after
Sec-CH-UA: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114", "Vivaldi";v="114";primary

https://github.com/WICG/ua-client-hints/issues/115's concern about how new browsers identify themselves would also be addressed, because new browsers could indicate compatibility with Google Chrome and Chromium as well as marking their New Brand as ;primary for analytics purposes.
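For illustration, this is roughly how a consumer might read the proposed marker. It is a sketch only: ;primary is just the parameter proposed in this issue and no browser sends it, findPrimaryBrand is a hypothetical name, and the regex is a simplification rather than a proper Structured Fields parser.

```javascript
// Sketch only: ";primary" is the parameter proposed in this issue,
// not part of any shipped Sec-CH-UA header.
function findPrimaryBrand(headerValue) {
  // Brand string, then v="...", then any further bare parameters (e.g. ;primary).
  const entry =
    /"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"((?:\s*;\s*[a-z*][a-z0-9_.*-]*)*)/g;
  for (const match of headerValue.matchAll(entry)) {
    const params = match[3]
      .split(";")
      .map((p) => p.trim())
      .filter(Boolean);
    if (params.includes("primary")) {
      return { brand: match[1], version: match[2] };
    }
  }
  return null; // no entry is marked as primary (true for all headers today)
}
```

With the "Edge after" header above this would return the Microsoft Edge entry, and with today's headers it returns null, so a consumer would still need some fallback.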

We've discussed this a bit with the Chrome team but would like to see if there are other companies that have a similar need.

Sora2455 commented 1 year ago

The reason Vivaldi is removing their brand name from Sec-CH-UA is because some site owners are writing code like:

if (brand I don't recognise is in brand list)
    block user agent;

Even though Vivaldi is in fact fully capable of using the site. In order to use the site, Vivaldi is forced to lie and pretend to be Google Chrome.

Your proposed change would make this situation worse, as now the bad sites just have to check if the primary brand is one they allow, rather than every one in the list. Every minor browser would be forced to lie about which brand was primary in order to access these sites - and no, a boycott of those sites is not viable. Most people will switch browsers rather than switch sites.

YngveNPettersen commented 1 year ago

@nicjansma I think you are misreading the situation, especially as regards Vivaldi.

Essentially, Vivaldi will NEVER display our brand to a site we don't control or have a contractual agreement with that involves a requirement to not discriminate (as a matter of fact, it is just a few months since we removed a partner from that list because they were discriminating against us).

As detailed in https://github.com/WICG/ua-client-hints/issues/293 we have seen multiple cases of smaller and larger sites, including banking sites, discriminating against us based on not sending a Google Chrome or Microsoft Edge brand in the header. And we don't trust such sites to NOT discriminate if they locate an unknown brand in the header, which is why our default now is to send the Google Chrome brand, with only a couple of exceptions.

Regarding "primary", that would only be (marginally) useful if many clients start sending multiple well-known brands in the header, as happened with the original UA header. At present I don't see that happening, and I would not be surprised if the bad sites started blocking clients that send multiple well-known brands in the header.

In case multiple brands are used by some clients, my guess is that your stats engine could immediately decide that they are not Chrome or Edge (since I doubt they will start sending multiple brands), but may possibly be one (or none) of the other brands in the header. And the primary field might still be lying.

You can read more about our position in my article "Client Hints, or Client Lies".

Bottom line: It is the "bad" sites that restrict what we can do (and what Client Hints can accomplish), not the "good" ones. And the "bad" ones are usually found on the list of sites that "MUST WORK!!"(TM), and there are at most four browsers that can make any of those behave.

nicjansma commented 1 year ago

@Sora2455 writes:

The reason Vivaldi is removing their brand name from Sec-CH-UA is because some site owners are writing code like:

if (brand I don't recognise is in brand list)
  block user agent;

Not exactly -- if they were doing that, they would fail on the new GREASEd brands every version (e.g. Not;A Brand).

With the example shared from @YngveNPettersen in https://github.com/WICG/ua-client-hints/issues/293 the logic is slightly more complex, as it's looking for a pre-approved list of brands:

function getBrowser(brands){
    const browser = brands.reduce((a, c) => c.brand.match(/Microsoft Edge|Google Chrome/) ? c : a);
    return browser;
}

i.e. looking for Microsoft Edge and Google Chrome specifically.
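To see what that function actually returns, here is a small check. The function body is copied from the snippet above; the two brand lists are made-up inputs in the { brand, version } shape of navigator.userAgentData.brands, and "Example Browser" is a hypothetical brand.

```javascript
// Copied from the snippet quoted above.
function getBrowser(brands) {
  const browser = brands.reduce((a, c) => c.brand.match(/Microsoft Edge|Google Chrome/) ? c : a);
  return browser;
}

// Made-up inputs in the { brand, version } shape of navigator.userAgentData.brands.
const chromeLike = [
  { brand: "Not.A/Brand", version: "8" },
  { brand: "Chromium", version: "114" },
  { brand: "Google Chrome", version: "114" },
];
const hypotheticalOther = [
  { brand: "Not.A/Brand", version: "8" },
  { brand: "Chromium", version: "114" },
  { brand: "Example Browser", version: "1" },
];

console.log(getBrowser(chromeLike).brand);        // "Google Chrome"
console.log(getBrowser(hypotheticalOther).brand); // "Not.A/Brand"
```

Because reduce is called without an initial value, a list with no approved brand falls back to its first entry, here the GREASE one. Whatever the site does with that result, the browser's own brand never gets considered.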

With the proposal of defining a ;primary brand, Vivaldi could choose to represent themselves as:

Sec-CH-UA: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114", "Vivaldi";v="114";primary

i.e.:

Your proposed change would make this situation worse, as now the bad sites just have to check if the primary brand is one they allow, rather than every one in the list

Yes, sites could use the primary denotation for evil. I would hope, given that they could inspect the brands list for engine and browser compatibility (e.g. searching for Google Chrome or even better Chromium), they would choose that rather than expecting a specific flavor of Chromium. At least this would give them the choice, and possibly stop Vivaldi from having to hide their branding.

nicjansma commented 1 year ago

@YngveNPettersen writes:

Essentially, Vivaldi will NEVER display our brand to a site we don't control or have a contractual agreement with that involves a requirement to not discriminate (as a matter of fact, it is just a few months since we removed a partner from that list because they were discriminating against us).

Is Vivaldi not concerned about being properly represented in marketshare usage reports? I was hoping this proposal could allow Vivaldi to be better represented for marketshare, while still advertising engine/browser/brand compatibility with e.g. Chrome.

Regardless, it will certainly help for other analytics cases.

Sora2455 commented 1 year ago

Yes, sites could use the primary denotation for evil. I would hope, given that they could inspect the brands list for engine and browser compatibility (e.g. searching for Google Chrome or even better Chromium), they would choose that rather than expecting a specific flavor of Chromium. At least this would give them the choice, and possibly stop Vivaldi from having to hide their branding.

But that doesn't require this ;primary feature? All your suggested fix requires is for Vivaldi to claim to be Google Chrome and Vivaldi at the same time. Adding ;primary might fool the bad sites for a while... until they change their code to always fetch the ;primary brand and then we're back to square one. Much like how the Sec-CH-UA header did in the first place - it helped until bad sites 'updated' their code to bring their bad behaviours in line with the new standard.

@YngveNPettersen can speak with more authority on this, but I assume things aren't as simple as "just claim to be Chrome and yourself".

YngveNPettersen commented 1 year ago

Yes, it would be nice to be accurately represented in stats.

However, in the present situation making sure our users are not discriminated against is far more important to us. That was why we removed "Vivaldi" from the UA string, and why we are not using it in the Client Hints, except for pre-approved partner domains. Theoretically, independent stats collectors could be pre-approved, but they would have to do their collection at a specific host that we can change the header for. At least one major stats collector does its analysis in JS on the visited site, not on a separate server controlled by them, which means they cannot be pre-approved.

As mentioned in my comment https://github.com/WICG/ua-client-hints/issues/293#issuecomment-1458276148 , at the time when we changed to sending the Chrome Brand, we had confirmed discrimination by companies such as PNC Bank and Nextdoor, and we suspected it was also happening (but were never able to confirm) on several other sites, including Wells Fargo.

IMNSHO only major worldwide legislation with extreme fines (and enforcement) can prevent discrimination of alternative clients.

BTW, the "Not a Brand" is very easy to recognize with a simple regexp, as is the "Chromium" brand sent by Chromium-based browsers. Then it is just a question of analyzing the other brands listed, whether they are on the "approved" list, and whether there are "non-approved" brands listed. Having multiple brands might easily be considered "non-approved", unless Google Chrome and Edge do it in a significant portion of their connections to a site (and do not vary on a given site).
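A rough sketch of that kind of classification, to show how little code it takes. The regexp is an assumption based only on the GREASE values seen in this thread (e.g. "Not-A.Brand", "Not/A)Brand"), and GREASE_PATTERN, APPROVED and classifyBrands are hypothetical names, not anything a particular site is known to use.

```javascript
// Assumption: GREASE entries are "Not", "A", "Brand" joined by punctuation
// or spaces, as in the header examples in this thread.
const GREASE_PATTERN = /^\s*Not[^A-Za-z0-9]*A[^A-Za-z0-9]*Brand\s*$/;

// Hypothetical allow-list, as a discriminating site might keep it.
const APPROVED = new Set(["Google Chrome", "Microsoft Edge"]);

// Sorts each brand into one of four buckets.
function classifyBrands(brands) {
  const result = { grease: [], engine: [], approved: [], other: [] };
  for (const { brand } of brands) {
    if (GREASE_PATTERN.test(brand)) result.grease.push(brand);
    else if (brand === "Chromium") result.engine.push(brand);
    else if (APPROVED.has(brand)) result.approved.push(brand);
    else result.other.push(brand);
  }
  return result;
}
```

A site could then reject any client with a non-empty other bucket, or with more than one entry in approved, which is the scenario described above.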