WICG / ua-client-hints

Wouldn't it be nice if `User-Agent` was a (set of) client hints?
https://wicg.github.io/ua-client-hints/

Wording clarification needed for low-entropy `architecture` values #257

Closed natevw closed 3 years ago

natevw commented 3 years ago

The ua-client-hints draft currently published from this repo says in the https://wicg.github.io/ua-client-hints/#http-ua-hints section:

User agents MUST map higher-entropy platform architecture values to the following buckets:

x86 CPU architectures => "x86"

ARM CPU architectures => "arm"

Other CPU architectures could be mapped into one of these values in case that makes sense, or be mapped to the empty string.
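
As a rough illustration of the bucketing the quoted text describes, a user agent's mapping might look like the sketch below. The function name and the detailed architecture names it matches are assumptions made for this example; the draft does not define them.

```swift
/// Hypothetical helper: buckets a detailed CPU architecture name into the
/// low-entropy values quoted above. The input names are illustrative only.
func architectureBucket(for detailed: String) -> String {
    let name = detailed.lowercased()
    if name.hasPrefix("x86") || name == "i386" || name == "i686" || name == "amd64" {
        return "x86"
    }
    if name.hasPrefix("arm") || name.hasPrefix("aarch64") {
        return "arm"
    }
    // Anything else falls back to the empty string, as the quoted text allows.
    return ""
}
```

Under this sketch every input lands in one of exactly three outputs, which is the closed set the rest of this issue is about.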

This MUST was surprising to me in a standards document.

I get the intent for browser implementers: to prevent too fine-grained an identification of a user.

But as a web developer, to me this language also implies that I can and should write code that handles these three values ("x86", "arm", and "") and those three values only.

That seems rather short-sighted. What happens if ten years from now some new architecture, call it CISC-C for example, has captured 75% of the market? Are implementers really going to still be mapping CISC-C to one of the values which they MUST map it to?

In fact, in prior discussion of this a collaborator on the repo makes the claim:

Nothing prevents this list from being expanded in the future, […]

If that is truly the intent, this should be anticipated in the language of the standard itself.

As it sits, something does prevent the list from being expanded: the specification told me that browsers MUST return one of exactly three strings here, full stop. If you want to leave the door open to a new string like our hypothetical market-shifting "ciscC" processors then please incorporate that possibility into the specification.

miketaylr commented 3 years ago

But as a web developer, to me this language also implies that I can and should write code that handles these three values ("x86", "arm", and "") and those three values only.

But in the present, there aren't any other useful values for you to write code for, right?

As it sits, something does prevent the list from being expanded: the specification told me that browsers MUST return one of exactly three strings here, full stop.

Not quite - we can just update the spec and file bugs on browser implementations to follow the spec. This isn't uncommon - new formats emerge, and specs can be updated to handle them. (e.g., https://drafts.csswg.org/css-fonts/#color-font-technology-values)

natevw commented 3 years ago

we can just update the spec […]

Really?

You tell me now as a server/JS developer that this field, if present, will only ever have one of three particular values (anything else is a non-conformant browser — so I'd be wise to handle buggy/malicious values somehow but I'd be well within my rights to simply send a 400 Bad Request status back).

Then, surprise, there's a new spec. Same as the old spec except where the old spec swore up and down that anything not "x86" or "arm" or "" must be mapped to one of those anyway, now there's a different set of values because progress. Anyone digging around in the bug tracker knew that was actually the plan all along, but that's not what the spec said, and now your new spec contradicts the old spec.

[…] and file bugs on browser implementations to follow the spec.

That's not the problem.

Your plan isn't so bad for browser implementers, but it's a broken promise to web developers (e.g. HTTP servers or browser-oriented apps) that build on the current spec!

Consider that some languages enforce "complete coverage" for switch cases. Let's say I write a webserver in Swift. Based on the wording of the current specification I'd be within my rights to hardcode this as a closed enumeration:

enum BroadArchitecture: String {
  case likeIntel = "x86"
  case likeARM = "arm"
  case other = ""
}

// later in some request handler
// (`headerVal` is the received header value and `Http400` an Error type, both defined elsewhere)
guard let arch = BroadArchitecture(rawValue: headerVal) else {
  throw Http400("Bad browser, didn't you read the spec??")
}
// … otherwise, use `arch` …

The current spec says a user agent MUST send one of those three values [if any] and so there's no reason for me to handle anything else. A future spec would need to account for that promise.

Or, the current spec should see it coming and account for it! I don't think this is a huge change to the wording:

  1. Soften MUST map higher-entropy platform architecture values to SHOULD map …. Then in nine years when x86 is an obscure platform and ciscC is how everyone is browsing the web from their ocular implants, browser vendors can make a judgment call and change what they consider "low-entropy" without violating spec.
  2. Add some hedging that "the following list" is not meant for always and eternity but rather based on current status quo and likely to change in future specifications. Then in ten years when a new spec catches up with the implementers, it won't conflict with the old spec which web server/app implementations had been relying on in the preceding decade.
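
If the wording were hedged along those lines, server code could treat the list as open-ended rather than closed. The following is a minimal sketch of that idea, not something the spec prescribes: the `OpenArchitecture` type, its `unrecognized` case, and the `parse` and `describe` helpers are hypothetical names, and the header value is assumed to have already been extracted and unquoted.

```swift
/// Sketch of an "open" variant of the closed enum above.
/// All names here are hypothetical and not part of any specification.
enum OpenArchitecture: Equatable {
    case likeIntel
    case likeARM
    case other
    case unrecognized(String)  // keeps the raw value for logging or later use

    static func parse(_ headerValue: String) -> OpenArchitecture {
        switch headerValue {
        case "x86": return .likeIntel
        case "arm": return .likeARM
        case "": return .other
        default: return .unrecognized(headerValue)
        }
    }
}

// In a request handler: degrade gracefully instead of rejecting the request.
func describe(_ arch: OpenArchitecture) -> String {
    switch arch {
    case .likeIntel: return "x86-like"
    case .likeARM: return "ARM-like"
    case .other: return "unspecified"
    case .unrecognized(let raw): return "unrecognized (\(raw))"
    }
}
```

With this shape the compiler still enforces complete coverage of the switch, but a value such as "ciscC" reaches the `unrecognized` branch instead of producing a 400.
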
miketaylr commented 3 years ago

Hi @natevw,

I really don't understand your concern, or how it's unique to UA-CH. I don't plan on trying to account for all possible future values that may or may not exist. If you wrote your server that way, and your users were on a new arch, you would probably get some support tickets and update it. Until then, a 400 sounds reasonable, depending on what you do with arch.

If browsers start to support this theoretical architecture, we'll update the spec. This is pretty much how all web platform specs work.

natevw commented 3 years ago

The concern is simply that you are writing the spec to make it sound like the list will never change, while the behind-the-scenes intent is to change the list whenever necessary.

Seems like you're thinking of how display: block/inline got extended to include flex, or a CSS color can now be rebeccapurple, or now there's a new <canvas /> HTML element, or whatever. But this specification here isn't worded to be open-ended like that. It's written rather the opposite, more like how XMLHttpRequest.readyState gives a table and says "these are the only values this can have". Any change to that table would be done only very cautiously and carefully considering the impact on backwards compatibility, or maybe just not at all.

Here you're already planning to "update" the table whenever and however necessary. So please explain that in the spec and not just buried here in one of the backchannels.

miketaylr commented 3 years ago

None of the specs you invoke are written to be open ended; specs generally aren't. The idea is you define the set of requirements that are relevant for user agents to implement today.

For example, Flexbox: CSS display is defined with all possible valid value types: https://drafts.csswg.org/css-display-3/#the-display-properties. Each of those types, such as <display-inside>, has a grammar with all possible valid values: https://drafts.csswg.org/css-display-3/#typedef-display-inside. flex is one of them today. Before that, it wasn't a valid value because it didn't exist: https://drafts.csswg.org/css2/#display-prop.

Same for rebeccapurple. https://www.w3.org/TR/css-color-4/#named-colors states:

The following table defines all of the opaque named colors, by giving equivalent numeric specifications in the other color syntaxes.

It did not exist in https://www.w3.org/TR/css-color-3/, etc.

Here you're already planning to "update" the table whenever and however necessary.

No, I'm discussing your hypothetical example of a CPU architecture that does not exist and has overtaken x86 and arm. That's not a plan, because no such thing exists today.

I think this thread has reached its conclusion, but thanks for opening the issue and thanks for the feedback.