WICG / ua-client-hints

Wouldn't it be nice if `User-Agent` was a (set of) client hints?
https://wicg.github.io/ua-client-hints/

I suggest not using ClientHint, but simply reducing the user agent #256

Closed sanchezzzhak closed 3 years ago

sanchezzzhak commented 3 years ago

I suggest not using Client Hints, but simply reducing the user agent. For example, the current user agent:

Mozilla/5.0 (Linux; Android 10; Model Name) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.0.0 Mobile Safari/537.36

Remove Mozilla/5.0
Remove Linux;
Remove AppleWebKit/537.36
Remove (KHTML, like Gecko)
Remove Safari/537.36

As a result, we get a new format:

Android WebView: Android 10; Model Name; wv; Chrome/93.0.0.0; Blink; Mobile; (55 chars)
Android mobile: Android 10; Model Name; Chrome/93.0.0.0; Blink; Mobile (53 chars)
Android tablet/TV/PC: Android 10; Model Name; Chrome/93.0.0.0; Blink; (45 chars)
Windows: Windows 10; Chrome/93.0.0.0; Blink 93; (38 chars)
Linux: Linux; Chrome/93.0.0.0; Blink (30 chars)
iOS: IOS 14_5; IPhone; Chrome/93.0.0.0; WebKit/537.36; Mobile (50 chars)
iPadOS: IPadOS 14_5; IPad; Chrome/93.0.0.0; WebKit/537.36
Mac: MacOs14_5; Mac; Chrome/93.0.0.0; Blink

This is much shorter than with Client Hints and a bunch of headers.
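The removal steps can be sketched as a small string transform. This is only an illustration of the proposal for the Android Chrome case: the function name `reduceUserAgent` is hypothetical, and appending a fixed `Blink` token is an assumption taken from the example formats rather than something derived from the input.

```javascript
// Sketch of the proposed reduction for a Chrome-on-Android style UA string.
// reduceUserAgent is a hypothetical helper, not part of any browser or library.
function reduceUserAgent(ua) {
  // Take the first parenthesised section, e.g. "Linux; Android 10; Model Name".
  const platformMatch = /\(([^)]*)\)/.exec(ua);
  const platformParts = platformMatch
    ? platformMatch[1].split(';').map((s) => s.trim()).filter(Boolean)
    : [];

  // Drop the bare "Linux" token only when an Android token is also present.
  const hasAndroid = platformParts.some((p) => p.startsWith('Android'));
  const parts = platformParts.filter((p) => !(hasAndroid && p === 'Linux'));

  // Keep the Chrome product token, drop Mozilla/AppleWebKit/KHTML/Safari.
  const chrome = /Chrome\/[\d.]+/.exec(ua);
  if (chrome) parts.push(chrome[0]);

  // Assumption: the engine label is always "Blink", as in the examples above.
  parts.push('Blink');

  if (/\bMobile\b/.test(ua)) parts.push('Mobile');
  return parts.join('; ');
}

const currentUA =
  'Mozilla/5.0 (Linux; Android 10; Model Name) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/93.0.0.0 Mobile Safari/537.36';
console.log(reduceUserAgent(currentUA));
```

For the Android example this yields the "Android mobile" format listed above. Other platforms would need their own rules, which the sketch does not attempt.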

PS my ClientHint "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92" (64 chars)
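For comparison, here is a rough sketch of reading that `Sec-CH-UA` value on the server. The helper name `parseBrandList` is hypothetical, and the regex is a shortcut: the header is a Structured Field list, so a real parser would follow those rules rather than pattern matching.

```javascript
// Parse a Sec-CH-UA value into an array of { brand, version } objects.
// parseBrandList is a hypothetical helper; the regex only covers the common
// "brand";v="version" shape and is not a full Structured Fields parser.
function parseBrandList(headerValue) {
  const brands = [];
  const itemPattern = /"((?:[^"\\]|\\.)*)"\s*;\s*v="([^"]*)"/g;
  let match;
  while ((match = itemPattern.exec(headerValue)) !== null) {
    brands.push({ brand: match[1], version: match[2] });
  }
  return brands;
}

const header = '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"';
console.log(parseBrandList(header));
```

Note that the brand names keep their exact spelling, including the leading space in `" Not A;Brand"`.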

Originally posted by @sanchezzzhak in https://github.com/WICG/ua-client-hints/issues/200#issuecomment-918628431

Update: yesterday I started trying to use CH for analytics, and this is what I found. When a site uses CH headers, it becomes possible to identify whether this is the user's first visit or a repeat one.

example

const http = require('http');
const port = 3001;
const timeout = 3e5;

const prettyPrintJson = (obj) => JSON.stringify(obj, null, 2);
const parseHeaders = (headers = {}) => {
  const cleanValue = (str) => {
    return str.replace(/^"/, '').replace(/"$/, '');
  }
  const getAttr = (headers, prop, defaultValue = '') => {
    return headers[prop] ? cleanValue(headers[prop]) : defaultValue;
  }
  let isSupport = headers['sec-ch-ua'] !== void 0;
  if (!isSupport) {
    return {};
  }

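  // Read Sec-CH-UA raw: getAttr would strip the outer quotes and corrupt the first and last items.
  let browserHeaders = headers['sec-ch-ua'].split(', ');
  // Take the last brand in the list; guard against an entry that does not match.
  let brandMatch = /"([^"]+)"/.exec(browserHeaders[browserHeaders.length - 1]);
  let browserName = brandMatch ? brandMatch[1] : '';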
  let browserVersion = getAttr(headers, 'sec-ch-ua-full-version');
  let upgradeHeader = browserVersion === '';
  if (upgradeHeader) {
    // No full version yet: this is the user's first visit to the site, track it here.
    return {
      upgradeHeader,
      client: {
        type: 'browser',
        name: browserName,
      }
    }
  }

  let osName = getAttr(headers, 'sec-ch-ua-platform');
  let osVersion = getAttr(headers, 'sec-ch-ua-platform-version');
  let platform = getAttr(headers, 'sec-ch-ua-arch');
  let isMobile = getAttr(headers, 'sec-ch-ua-mobile', '') === '?1';
  let deviceCode = getAttr(headers, 'sec-ch-ua-model', '');

  return {
    upgradeHeader,
    os: {
      name: osName,
      version: osVersion,
      platform
    },
    client: {
      name: browserName,
      version: browserVersion
    },
    device: {
      isMobile,
      code: deviceCode
    }
  }
}

const server = http.createServer(function onRequest(req, res) {
  let headers = {
    'Accept-CH': 'Sec-CH-UA-Full-Version, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version, Sec-CH-UA-Model, Sec-CH-UA-Arch',
  };
  console.log(req.headers, req.rawHeaders)
  res.writeHead(200, headers)
  res.end("success" + prettyPrintJson(parseHeaders(req.headers)));

});
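// listen() has no timeout option and its callback takes no arguments,
// so the socket timeout is set separately.
server.setTimeout(timeout);
server.listen(port, () => {
  console.log('server listen port %s', port);
});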

The example above is a typical scenario for an ad or tracker. I suggest reducing the UA instead, since it is faster to implement: it means changing one line in the code, well, maybe two.

Sora2455 commented 3 years ago

When a site uses CH headers, it becomes possible to identify whether this is the user's first visit or a repeat one.

Just noting that that is a CH concern, not just a UA-CH one. It's also possible to identify new visitors much more simply via cookies.
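A minimal sketch of the cookie approach, for comparison with the CH-based example. The function name `checkVisit` and the cookie name `seen` are hypothetical choices for illustration; the input is assumed to be a Node-style lowercased headers object.

```javascript
// Decide whether a request is a first visit by looking for a marker cookie.
// checkVisit and the cookie name "seen" are hypothetical.
function checkVisit(requestHeaders) {
  const cookieHeader = requestHeaders.cookie || '';
  const seen = cookieHeader
    .split(';')
    .map((pair) => pair.trim())
    .some((pair) => pair === 'seen=1');

  return {
    firstVisit: !seen,
    // Only send Set-Cookie when the marker is missing.
    responseHeaders: seen
      ? {}
      : { 'Set-Cookie': 'seen=1; Path=/; Max-Age=31536000; HttpOnly' },
  };
}

console.log(checkVisit({}));                       // no cookie yet
console.log(checkVisit({ cookie: 'a=b; seen=1' })); // marker present
```

The returned `responseHeaders` could be passed to `res.writeHead` in a server like the one above.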

miketaylr commented 3 years ago

@sanchezzzhak my experience is that UA string changes break a lot of assumptions for developers, and content for users. So I don't think this proposal is compatible with the web at large. All of the bits you recommend removing exist for compat at some point in the history of a given browser.

But I agree removing all the legacy stuff would be a win, I just don't think it's possible. If you can demonstrate otherwise, I'd love to change my mind. :)

miketaylr commented 3 years ago

The example above is a typical scenario for an ad or tracker.

If this is a 3rd party ad or tracker, it will not work without explicit delegation from the 1P site to the 3P resource.
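As a rough sketch of what that delegation could look like from the first-party side: the site opts in with `Accept-CH` and names the third-party origin in a `Permissions-Policy` header. The helper `delegationHeaders` and the origin `https://tracker.example` are hypothetical, and the mapping from hint name to policy feature name (lowercase, without the `sec-` prefix) is my understanding of the mechanism, so check it against the current spec before relying on it.

```javascript
// Build first-party response headers that request hints and delegate them
// to one third-party origin. delegationHeaders is a hypothetical helper.
function delegationHeaders(hints, thirdPartyOrigin) {
  const policy = hints
    // Assumed mapping: "Sec-CH-UA-Model" -> policy feature "ch-ua-model".
    .map((hint) => hint.toLowerCase().replace(/^sec-/, ''))
    .map((feature) => `${feature}=(self "${thirdPartyOrigin}")`)
    .join(', ');

  return {
    'Accept-CH': hints.join(', '),
    'Permissions-Policy': policy,
  };
}

console.log(
  delegationHeaders(
    ['Sec-CH-UA-Full-Version', 'Sec-CH-UA-Model'],
    'https://tracker.example' // hypothetical third-party origin
  )
);
```

Without headers like these on the first-party response, a third-party resource would only see the default low-entropy hints.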

sanchezzzhak commented 3 years ago

I don't think that my proposal will break anything in the work of sites. I'm wondering if there are a couple of sites that use the header fragments (from the list that I deleted)?

How elegant the Firefox UA is, nothing superfluous:

Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Sora2455 commented 3 years ago

There is something superfluous in the Firefox UA, the "Mozilla/5.0" bit at the front. That's not a standard or anything, that's just there so that super-old servers (some of which might well be still active today) "feature detect" the mindbogglingly advanced technology of... frames.

See https://webaim.org/blog/user-agent-string-history/.

All of the impersonation text in UA strings is there for similar reasons, and discouraging that from being necessary going forward is why the GREASE behavior was added to UA-CH.
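In practice, GREASE means the brand list can contain an arbitrary entry such as " Not A;Brand" and should not be read by position. A hedged sketch of a lookup that tolerates this follows; the `KNOWN_BRANDS` list and `pickKnownBrand` are hypothetical, and the input is assumed to be an already parsed list of `{ brand, version }` objects.

```javascript
// Brands this hypothetical site recognises, most specific first.
const KNOWN_BRANDS = ['Google Chrome', 'Microsoft Edge', 'Chromium'];

// Pick a brand by name instead of by position, ignoring unknown entries
// (including GREASE values). pickKnownBrand is a hypothetical helper.
function pickKnownBrand(brands) {
  for (const name of KNOWN_BRANDS) {
    const hit = brands.find((b) => b.brand === name);
    if (hit) return hit;
  }
  return null; // nothing recognised; the caller should not guess
}

// The same brands in two different orders give the same answer.
const orderA = [
  { brand: 'Chromium', version: '92' },
  { brand: ' Not A;Brand', version: '99' },
  { brand: 'Google Chrome', version: '92' },
];
const orderB = [orderA[1], orderA[2], orderA[0]];
console.log(pickKnownBrand(orderA), pickKnownBrand(orderB));
```

Taking the last list entry, as the server example earlier in the thread does, only works while the browser happens to emit the brands in that order.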

miketaylr commented 3 years ago

I don't think that my proposal will break anything in the work of sites,

I can guarantee that's not the case - given around 10 years of experience working on web compatibility issues in multiple browsers. If you'd like to convince yourself, you'll need to start testing UA detection libraries and frameworks and see what happens.

Here's an old, incomplete example of trying to see if the Firefox OS UA string was detectable as a mobile device (from 2014), https://miketaylr.github.io/arewedetectableyet/.

I'm wondering if there are a couple of sites that use the header fragments (from the list that I deleted)?

Sure, just look at: https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
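To get a feel for the kind of test being suggested, here is a small sketch that runs a few UA sniffing checks against the current string and the proposed reduced one. The regexes are hypothetical, written in the style of common detection code; they are not taken from uap-core or any other specific library.

```javascript
// Hypothetical checks in the style of typical UA sniffing code.
const detectors = {
  isWebKit: (ua) => /AppleWebKit\/[\d.]+/.test(ua),
  isAndroidChrome: (ua) => /Android.+Chrome\/[\d.]+ Mobile Safari/.test(ua),
  isMozillaCompatible: (ua) => /^Mozilla\/5\.0/.test(ua),
};

// Run every detector against one UA string and collect the results.
function runDetectors(ua) {
  const result = {};
  for (const [name, test] of Object.entries(detectors)) {
    result[name] = test(ua);
  }
  return result;
}

const fullUA =
  'Mozilla/5.0 (Linux; Android 10; Model Name) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/93.0.0.0 Mobile Safari/537.36';
const reducedUA = 'Android 10; Model Name; Chrome/93.0.0.0; Blink; Mobile';

console.log('current:', runDetectors(fullUA));
console.log('reduced:', runDetectors(reducedUA));
```

All three checks pass on the current string and fail on the reduced one, which is the sort of silent breakage a real audit of detection libraries would be looking for.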

miketaylr commented 3 years ago

Thanks for the suggestions @sanchezzzhak, perhaps one day in the future we can further prune the UA string, but for now we're aiming for maximum compatibility with existing systems.