colinbendell opened this issue 4 years ago
Would definitely be an interesting use case for client hints, although I'm not sure a new hint is the right way to go. This set of hints is really just about reaching feature parity with the User-Agent header.
One problem is that you can't really trust these "I'm a good bot" type headers, because bad bots will just lie, the same way browsers lie in the User-Agent header today.
What would probably happen is that "HeadlessChrome" would show up in the Sec-CH-UA list, so it might look like this:
Sec-CH-UA: "Chromium"; v="99", "HeadlessChrome"; v="99"
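To make that example concrete, here is a minimal Python sketch that pulls brand/version pairs out of a Sec-CH-UA value and checks for a HeadlessChrome brand. It uses a simplified regex rather than a full RFC 8941 Structured Field parser and assumes brands and versions contain no escaped quotes; the function names are illustrative only.

```python
from __future__ import annotations

import re

# Matches one list member of the form "Brand"; v="Version".
# Simplification: no escaped quotes inside brand or version strings.
_MEMBER = re.compile(r'"(?P<brand>[^"]*)"\s*;\s*v\s*=\s*"(?P<version>[^"]*)"')


def parse_sec_ch_ua(value: str) -> dict[str, str]:
    """Return a {brand: version} mapping from a Sec-CH-UA header value."""
    return {m["brand"]: m["version"] for m in _MEMBER.finditer(value)}


def is_headless_chrome(value: str) -> bool:
    """True if the brand list advertises a "HeadlessChrome" brand."""
    return "HeadlessChrome" in parse_sec_ch_ua(value)


header = '"Chromium"; v="99", "HeadlessChrome"; v="99"'
print(parse_sec_ch_ua(header))     # {'Chromium': '99', 'HeadlessChrome': '99'}
print(is_headless_chrome(header))  # True
```

Because the brand list is structured, a server could test for the brand by name instead of substring-matching a free-form User-Agent string, though, as noted above, nothing stops a client from omitting or faking it.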
Good bots also include some kind of contact information in the UA string, typically an email address or an HTTP URL (or both). It'd also be nice to be able to encode that in a principled way, since Client Hints are structured data anyway.
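One way such contact information might be encoded is as a Structured Field dictionary. The sketch below assumes a hypothetical Sec-CH-UA-Contact header with url and email keys; neither the header nor the keys exist in any spec. It only shows how the values would be serialized as RFC 8941 strings.

```python
from __future__ import annotations


def sf_string(value: str) -> str:
    """Serialize a str as an RFC 8941 sf-string: quoted, with \\ and \" escaped."""
    if any(ord(c) < 0x20 or ord(c) > 0x7E for c in value):
        raise ValueError("sf-string must be printable ASCII")
    # Escape backslashes first so the quote escapes are not doubled.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def serialize_contact(url: str | None = None, email: str | None = None) -> str:
    """Build a dictionary value for the hypothetical Sec-CH-UA-Contact header."""
    members = []
    if url is not None:
        members.append(f"url={sf_string(url)}")
    if email is not None:
        members.append(f"email={sf_string(email)}")
    return ", ".join(members)


print("Sec-CH-UA-Contact: " + serialize_contact(
    url="https://bot.example/about", email="ops@bot.example"))
# Sec-CH-UA-Contact: url="https://bot.example/about", email="ops@bot.example"
```

Using a dictionary rather than a free-form comment means a receiver could look up the url or email key directly instead of guessing where contact details sit inside the UA string.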
> Good bots also have some kind of contact information in the UA string, typically a mail address or a HTTP URL (or both).
Huh, TIL. Do you have an example?
The principle here is to have a structured method for good bots to follow, rather than expecting an analytics service to enumerate all the ever-evolving possibilities. Yesterday you would key off googlebot+phantomjs; now we add headless-chrome and wpt; what about tomorrow? This is, of course, all best effort and not a solution for 'bad bot' detection.
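For contrast, the enumeration approach described above looks roughly like this in Python. The marker list is hypothetical and deliberately incomplete: any tool that is not on it, such as a new synthetic monitor, is silently counted as human traffic until someone edits the list.

```python
# Hypothetical, hand-maintained substrings that analytics code might match on.
# Every new crawler or testing tool requires another entry here.
KNOWN_BOT_MARKERS = ("googlebot", "phantomjs", "headlesschrome")


def looks_like_known_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the enumerated markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


print(looks_like_known_bot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(looks_like_known_bot(
    "Mozilla/5.0 (X11; Linux x86_64) NewSyntheticMonitor/1.0"))  # False: not enumerated yet
```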
On Mon, Aug 17, 2020 at 9:21 AM Aaron Tagliaboschi wrote:
> Huh, TIL. Do you have an example?
Googlebot: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Bingbot: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
It's a politeness thing, so that you can reach the operators if the bot's misbehaving or you have queries. Like @colinbendell said, this would be best effort and not intended for detection or anything like that; obviously, bad bots can lie.
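The "+URL" convention in these strings can be pulled out with a small regex. This Python sketch assumes contact URLs are introduced by "+" and end at whitespace, ";" or ")", which matches the two examples above but is a convention rather than a standard; it does not look for email addresses.

```python
from __future__ import annotations

import re

# Assumed convention: "+http(s)://..." inside the UA comment, ending at
# whitespace, a semicolon, or the closing parenthesis.
_CONTACT = re.compile(r"\+(https?://[^\s;)]+)")


def contact_urls(user_agent: str) -> list[str]:
    """Return every '+URL' contact found in a User-Agent string."""
    return _CONTACT.findall(user_agent)


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
bingbot = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
print(contact_urls(googlebot))  # ['http://www.google.com/bot.html']
print(contact_urls(bingbot))    # ['http://www.bing.com/bingbot.htm']
```

The fact that this needs a heuristic at all is the argument for a structured field: a UA that does not follow the "+" convention simply yields an empty list.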
Interesting. The hard part is that, in order for a hint to be sent, it has to be requested, which means a site/server would have to know to ask for the contact info.
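The opt-in problem can be modeled in a few lines. In this Python sketch, a client attaches a small set of default hints to every request and adds any other hint only if the origin listed it in an Accept-CH response header. The default set reflects the low-entropy UA hints as an assumption about current Chromium behavior, and Sec-CH-UA-Contact is a hypothetical hint name.

```python
from __future__ import annotations

# Low-entropy UA hints assumed to be sent without being requested.
DEFAULT_HINTS = {"sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform"}


def parse_accept_ch(value: str) -> set[str]:
    """Split an Accept-CH response header into lowercase hint names."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def hints_to_send(available: dict[str, str], accept_ch: str | None) -> dict[str, str]:
    """Keep only the hints the client may attach to its next request.

    Default hints always go out; anything else must appear in the origin's
    Accept-CH header. Header names are compared case-insensitively.
    """
    allowed = set(DEFAULT_HINTS)
    if accept_ch:
        allowed |= parse_accept_ch(accept_ch)
    return {name: value for name, value in available.items() if name.lower() in allowed}


available = {
    "Sec-CH-UA": '"Chromium"; v="99"',
    "Sec-CH-UA-Contact": 'url="https://bot.example/about"',  # hypothetical hint
}
print(hints_to_send(available, None))                 # contact hint withheld
print(hints_to_send(available, "Sec-CH-UA-Contact"))  # contact hint included
```

A site that never sends Accept-CH: Sec-CH-UA-Contact never sees the contact info, which is exactly the discovery problem described above.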
While I don't disagree that this would be super helpful in a post-dump-everything-in-the-user-agent-header world, I don't know that Client Hints are the right mechanism. This seems like it would be its own header.
It is typical for many organizations to run headless browsers for periodic tests in production. This is done for many purposes including:
Since these services typically use WebDriver or Puppeteer to instantiate a headless browser and crawl web pages, any beacons or analytics in production will be polluted by these crawls. Currently, Google Analytics, Marketo, and other marketing analytics engines create sessions for all of these synthetic runs. For big companies this traffic is a rounding error that can easily be ignored. For many smaller websites, however, it can disproportionately skew the analytics.
For these known 'good' bots, there should be a client hint that signals that the request is indeed from a bot, so that analytics and business metrics can ignore it or classify it separately. Additionally, the hint should be enabled by default in these situations and shouldn't require any feature policy to incrementally reveal it.
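On the analytics side, such a hint could be consumed as sketched below in Python. Everything here is hypothetical: the Sec-CH-UA-Bot name is invented for illustration, its value is assumed to be an RFC 8941 boolean ("?1" for a self-declared bot), and it is assumed to be sent by default as proposed above. Like any self-declared signal, it only helps with cooperative bots.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical hint: nothing like this is standardized.
BOT_HINT = "sec-ch-ua-bot"


def is_declared_bot(headers: dict[str, str]) -> bool:
    """True if the request self-identifies as a bot via the hypothetical hint."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return lowered.get(BOT_HINT) == "?1"


def tally_sessions(requests: list[dict[str, str]]) -> Counter:
    """Count sessions as 'synthetic' or 'human' instead of discarding either."""
    return Counter("synthetic" if is_declared_bot(h) else "human" for h in requests)


traffic = [
    {"Sec-CH-UA": '"Chromium"; v="99"'},
    {"Sec-CH-UA": '"Chromium"; v="99"', "Sec-CH-UA-Bot": "?1"},
    {"Sec-CH-UA": '"Chromium"; v="99"', "Sec-CH-UA-Bot": "?0"},
]
print(tally_sessions(traffic))  # Counter({'human': 2, 'synthetic': 1})
```

Tallying synthetic sessions separately, rather than dropping them, keeps the data available for the monitoring use cases while leaving business metrics unskewed.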