Closed: adriancooke closed this issue 3 months ago
Thank you @adriancooke for this thoughtful input. @Narlotl and I discussed robots and we're going to revamp how we treat that (see #29). We want to ensure that sites are not actively blocking robots. Let us know if you have thoughts on this in #29. Feel free to copy/paste specific thoughts from above into the comments there.
Could you add a new issue for locale so we can discuss that separately?
Closing this issue as part of it is a dupe and part of it will (hopefully) be created as a new issue.
@lukefretwell, @Narlotl thanks for following up. I created #103 for og:locale
and added a comment to #29.
Describe the bug
Hi Luke, I learned about your tool and its report on new.nsf.gov today. Two of the elements your tool requires are absent, lowering the grade, but it's not clear how their presence would improve anything.
To Reproduce
Steps to reproduce the behavior:
<meta name="robots">
and
<meta property="og:locale">
Expected behavior
Neither element should be required, and their absence should not lower the score.
Additional context
1. Robots meta
The robots metadata element is needed when you want to block content from being indexed in public search results, or when you want to fine-tune how search engines present it (e.g. nosnippet). However, the use cases are so contextual that a blanket requirement does not seem to make sense. For example, if a public homepage such as new.nsf.gov contained
<meta name="robots" content="index, follow">
it would be treated the same way by search engines as it is now: it would be followed and indexed, because that is the default behavior of search engines.
But let's say I did want to prevent the page from being indexed: it still doesn't make sense to require the metadata element unless you're also checking HTTP headers, because a functionally equivalent option is to return an HTTP header
X-Robots-Tag: otherbot: noindex, nofollow
in the server response. This is also the only way you can tell a search engine not to index a non-HTML resource. So gov-metadata needs more information before it can conclude a site's metadata is insufficient (i.e. that the absence causes harm). For this check to be valid, it would need to be configurable against a specific intention, such as a checkbox for "Site should not be public and indexable" which, if checked, would return an error if the corresponding robots meta element or HTTP header was absent. What is the reason for requiring
<meta name="robots" content="index, follow">
to be present when that is what search engines already assume (whether we like it or not), because they have effectively decided that indexing is opt-out?
2. OG locale
Similarly, if og:locale is documented by the protocol authors as optional metadata, with an assumed default of en_US, what is the rationale for reducing the site's grade if its locale is in fact the US?
Refs:
- Robots meta tag - Crawling directive
- Robots meta tag, data-nosnippet, and X-Robots-Tag specifications
- The Open Graph protocol
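For what it's worth, the intention-driven check proposed above could be sketched roughly as follows. This is purely illustrative: the function names, signatures, and regex are hypothetical and not gov-metadata's actual API, and a real implementation would use a proper HTML parser.

```python
import re

def robots_directives(html: str, headers: dict) -> set:
    """Collect robots directives from <meta name="robots"> and X-Robots-Tag.

    Sketch only: the regex assumes name= precedes content= and does not
    handle directives such as "unavailable_after: <date>".
    """
    directives = set()
    for m in re.finditer(
            r'<meta\s[^>]*name=["\']robots["\'][^>]*content=["\']([^"\']+)["\']',
            html, re.IGNORECASE):
        directives.update(d.strip().lower() for d in m.group(1).split(","))
    # The header value may be scoped to a bot, e.g. "otherbot: noindex, nofollow".
    header = headers.get("X-Robots-Tag", "")
    value = header.split(":", 1)[1] if ":" in header else header
    directives.update(d.strip().lower() for d in value.split(",") if d.strip())
    return directives

def check_robots(html: str, headers: dict, should_be_private: bool) -> bool:
    """Pass unless the operator's declared intention is unmet."""
    if should_be_private:
        # A private site must actively opt out via either mechanism.
        return "noindex" in robots_directives(html, headers)
    # Public sites pass with or without the element: indexing is the default.
    return True

def effective_og_locale(og_properties: dict) -> str:
    """og:locale is optional; the protocol's documented default is en_US."""
    return og_properties.get("og:locale", "en_US")
```

The point of the sketch is that neither element's absence is an error by itself: the robots check only fails when a declared "should be private" intention is not backed by a noindex directive in either the meta element or the header, and a missing og:locale simply resolves to the protocol's en_US default.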