xuhdev closed 4 months ago
That is an excellent suggestion 👏. My thinking was that, to block bots from crawling the page, one should add it to `robots.txt`.
Expect a change tomorrow to implement this.
Yeah. `robots.txt` can block search engines from crawling the page, but if the page has already been crawled and indexed, blocking it in `robots.txt` can actually make search engines keep an outdated version of the page indexed indefinitely, because they never fetch the page again to see that it changed. With a `noindex` meta tag, search engines will drop the page the next time they crawl it. This also means the page has to stay crawlable (not blocked in `robots.txt`) for the tag to be seen.
Still a good call: `robots.txt` is effective for new sites. :)
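To make the ordering argument concrete, here is a deliberately simplified, hypothetical model of what a crawler does on its next visit to a URL. The function `crawlOutcome` and its three flags are illustrative assumptions, not any search engine's actual logic; the point is only that a `robots.txt` block stops the fetch, so a `noindex` tag on that page is never seen.

```go
package main

import "fmt"

// crawlOutcome is a hypothetical, simplified model of a crawler's next visit.
// Real search engines are far more complex; this captures only the ordering
// between robots.txt blocking and the noindex meta tag.
func crawlOutcome(alreadyIndexed, blockedByRobots, hasNoindex bool) string {
	if blockedByRobots {
		// The page is never fetched, so any meta tag on it is invisible.
		if alreadyIndexed {
			return "stays indexed (stale copy)"
		}
		return "not crawled"
	}
	// The page is fetched, so the noindex tag can be read and honored.
	if hasNoindex {
		return "removed from index"
	}
	return "indexed"
}

func main() {
	fmt.Println(crawlOutcome(true, true, true))   // stays indexed (stale copy)
	fmt.Println(crawlOutcome(true, false, true))  // removed from index
	fmt.Println(crawlOutcome(false, true, false)) // not crawled
}
```

Note the first call: the page carries `noindex`, but because `robots.txt` blocks it, the tag never takes effect.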
Yup yup. Agreed.
Added `noindex` to the 404 page.
This is done through this bit of code in `site-meta.html`:
```html
{{- /* Pages whose URL contains "404" get noindex; all other pages stay indexable. */ -}}
{{- if (not (in .Page.RelPermalink "404")) -}}
<meta name="robots" content="index, follow">
<meta name="revisit-after" content="15 days">
{{- else -}}
<meta name="robots" content="noindex, nofollow, noarchive">
{{- end -}}
```
Describe the bug
The 404 page should not be indexed by search engines. Per https://developers.google.com/search/docs/crawling-indexing/block-indexing, this can be done by adding a `noindex` meta tag to the page. Although this page does not appear in `sitemap.xml`, so search engines won't discover it that way, someone may post a misspelled link to the site elsewhere, and search engines could then inadvertently index the page by following that link.