1bl4z3r / hermit-V2

Continuing Hermit's legacy as a minimal and fast theme
https://1bl4z3r.github.io/hermit-V2/
MIT License
79 stars, 33 forks

[BUG] - 404 page should include a `noindex` meta #66

Closed: xuhdev closed this issue 4 months ago

xuhdev commented 5 months ago

Describe the bug

The 404 page should not be indexed by search engines. Per https://developers.google.com/search/docs/crawling-indexing/block-indexing, this can be done by adding a `noindex` meta tag to the page.

This page does not appear in sitemap.xml, so search engines would not normally discover it in the first place. However, if someone posts a misspelled link to the site somewhere else, search engines could inadvertently index the page through that link.

1bl4z3r commented 4 months ago

That is an excellent suggestion 👏 My thinking was that to block bots from crawling, one should add it to robots.txt.

Expect a change tomorrow to implement this.

xuhdev commented 4 months ago

Yeah. robots.txt can block search engines from crawling the page, but if the page has already been crawled and indexed, blocking it in robots.txt would actually make search engines keep an outdated version of the page indexed indefinitely, because they can no longer fetch it to see that anything changed. With a `noindex` meta tag, search engines take the page down the next time they crawl it.

Still, good call: robots.txt is effective for new sites. :)
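
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.org` URLs, and the `crawler_may_fetch` helper are all assumptions for illustration, not taken from the theme. It only shows that a crawler honouring a `Disallow` rule never fetches the page, so it has no chance to read a `noindex` tag placed there.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the 404 page for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /404.html
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a crawler that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The 404 page is off limits, so any meta tag on it goes unread.
may_fetch_404 = crawler_may_fetch(ROBOTS_TXT, "https://example.org/404.html")
# Other pages are unaffected by the rule.
may_fetch_post = crawler_may_fetch(ROBOTS_TXT, "https://example.org/posts/hello/")

print(may_fetch_404, may_fetch_post)
```

So the two mechanisms should not be combined on the same URL: for `noindex` to take effect, the page has to stay crawlable.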

1bl4z3r commented 4 months ago

Yup yup. Agreed

1bl4z3r commented 4 months ago

Added `noindex` to the 404 page.

This is done with the following bit of code in site-meta.html:

{{- /* Substring match: any page whose URL contains "404" takes the noindex branch. */ -}}
{{- if (not (in .Page.RelPermalink "404")) -}}
<meta name="robots" content="index, follow">
<meta name="revisit-after" content="15 days">
{{- else -}}
<meta name="robots" content="noindex, nofollow, noarchive">
{{- end -}}
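
One way to sanity-check the rendered output is to parse the generated HTML and look for the robots directive. The following is a rough sketch using Python's standard `html.parser`. The `RobotsMetaParser` class, the `has_noindex` helper, and the two sample strings are hypothetical; the samples simply wrap the meta tags from the template above in a `<head>`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


# Sample heads mirroring the two branches of the template.
PAGE_404 = '<head><meta name="robots" content="noindex, nofollow, noarchive"></head>'
PAGE_NORMAL = (
    '<head><meta name="robots" content="index, follow">'
    '<meta name="revisit-after" content="15 days"></head>'
)

print(has_noindex(PAGE_404), has_noindex(PAGE_NORMAL))
```

Running `has_noindex` over the built 404 page and over an ordinary post would confirm that each one landed in the intended branch.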