OpenLiberty / openliberty.io

Open Liberty website
https://openliberty.io

New configuration for Asciidoc pages to be excluded from search indexes #2663

Closed: kinueng closed this issue 2 years ago

kinueng commented 2 years ago

Related to https://github.com/OpenLiberty/blogs/issues/2269

Support a new custom front matter attribute that adds the `<meta name="robots" content="noindex">` tag to the page's HTML.
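As a rough sketch of what this could look like (the attribute name `noindex` and the example page values are assumptions, not a final design), a page would opt out of indexing via its front matter:

```yaml
---
layout: post
title: Draft announcement   # hypothetical page
noindex: true               # proposed new attribute
---
```

and the generated page's `<head>` would then include:

```html
<meta name="robots" content="noindex">
```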

The solution is based on https://developers.google.com/search/docs/advanced/crawling/block-indexing

Per Google's documentation (linked above), robots.txt alone is insufficient for excluding a page from being indexed:

> A page that's disallowed in robots.txt can still be indexed if linked to from other sites. While Google won't crawl or index the content blocked by a robots.txt file, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google search results, password-protect the files on your server, use the noindex meta tag or response header, or remove the page entirely.

kinueng commented 2 years ago

My thought is that we could add another if-condition, {% if page.noindex %}, to https://github.com/OpenLiberty/openliberty.io/blob/697ec60ab71dd412c468b10c860f97615c99a8b5/src/main/content/_includes/head.html#L59-L61.
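A minimal sketch of that conditional in Liquid (the exact placement within head.html, relative to the existing conditions there, would still need to be worked out):

```html
{% if page.noindex %}
<!-- Ask search engines not to index this page -->
<meta name="robots" content="noindex">
{% endif %}
```

Since Liquid treats a missing attribute as falsy, pages that don't set `noindex` in their front matter would be unaffected and remain indexable.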