oraNod closed this issue 7 months ago
Some additional information. The readthedocs project is in the "Active and Hidden" state: https://docs.readthedocs.io/en/stable/versions.html#hidden
This should prevent our test site from getting indexed. However, we should also prevent the robots.txt
file that nikola generates from being copied to readthedocs. I need to send a PR for that.
The robots.txt file on readthedocs now uses the RTD settings to disallow the dev site: https://ansible-community-website.readthedocs.io/robots.txt
The robots.txt file that nikola generates does not disallow any content and uses default settings. The nikola robots.txt file does not get uploaded to readthedocs.
This issue follows up on #279, #355, and #322.
@wbentley15 has requested that robots.txt be updated to keep bots off the ansible.com site. As part of this request, it was suggested that we add <meta name="robots" content="none"> to HTML pages. This meta tag would be effective in keeping bots off the site. However, because it keeps all bots off, pages would also stop appearing in search results. The content="none" directive is equivalent to noindex and nofollow, as per the documentation.
I've contacted an expert on search within Red Hat who has confirmed the above and advised that we take the approach of filtering bots in robots.txt.
To prevent a specific bot:
For example, to disallow the user agent scambot from the entire site:
To disallow the identity_theft bot from accessing certain directories of the site:
To resolve this issue, we need to do the following before launch:
Set ROBOTS_EXCLUSIONS = ["*"] in the nikola conf.py to test and verify the resulting configuration.
You can grab the robots.txt file on the dev site at: https://ansible-community-website.readthedocs.io/robots.txt
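The per-bot filtering approach described above could be checked offline before launch with something like the following. This is only a minimal sketch using Python's standard urllib.robotparser module, not the project's actual configuration. The sample rules are assumptions: scambot and identity_theft are the example bot names from this issue, while the /private/ and /accounts/ directories and the helper names build_parser and is_allowed are hypothetical and chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The bot names come from the examples in
# this issue; the directory names are placeholders, not real site paths.
SAMPLE_ROBOTS_TXT = """\
User-agent: scambot
Disallow: /

User-agent: identity_theft
Disallow: /private/
Disallow: /accounts/

User-agent: *
Disallow:
"""


def build_parser(robots_text):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, path):
    """Return True if the given user agent may fetch the given path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    checks = [
        ("scambot", "/"),
        ("identity_theft", "/private/data.html"),
        ("identity_theft", "/index.html"),
        ("Googlebot", "/index.html"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(parser, agent, path))
```

The point of filtering by user agent rather than using content="none" is visible in the last rule group: the empty Disallow line leaves every other crawler free to index the site, so pages should still appear in search results. To check the real file instead of the sample, the text of the robots.txt from the dev site could be passed to build_parser in place of SAMPLE_ROBOTS_TXT.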