**Open** · PathogenDavid opened this issue 3 months ago
Having a `robots.txt` makes complete sense, I just never got to dive into how it works properly 🙂

One thing I didn't really think about when writing this is that the main website's `robots.txt` is what actually matters, since the docs repo is nested in a subdirectory. (Similarly for forks, the `robots.txt` in the GitHub Pages website of the user or organization associated with the fork is what actually matters.)

This means we should probably just go the route of adding `<meta name="robots" content="noindex, nofollow">` tags to the `<head>` of every page instead.
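
To make that concrete, here's a rough sketch of a post-build step that could stamp the tag into every generated page. The `site/` output directory and the `IS_CANONICAL_DEPLOY` environment variable are placeholders for however the real build distinguishes the official deployment from a fork, not anything that exists today:

```python
#!/usr/bin/env python3
"""Sketch: inject a noindex/nofollow robots meta tag into generated HTML
pages when the build is not the canonical deployment."""
import os
from pathlib import Path

ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'

def inject_robots_meta(output_dir: str) -> None:
    for page in Path(output_dir).rglob("*.html"):
        html = page.read_text(encoding="utf-8")
        if ROBOTS_META in html:
            continue  # already tagged
        # Assumes the generator emits a plain <head> tag with no attributes.
        html = html.replace("<head>", f"<head>\n    {ROBOTS_META}", 1)
        page.write_text(html, encoding="utf-8")

if __name__ == "__main__":
    # Only tag non-canonical copies (forks, test deployments, etc.).
    if os.environ.get("IS_CANONICAL_DEPLOY") != "true":
        inject_robots_meta("site")
```

In a workflow, the canonical check could be as simple as comparing the repository name against the upstream repo, but that detail depends on how deployment is set up.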
It can be convenient in forks to enable deployment to GitHub Pages for testing. However, this inadvertently creates duplicate copies of the documentation on the wider public internet, which means search engines can find them.

This risks polluting search results with content that is likely outdated. I believe it also risks harming the SEO of the official documentation website. (I'm no SEO expert, but my understanding is that Google in particular harshly penalizes websites that duplicate other websites.)
We should ~~generate a `robots.txt` and/or~~ add the appropriate meta tags to non-canonical copies of the docs website.

As a semi-related aside (since you specify it in the `robots.txt`), we should also enable `sitemap.xml` generation. Looks like it just needs to be turned on.
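
For reference, the way the sitemap normally ties into `robots.txt` is a single `Sitemap:` directive in the root site's `robots.txt`; the URL below is just a placeholder, not the real docs address:

```
User-agent: *
Allow: /

# Points crawlers at the generated sitemap (placeholder URL).
Sitemap: https://example.org/docs/sitemap.xml
```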