MESAHub / mesa

Modules for Experiments in Stellar Astrophysics
http://mesastar.org
GNU Lesser General Public License v2.1

Use robots.txt to reduce search engine traffic to older documentation. #692

Closed. wmwolf closed this issue 2 months ago.

wmwolf commented 2 months ago

We should use robots.txt to get search engines to guide users preferentially to the latest version of our documentation. Within the website, old versions will be unchanged, but web crawlers will be asked not to index those pages. I think it could be as simple as

User-agent: *
Disallow: /
Allow: /en/latest

We'd just need to place a file called robots.txt at the root of docs.mesastar.org with these contents (I think), and over time the web crawlers should update their indexes.

VincentVanlaer commented 2 months ago

I faintly remembered that there was some way to point indexers to the latest version of a page. After some searching, it seems the canonical link annotation can do this. From what I can tell it would be more effective, since robots.txt only prevents crawling, not indexing, so if someone links to an older version somewhere, those pages could still show up in search results. That's what I found from reading some Stack Exchange posts, though, so I don't know how correct it is.
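
For reference, a canonical link is a tag in each page's HTML head that tells search engines which URL is the preferred version of that page. If the docs are built with Sphinx (as Read the Docs projects typically are), a minimal sketch would be to set html_baseurl in conf.py; whether Read the Docs already injects canonical URLs for us is worth checking against their documentation before relying on this:

# conf.py (Sphinx) -- hypothetical sketch, assuming a Sphinx build on Read the Docs.
# Setting html_baseurl makes Sphinx emit a <link rel="canonical" ...> tag on every
# page, pointing at the corresponding page under this base URL.
html_baseurl = "https://docs.mesastar.org/en/latest/"

# The generated HTML head then contains something like:
#   <link rel="canonical" href="https://docs.mesastar.org/en/latest/somepage.html"/>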

pmocz commented 2 months ago

Looks like Read the Docs autogenerates and serves a robots.txt file: https://docs.readthedocs.io/en/stable/guides/technical-docs-seo-guide.html#use-a-robots-txt-file

but it might not be what we want: https://docs.readthedocs.io/robots.txt
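
If we want to serve our own robots.txt rather than the autogenerated one, the SEO guide linked above describes shipping it as part of the built docs so it ends up at the root of the site. A minimal sketch, assuming a Sphinx build, with the file placed next to conf.py:

# conf.py (Sphinx) -- hypothetical sketch.
# Copy a hand-written robots.txt from the docs source directory into the root
# of the built HTML output; Read the Docs should then serve this file instead
# of its autogenerated one.
html_extra_path = ["robots.txt"]

Here robots.txt would contain the User-agent/Disallow/Allow directives suggested above.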

pmocz commented 2 months ago

I took @wmwolf's suggestion and added a robots.txt file here: https://github.com/MESAHub/mesa/pull/694