HermanMartinus / bearblog

Free, no-nonsense, super fast blogging.
MIT License

robots.txt file is malformed #277

Status: Closed (pimoore closed this 3 months ago)

pimoore commented 3 months ago

Lighthouse reports that the robots.txt file is malformed and is not being downloaded, which would mean it is not applied at all. Here's the current format:

User-Agent: *
Sitemap: https://<domain_name>/sitemap.xml
Disallow: /signup/
Disallow: /accounts/
Disallow: /dashboard/
Disallow: /mothership/
Disallow: /studio/
Disallow: /public-analytics/
Disallow: /subscribe/
Disallow: /confirm-subscription/

I believe the Sitemap field should be last, as the convention expects the Disallow rules to immediately follow the User-Agent line.

HermanMartinus commented 3 months ago

I did some looking into this and the only answers I got were:

The order of directives in a robots.txt file, including the placement of the Sitemap directive, generally does not affect the interpretation by compliant web crawlers.

The directive containing the sitemap location can be placed anywhere in the robots.txt file. It is independent of the user-agent line, so it does not matter where it is placed.

I suspect that robots.txt is working as expected, but I do agree that it's certainly clearer when the Sitemap line isn't sitting inside the explicit disallow list for User-Agent: *.
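
For anyone who wants to check the ordering claim locally, here is a minimal sketch using Python's standard-library urllib.robotparser. It only shows how that one parser behaves, not Lighthouse or any particular search engine crawler. The example.com domain, the shortened rule list, and the helper names (parse_robots, summarize) are assumptions made for illustration and are not taken from the bearblog code.

```python
from urllib.robotparser import RobotFileParser

# Layout from the issue: Sitemap sits between User-Agent and the Disallow rules.
# example.com and the shortened rule list are placeholders for illustration.
SITEMAP_IN_GROUP = """\
User-Agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /signup/
Disallow: /dashboard/
"""

# Suggested layout: Disallow rules directly after User-Agent, Sitemap last.
SITEMAP_LAST = """\
User-Agent: *
Disallow: /signup/
Disallow: /dashboard/

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text):
    """Parse robots.txt content from a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def summarize(text, paths):
    """Return (allowed-by-path dict, sitemap list) for the given content."""
    parser = parse_robots(text)
    allowed = {
        path: parser.can_fetch("*", "https://example.com" + path)
        for path in paths
    }
    # site_maps() requires Python 3.8 or newer.
    return allowed, parser.site_maps()


if __name__ == "__main__":
    paths = ["/signup/", "/dashboard/", "/blog/"]
    print(summarize(SITEMAP_IN_GROUP, paths))
    print(summarize(SITEMAP_LAST, paths))
```

With this parser, both layouts should produce the same allow/deny decisions and the same sitemap list, which is consistent with the answers quoted above. Other tools may be stricter, so moving Sitemap out of the group is still the safer layout.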

I've just pushed an update.