gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

Robots.txt should link to sitemap by default #4678

Open earthboundkid opened 6 years ago

earthboundkid commented 6 years ago

I propose changing the default robots.txt layout to:

User-agent: *
Sitemap: {{ .Sitemap.Filename | default "sitemap.xml" | absURL }}

This will link to sitemaps by default, which is the common use case.

bep commented 6 years ago

I think this needs a little more thinking for the multilingual case, but I agree in principle.

ghost commented 6 years ago

How about List All Available Languages from the docs:

{{ range $.Site.Home.AllTranslations }}
Sitemap: {{ .Permalink }}
{{ end }}

This output method tweaked to output the sitemap files satisfies the suggestion from Moz:

It’s generally a best practice to indicate the location of any sitemaps associated with this domain at the bottom of the robots.txt file. Here’s an example:

[image]

Things to check:

According to Moz, inclusion of the sitemap in the robots.txt file may not be supported by all search engines. Sitemaps are also not defined in The Robots Exclusion Protocol. It seems quirky to put something that assists a scraper or aggregator in a file designed to tell them not to crawl certain sections. Of course, if you're a search giant, who cares, right?

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

earthboundkid commented 6 years ago

Seems like a pretty simple feature.

FelicianoTech commented 6 years ago

I took a look at this. Currently, since robots.txt is a template file, it doesn't have access to all the information it would need to know what sitemaps are available to link to.

Should it point to a sitemap index file for multilingual sites? Yes, but otherwise there can be more than one sitemap file with Hugo's current implementation.

This information is gathered when generating the sitemap, but is otherwise not available under .Sites currently. I assume we'd need to add it there. Problem is, the sitemap template (like this one and the 404 template) is currently generated in isolation, so I'm not sure what the best way to implement this would be.

comaldave commented 6 years ago

I suppose I have the choice of disabling the Hugo robots.txt and crafting my own, or replacing the template with something I craft that works for my clients but may not be appropriate for others. My clients are all multilingual, so the suggestion by the OP is not quite right for me. I do think this is a worthwhile thing for Hugo to fix; most bloggers should not need to mess with the robots.txt file. A change in the template and options in the config seem an ideal implementation for the average user.
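
For reference, a minimal sketch of the "replace the template" option: with enableRobotsTXT = true in the site config, Hugo renders a custom layouts/robots.txt in place of its built-in one. The sketch assumes the sitemap keeps its default name, sitemap.xml, at the site root (on multilingual sites that root file is expected to be the sitemap index); adjust it if the site configures a different filename.

```go-text-template
{{- /* Sketch of layouts/robots.txt; assumes the default sitemap filename at the site root. */ -}}
User-agent: *
Sitemap: {{ "sitemap.xml" | absURL }}
```

Hard-coding the filename avoids relying on .Sitemap being available in the robots.txt context, at the cost of having to update the template by hand if the sitemap is renamed or disabled.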

hsn10 commented 5 years ago

.Sitemap.Filename doesn't work inside robots.txt

amrsoll commented 11 months ago

For a template that respects people who didn't add a sitemap, this can work too

User-agent: *

{{ with .Sitemap }}
Sitemap: {{ .Filename | default "sitemap.xml" | absURL }}
{{ end }}

ytrepidorosonomous commented 11 months ago

This works for multilingual sites

User-agent: *

Sitemap: {{ site.Home.Sitemap.Filename | absURL }}

Fethbita commented 3 months ago

For a template that respects people who didn't add a sitemap, this can work too

User-agent: *

{{ with .Sitemap }}
Sitemap: {{ .Filename | default "sitemap.xml" | absURL }}
{{ end }}

This doesn't work if Sitemap is disabled using disableKinds.
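
One possible workaround, sketched here under assumptions rather than taken from Hugo itself: gate the Sitemap line on an explicit site parameter, so that a site listing "sitemap" in disableKinds can simply leave the parameter unset. The parameter name robotsSitemap is hypothetical and would have to be defined under params in the site config; the filename is again assumed to be the default sitemap.xml.

```go-text-template
{{- /* Sketch of layouts/robots.txt; robotsSitemap is a hypothetical, user-defined site parameter. */ -}}
User-agent: *
{{ if site.Params.robotsSitemap -}}
Sitemap: {{ "sitemap.xml" | absURL }}
{{ end -}}
```

This does not detect disableKinds automatically; it only makes the choice explicit, so the robots.txt file never advertises a sitemap that was not generated.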