earthboundkid opened this issue 6 years ago
I think this needs a little more thinking for the multilingual case, but I agree in principle.
How about List All Available Languages from the docs:
{{ range $.Site.Home.AllTranslations }}
Sitemap: {{ .Permalink }}
{{ end }}
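Putting that docs snippet into a custom robots.txt template, a sketch along these lines could emit one Sitemap line per language (this assumes each translation's home page permalink ends in a trailing slash and that each language's sitemap lives at sitemap.xml under it; not verified against every Hugo version):

```go-html-template
User-agent: *

{{/* One Sitemap line per language, per the AllTranslations snippet above */}}
{{ range $.Site.Home.AllTranslations }}
Sitemap: {{ .Permalink }}sitemap.xml
{{ end }}
```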
This output method tweaked to output the sitemap files satisfies the suggestion from Moz:
It’s generally a best practice to indicate the location of any sitemaps associated with this domain at the bottom of the robots.txt file. Here’s an example:
Things to check:
According to Moz, inclusion of the sitemap in the robots.txt
file may not be supported by all search engines. Sitemaps are also not defined in the Robots Exclusion Protocol. It seems quirky to use a file designed to tell scrapers and aggregators not to crawl certain sections as a way to assist them. Of course, if you're a search giant, who cares, right?
This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master
branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.
Seems like a pretty simple feature.
I took a look at this. Currently, since robots.txt is a template file, it doesn't have access to all the information it would need to know what sitemaps are available to link to.
For multilingual sites it should probably point to a sitemap index file, but otherwise there can be more than one sitemap file with Hugo's current implementation.
This information is gathered when generating the sitemap, but it is not otherwise available under .Sites
currently. I assume we'd need to add it there. The problem is that the sitemap template (like the robots.txt and 404 templates) is currently generated in isolation, so I'm not sure what the best way to implement this would be.
I suppose I have the choice of disabling the Hugo robots.txt
and crafting my own, or replacing the template with something I craft that works for my clients but may not be appropriate for others. My clients are all multilingual, so the suggestion by the OP is not quite right for me. I do think this is a worthwhile thing for Hugo to fix; most bloggers should not need to be messing with the robots.txt
file. A change in the template, plus options in the config, seems an ideal implementation for the average user.
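For anyone taking the custom-template route in the meantime: Hugo only generates robots.txt when it is enabled in the site config, after which a layouts/robots.txt template (if present) overrides the built-in one. A minimal config fragment:

```toml
# config.toml — turn on Hugo's robots.txt generation;
# a layouts/robots.txt template, if present, then replaces the default output
enableRobotsTXT = true
```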
.Sitemap.Filename doesn't work inside robots.txt
For a template that respects people who didn't add a sitemap, this can work too
User-agent: *
{{ with .Sitemap }}
Sitemap: {{ .Filename | default "sitemap.xml" | absURL }}
{{ end }}
This works for multilingual sites
User-agent: *
Sitemap: {{ site.Home.Sitemap.Filename | absURL }}
For a template that respects people who didn't add a sitemap, this can work too
User-agent: *
{{ with .Sitemap }}
Sitemap: {{ .Filename | default "sitemap.xml" | absURL }}
{{ end }}
This doesn't work if Sitemap is disabled using disableKinds.
I propose changing the default robots.txt layout to:
This will link to sitemaps by default, which is the common use case.
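The proposed layout itself is not shown above. Based on the multilingual snippet earlier in the thread, a default along these lines seems to be what is intended (an assumption on my part, not the actual proposal text):

```go-html-template
User-agent: *
{{/* Guard with "with" so sites whose sitemap is disabled still render a valid file */}}
{{ with site.Home.Sitemap }}
Sitemap: {{ .Filename | absURL }}
{{ end }}
```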