Closed Mr0grog closed 4 years ago
A site’s `robots.txt` file can list any number of sitemaps, which search engines will generally read and give extra weight to in searches. As we consider updating our URL lists, it might be useful to seed the list with these (or maybe just monitor them and continually update our page list when new things get added).

Not all sites have these (e.g. energy.gov), while others have many (e.g. epa.gov is kind of crazy). A simple example might be https://ferc.gov/robots.txt:
Which leads to the sitemap https://www.ferc.gov/sitemap.xml:
(Note this doesn’t obviate the need for tools like Walk, since these kinds of sitemaps generally don’t list every page, and not all sites have them.)
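
To make the seeding idea concrete, here is a rough Python sketch of how we might pull `Sitemap:` entries out of a `robots.txt` and collect the page URLs they point to. Everything in it is an assumption for illustration: the function names (`sitemaps_from_robots`, `parse_sitemap`, `seed_urls`), the `max_sitemaps` cap, and the `example.gov` URLs in the usage note are hypothetical, and nothing here reflects how our existing tooling works. It only uses the Python standard library and does not handle gzipped sitemaps.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen


def sitemaps_from_robots(robots_text):
    """Return the URLs named on `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        # Drop trailing comments. (Assumes sitemap URLs carry no '#' fragment.)
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def local_name(tag):
    """Strip an XML namespace prefix like '{http://...}loc' down to 'loc'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text):
    """Return (kind, locs), where kind is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    locs = []
    # Only read <loc> directly under each <url> or <sitemap> entry, so that
    # extension elements nested deeper (e.g. image entries) are skipped.
    for entry in root:
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return local_name(root.tag), locs


def fetch_text(url):
    """Hypothetical default fetcher; swap in whatever HTTP client we use."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def seed_urls(site, fetch=fetch_text, max_sitemaps=50):
    """Collect page URLs from every sitemap listed in a site's robots.txt."""
    pending = sitemaps_from_robots(fetch(urljoin(site, "/robots.txt")))
    seen, pages = set(), []
    while pending and len(seen) < max_sitemaps:
        sitemap_url = pending.pop(0)
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        kind, locs = parse_sitemap(fetch(sitemap_url))
        if kind == "sitemapindex":
            pending.extend(locs)  # an index points at more sitemaps
        else:
            pages.extend(locs)
    return pages
```

Calling something like `seed_urls("https://example.gov")` would return a flat list of page URLs, or an empty list for a site whose `robots.txt` names no sitemaps. Passing a custom `fetch` makes it easy to run against saved copies instead of the live site.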