Closed jdevalk closed 2 years ago
Related to #138
I was a little surprised to see a reference to the sitemap wasn't automatically added to robots.txt with a hook. What's the explanation behind why that is and is not a good idea? :-)
If it's a bad idea, then add an option for it.
for anyone that wants to quickly add this to their site:
function custom_robotstxt() {
    echo 'Sitemap: ' . get_home_url() . '/sitemap_index.xml';
}
add_action('do_robots', 'custom_robotstxt');
Dear @retlehs !
That code snippet is really quick. Actually quicker than WordPress.
Please take a look here. Please consider using the robots_txt filter and home_url( '/sitemap_index.xml' );
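A minimal sketch of that suggestion, assuming the sitemap index is served at /sitemap_index.xml as in the snippet above. The helper name append_sitemap_directive is made up for illustration; robots_txt and home_url() are the WordPress filter and function mentioned here. Because the filter receives the generated output, the directive can be appended on its own line rather than echoed before WordPress writes anything.

```php
<?php
/**
 * Append a Sitemap directive to robots.txt output.
 * Hypothetical helper name; kept free of WordPress calls so it can be tested alone.
 */
function append_sitemap_directive( string $output, string $sitemap_url ): string {
    // Make sure the existing output ends with exactly one newline before the directive.
    return rtrim( $output, "\n" ) . "\n" . 'Sitemap: ' . $sitemap_url . "\n";
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter(
        'robots_txt',
        function ( $output ) {
            // Assumption: the sitemap index lives at /sitemap_index.xml.
            return append_sitemap_directive( $output, home_url( '/sitemap_index.xml' ) );
        }
    );
}
```

Outside WordPress the registration is skipped, so the helper can be exercised on its own with a sample robots.txt string.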
Clearing old milestone, adding to my sitemaps plate for upcoming cleanup of.
@jdevalk is this dependent on or independent from #138? What is a good source to use for the "reason it is/isn't a good idea" part?
Independent of #138, though implementing both at the same time would have my preference.
The source for the reasoning is a good question; the basic idea is that you're telling scrapers where your XML sitemap is, making it easier to scrape. You should add your XML sitemap to Google Search Console, and then there's no need to have it in your robots.txt.
You can feed scrapers with a nofollow trap:
https://github.com/szepeviktor/wordpress-plugin-construction/tree/master/mu-nofollow-robot-trap
I am hesitant to introduce a whole "option and explanation" for something so little. I doubt that not adding it is a significant barrier to bots. It's a fixed name root file, how hard can it be?
I would suggest a choice between:
My personal opinion would be (1).
Let's go with 1 since it can probably never hurt to have it. Created #5798 to deal with the GSC issue.
We also state here about adding sitemaps to robots.txt: https://yoast.com/wordpress-robots-txt-example/
We’ve always felt linking to your XML sitemap from your robots.txt is a bit nonsense. You should be adding them manually to your Google Search Console and Bing Webmaster Tools and make sure you look at their feedback about your XML sitemap. This is the reason our Yoast SEO plugin doesn’t add it to your robots.txt. Don’t rely on them to find out about your XML sitemap through your robots.txt.
Seeing as the post from earlier this year that @Pcosta88 linked states we don't consider this necessary, it'd be nice to have some final decision on this issue.
@jdevalk Thoughts?
People are going to keep whining about it so let's do it.
If sitemap is part of the robots.txt standard and if robots.txt is for search engines and if Yoast wants to be the go-to tool for SEO, this should have been implemented years ago.
Besides, seven years is a long time to wait.
Bump. Please implement this. It is basic expected functionality. I am very surprised it has not been implemented.
Manually adding our sitemap to Google is not a valid workaround: there are many other search engines and scrapers, new search engines may emerge at any time, and there are major country- and culture-specific search engines (e.g. Baidu) that we cannot manually submit sitemaps to from outside that country, may find problematic to use due to language barriers, or may simply be unaware of.
I doubt that not adding it is a significant barrier to bots. It's a fixed name root file, how hard can it be?
No, robots.txt is a fixed name root file. Naming the sitemap sitemap.xml is just a convention.
We’ve always felt linking to your XML sitemap from your robots.txt is a bit nonsense.
I couldn't disagree more. It is essential for the reasons I've given above.
Also, if a search engine does not know about the sitemap, it may not be able to index all pages on a site, particularly those linked in content loaded via AJAX, or which are arrived at through a form submission or JavaScript (e.g. after changing the value of a select). Google used not to be able to find such content but now usually can. Other search engines may still be lacking in this area.
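To illustrate the discovery point: a crawler that fetches robots.txt, whose location is fixed, can read the sitemap location from it instead of guessing a filename. This is only a rough sketch; the function name and sample content are made up, and real crawlers will differ in the details.

```php
<?php
/**
 * Collect the URLs named in Sitemap directives of a robots.txt body.
 * Hypothetical helper for illustration only.
 */
function extract_sitemap_urls( string $robots_txt ): array {
    $urls = [];
    // Split on any newline style and inspect each line separately.
    foreach ( preg_split( '/\R/', $robots_txt ) as $line ) {
        // The directive name is matched case-insensitively here.
        if ( preg_match( '/^\s*Sitemap:\s*(\S+)/i', $line, $m ) ) {
            $urls[] = $m[1];
        }
    }
    return $urls;
}

// Made-up sample content.
$sample = "User-agent: *\nDisallow: /wp-admin/\nSitemap: https://example.com/sitemap_index.xml\n";
print_r( extract_sitemap_urls( $sample ) );
```

With no Sitemap line present the function returns an empty array, which is the situation where a crawler would have to fall back on guessing a conventional filename.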
for anyone that wants to quickly add this to their site:
function custom_robotstxt() {
    echo 'Sitemap: ' . get_home_url() . '/sitemap_index.xml';
}
add_action('do_robots', 'custom_robotstxt');
Thanks. To add a bit of bulletproofing, I'd suggest the following instead:
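\add_filter(
    'robots_txt',
    function($output) {
        // Leave the output untouched if Yoast SEO is unavailable, sitemaps are
        // disabled, or a Sitemap directive is already present.
        if (
            !\method_exists(\WPSEO_Sitemaps_Router::class, 'get_base_url')
            || !\method_exists(\WPSEO_Options::class, 'get')
            || !\WPSEO_Options::get('enable_xml_sitemap')
            || \preg_match('/^Sitemap:/mi', $output)
        ) {
            return $output;
        }
        // Append the sitemap index URL on its own line.
        return $output . "\n" . 'Sitemap: '
            . \WPSEO_Sitemaps_Router::get_base_url('sitemap_index.xml') . "\n";
    },
    999 // Late priority, so that other callbacks run first.
);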
This additionally:
- checks that Yoast SEO is active and that XML sitemaps are enabled;
- does not add anything if there is already a Sitemap directive (e.g. inserted by Yoast when this issue is hopefully finally resolved - though it is possible to have more than one, this would most likely be 100% duplication whether added by Yoast or another plugin);
- respects the wpseo_sitemaps_base_url filter.
(I have used a PHP closure for brevity in this example though it is best practice to use a function or method for the filter callback.)
Used as is, this does not add the sitemap to robots.txt.
I checked again and it's working fine for me by just placing the code in the active theme's functions.php. I'm using Yoast SEO 17.6 if it makes any difference. Note it doesn't add a Sitemap entry if there already is one or if sitemaps are disabled at SEO > General > Features > XML sitemaps.
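For anyone debugging this, the "already is one" check can be tried in isolation. The sketch below reuses the same pattern as the filter above; the function name and sample strings are made up for illustration.

```php
<?php
// Same pattern as in the filter: any line starting with "Sitemap:", case-insensitive.
// Hypothetical helper name, for illustration only.
function has_sitemap_directive( string $output ): bool {
    return (bool) preg_match( '/^Sitemap:/mi', $output );
}

// No directive yet, so the filter would append one.
var_dump( has_sitemap_directive( "User-agent: *\nDisallow: /wp-admin/\n" ) ); // bool(false)

// A directive already exists (note the lowercase), so the filter would return early.
var_dump( has_sitemap_directive( "User-agent: *\nsitemap: https://example.com/s.xml\n" ) ); // bool(true)
```

If the second case matches your robots.txt output, another plugin or a physical robots.txt file may already be supplying the directive.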
Bump. I'd like to see this too. I do manually submit to Search Console and Bing, but it doesn't hurt to add it anyway for the benefit of other less important search engines.
Internally opened: browse/P1-1369
Fixed in https://github.com/Yoast/wordpress-seo/pull/18445 (except in non-main multisite subdirectories, coming tomorrow in another PR 🤞 )
Plus an explanation of why that is and is not a good idea.
Please consider multisite when implementing this