Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/

XML Sitemap section should have option to add XML sitemap to robots.txt #139

Closed jdevalk closed 2 years ago

jdevalk commented 11 years ago

Plus an explanation of why that is and is not a good idea.

Please consider multisite when implementing this

jdevalk commented 11 years ago

Related to #138

jdub commented 10 years ago

I was a little surprised to see a reference to the sitemap wasn't automatically added to robots.txt with a hook. What's the explanation behind why that is and is not a good idea? :-)

szepeviktor commented 10 years ago

If it's a bad idea, then add an option for it.

retlehs commented 9 years ago

for anyone that wants to quickly add this to their site:

function custom_robotstxt() {
  echo 'Sitemap: ' . get_home_url() . '/sitemap_index.xml';
}
add_action('do_robots', 'custom_robotstxt');

szepeviktor commented 9 years ago

Dear @retlehs! That code snippet is really quick. Actually quicker than WordPress. Please take a look here. Please consider using the robots_txt filter and

home_url( '/sitemap_index.xml' );
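
A minimal sketch of that suggestion, not a tested plugin: it hooks the robots_txt filter instead of echoing from do_robots, and builds the URL with home_url(). The helper name example_append_sitemap_line is hypothetical, and the sitemap_index.xml path is taken from the snippet above.

```php
<?php
// Hypothetical helper name; not part of WordPress or Yoast SEO.
// Appends a Sitemap line to the robots.txt text unless one is already there.
function example_append_sitemap_line( $output, $sitemap_url ) {
    // Leave the output untouched if it already declares a sitemap.
    if ( preg_match( '/^Sitemap:/mi', $output ) ) {
        return $output;
    }
    return rtrim( $output, "\n" ) . "\n\nSitemap: " . $sitemap_url . "\n";
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter(
        'robots_txt',
        function ( $output, $public ) {
            return example_append_sitemap_line( $output, home_url( '/sitemap_index.xml' ) );
        },
        10,
        2
    );
}
```

Because the filter returns the modified text rather than echoing it, other plugins hooked into robots_txt still see and can adjust the final output.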

Rarst commented 9 years ago

Clearing old milestone, adding to my sitemaps plate for upcoming cleanup of.

Rarst commented 8 years ago

@jdevalk is this dependent on or independent from #138? What is a good source to use for the "why it is or isn't a good idea" part?

jdevalk commented 8 years ago

Independent of #138, though implementing both at the same time would have my preference.

Source for reason is a good question; the basic idea is that you're telling scrapers where your XML sitemap is, making it easier to scrape. You should add your XML sitemap to Google Search Console and then there's no need to have it in your robots.txt.

szepeviktor commented 8 years ago

You can feed scrapers with a nofollow trap: https://github.com/szepeviktor/wordpress-plugin-construction/tree/master/mu-nofollow-robot-trap

Rarst commented 8 years ago

I am hesitant to introduce whole "option and explanation" for something so little. I doubt that not adding it is a significant barrier to bots. It's a fixed name root file, how hard can it be?

I would suggest a choice between:

  1. Just adding it.
  2. Adding it unless Search Console is set up.
  3. Adding option for it after all.

My personal opinion would be (1).

omarreiss commented 7 years ago

Let's go with 1 since it can probably never hurt to have it. Created #5798 to deal with the GSC issue.

Pcosta88 commented 7 years ago

We also state here about adding sitemaps to robots.txt: https://yoast.com/wordpress-robots-txt-example/

We’ve always felt linking to your XML sitemap from your robots.txt is a bit nonsense. You should be adding them manually to your Google Search Console and Bing Webmaster Tools and make sure you look at their feedback about your XML sitemap. This is the reason our Yoast SEO plugin doesn’t add it to your robots.txt. Don’t rely on them to find out about your XML sitemap through your robots.txt.

Pcosta88 commented 7 years ago

Please inform the customer of conversation # 196841 when this conversation has been closed.

jcomack commented 6 years ago

Seeing as the post from earlier this year that @Pcosta88 linked states we don't consider this necessary, it'd be nice to have some final decision on this issue.

@jdevalk Thoughts?

jdevalk commented 6 years ago

People are going to keep whining about it so let's do it.

Pcosta88 commented 6 years ago

Please inform the customer of conversation # 410303 when this conversation has been closed.

Pcosta88 commented 6 years ago

Please inform the customer of conversation # 414027 when this conversation has been closed.

mmikhan commented 3 years ago

+1 for this on https://wordpress.org/support/topic/sitemap-missing-in-the-robits-txt/

sarumbear commented 3 years ago

If sitemap is part of the robots.txt standard and if robots.txt is for search engines and if Yoast wants to be the go-to tool for SEO, this should have been implemented years ago.

Besides, seven years is a long time to wait.

Pcosta88 commented 3 years ago

Please inform the customer of conversation # 691294 when this conversation has been closed.

JakeQZ commented 2 years ago

Bump. Please implement this. It is basic expected functionality. I am very surprised it has not been implemented.

Manually adding our sitemap to Google is not a valid workaround, since there are many other search engines and scrapers, new search engines may emerge at any time, and there are some major country- and culture-specific search engines (e.g. Baidu) that we cannot manually submit sitemaps to from outside that country, may find problematic to submit to because of language barriers, or may simply be unaware of.

JakeQZ commented 2 years ago

I doubt that not adding it is a significant barrier to bots. It's a fixed name root file, how hard can it be?

No, robots.txt is a fixed name root file. Naming the sitemap sitemap.xml is just a convention.

We’ve always felt linking to your XML sitemap from your robots.txt is a bit nonsense.

I couldn't disagree more. It is essential for the reasons I've given above.
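
To illustrate the point about file names: a crawler can always request /robots.txt, and any Sitemap lines in it tell the crawler where the sitemap lives, whatever it is called. The following is only a rough sketch of that lookup; example_extract_sitemap_urls is a hypothetical helper, not something taken from any real crawler.

```php
<?php
// Hypothetical helper; sketches how a crawler could read Sitemap lines
// from the text of a robots.txt file.
function example_extract_sitemap_urls( $robots_txt ) {
    $urls = array();
    // Split on any line ending and inspect each line in turn.
    foreach ( preg_split( '/\R/', $robots_txt ) as $line ) {
        // Match the field name case-insensitively and capture the URL.
        if ( preg_match( '/^\s*Sitemap:\s*(\S+)/i', $line, $m ) ) {
            $urls[] = $m[1];
        }
    }
    return $urls;
}
```

Without such a line, the crawler has to guess a conventional name such as sitemap.xml or sitemap_index.xml.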

Also, if a search engine does not know about the sitemap, it may not be able to index all pages on a site, particularly those linked in content loaded via AJAX, or which are arrived at through a form submission or JavaScript (e.g. after changing the value of a select). Google used not to be able to find such content but now usually can. Other search engines may still be lacking in this area.

JakeQZ commented 2 years ago

for anyone that wants to quickly add this to their site:

function custom_robotstxt() {
  echo 'Sitemap: ' . get_home_url() . '/sitemap_index.xml';
}
add_action('do_robots', 'custom_robotstxt');

Thanks. To add a bit of bulletproofing, I'd suggest the following instead:

\add_filter(
  'robots_txt',
  function($output) {
    if (
      !\method_exists(\WPSEO_Sitemaps_Router::class, 'get_base_url')
      || !\method_exists(\WPSEO_Options::class, 'get')
      || !\WPSEO_Options::get('enable_xml_sitemap')
      || \preg_match('/^Sitemap:/mi', $output)
    ) {
      return $output;
    }
    return $output . "\n" . 'Sitemap: '
      . \WPSEO_Sitemaps_Router::get_base_url('sitemap_index.xml') . "\n";
  },
  999
);

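This additionally checks that the Yoast SEO classes and methods are available, that XML sitemaps are enabled, and that the output does not already contain a Sitemap entry, and it uses Yoast's own base URL for the sitemap.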

(I have used a PHP closure for brevity in this example though it is best practice to use a function or method for the filter callback.)
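
For reference, the same filter with a named callback might look like the sketch below. The function name example_add_yoast_sitemap_to_robots is hypothetical; the Yoast SEO calls are the ones used in the closure above, and nothing is added when they are unavailable.

```php
<?php
// Hypothetical function name; the logic mirrors the closure above.
function example_add_yoast_sitemap_to_robots( $output ) {
    if (
        // Bail out if Yoast SEO is not loaded or its API has changed.
        ! method_exists( 'WPSEO_Sitemaps_Router', 'get_base_url' )
        || ! method_exists( 'WPSEO_Options', 'get' )
        // Bail out if XML sitemaps are disabled in Yoast SEO.
        || ! WPSEO_Options::get( 'enable_xml_sitemap' )
        // Bail out if a Sitemap entry is already present.
        || preg_match( '/^Sitemap:/mi', $output )
    ) {
        return $output;
    }
    return $output . "\n" . 'Sitemap: '
        . WPSEO_Sitemaps_Router::get_base_url( 'sitemap_index.xml' ) . "\n";
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'robots_txt', 'example_add_yoast_sitemap_to_robots', 999 );
}
```

A named callback can later be detached with remove_filter( 'robots_txt', 'example_add_yoast_sitemap_to_robots', 999 ), which is not practical with an anonymous closure.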

JakeQZ commented 2 years ago

Using as it is, is not adding sitemap in robots.txt.

I checked again and it's working fine for me by just placing the code in the active theme's functions.php. I'm using Yoast SEO 17.6 if it makes any difference. Note it doesn't add a Sitemap entry if there already is one or if sitemaps are disabled at SEO > General > Features > XML sitemaps.

JeePeeNL commented 2 years ago

Bump. I'd like to see this too. I do manually submit to Search Console and Bing, but it doesn't hurt to add it anyway for the benefit of other less important search engines.

mmikhan commented 2 years ago

Internally opened: browse/P1-1369

igorschoester commented 2 years ago

Fixed in https://github.com/Yoast/wordpress-seo/pull/18445 (except in non-main multisite subdirectories, coming tomorrow in another PR 🤞 )