Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/

[Feature Request] Exclude directories from sitemap. #7679

Closed: rmarcano closed this issue 7 years ago

rmarcano commented 7 years ago

We have received a request for a feature that would exclude a subdirectory of the site from the sitemap. This would be useful when a user has parts of the site they do not want indexed. Perhaps this could use something like regexes?
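A hypothetical Python sketch of what regex-based exclusion could do to a list of sitemap URLs. The `filter_sitemap_urls` function, the `/plugins` pattern, and the example.com URLs are illustrative assumptions, not a Yoast SEO API:

```python
import re
from urllib.parse import urlparse

# Hypothetical exclusion pattern, matched against the URL path only:
# "/plugins" itself and everything below "/plugins/".
EXCLUDE_PATTERNS = [re.compile(r"^/plugins(/|$)")]


def filter_sitemap_urls(urls, patterns=EXCLUDE_PATTERNS):
    """Return only the URLs whose path matches none of the exclusion patterns."""
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if not any(p.search(path) for p in patterns):
            kept.append(url)
    return kept


urls = [
    "https://www.example.com/about/",
    "https://www.example.com/plugins/",
    "https://www.example.com/plugins/foo/",
    "https://www.example.com/pluginsale/",
]
print(filter_sitemap_urls(urls))
# ['https://www.example.com/about/', 'https://www.example.com/pluginsale/']
```

Ending the pattern with `(/|$)` excludes `/plugins` and everything below it without also catching unrelated paths such as `/pluginsale/`. Leaving a URL out of the sitemap does not by itself stop it from being indexed.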

rmarcano commented 7 years ago

Please inform the customer of conversation #215016 when this issue has been closed.

benvaassen commented 7 years ago

Excluding links from your sitemap doesn't prevent them from being indexed. Also, please use the template when creating a new issue.

agentinfinite commented 6 years ago

Wanted to follow up on this. Agreed that excluding links from the XML sitemap doesn't prevent indexing, but including them seems to confuse Google. Now that Google is sending out Index Coverage messages in Google Search Console, we realized that although we have a subdirectory blocked in our robots.txt, its pages are included in the XML sitemap, which seems to be confusing Google and causing the pages to be indexed. Could we reopen this request? Or has the feature already been added? I couldn't find it.

We also have URLs that are gated, which we don't want to be found easily in the XML sitemap.

benvaassen commented 6 years ago

@agentinfinite your pages are being indexed because the posts/pages are not set to noindex. Blocking them via your robots.txt doesn't prevent Google from adding them to their index. Excluding them from the sitemap won't prevent Google from adding them to their index.

agentinfinite commented 6 years ago

@benvaassen Sorry, should have followed up on this last night. You're right, they were not set to noindex and I thought they were. I know that robots.txt prevents crawling, not indexing, but it still sends conflicting directions to Google when it's disallowed in robots.txt but is in the XML sitemap. I believe John Mueller mentioned this in a recent Webmaster Hangout.

Either way, I still think this would be a useful feature. As I mentioned earlier, it would keep people from finding gated URLs in the sitemaps. It would also help because I now have to manually noindex each page in this directory... well, I guess I'd have to do that anyway. And for some reason this directory doesn't show up as a custom post type in the Yoast plugin, so I can't bulk noindex it.

benvaassen commented 6 years ago

@agentinfinite If you have a custom taxonomy, you can exclude it from the sitemap by making the taxonomy non-public or by using a filter: https://kb.yoast.com/kb/how-to-customize-the-sitemap-index/#taxonomy. You can also exclude each post individually.

OdinWynd commented 3 years ago

I think this is still a valid request for a legitimate problem.

Some plugins, like BuddyPress, can at times add unexpected items to the pages sitemap.

These items have no corresponding page ID or taxonomy, due to some programming trickery used to make WordPress think it is loading a page.

Plugins that add features to BuddyPress complicate the issue further. For example, Yoast is including my shopping cart URL in the sitemap.

I use robots.txt to block this, along with some noindex code for BuddyPress pages. But because these links are blocked by robots.txt and noindexed, Google throws a fit and says it can't read the sitemap. As a result, only 5 pages of my whole site are properly indexed via the sitemap.

It also causes a "Site down or unavailable" error in AdSense, since the URL is in the sitemap but blocked by robots.txt. AdSense uses the sitemap for validation, so if a link in the sitemap is noindexed or blocked by robots.txt, it produces errors on Google's end, because AdSense needs to validate all content in the sitemap.
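The conflict described here, a URL listed in the sitemap while robots.txt disallows it, can be checked before a sitemap is submitted. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree` and `urllib.robotparser`. The `blocked_sitemap_urls` helper, the sample robots.txt, and the sample sitemap are hypothetical. `urllib.robotparser` uses simple prefix matching and does not implement every rule Google applies (for example `*` and `$` wildcards inside paths), so treat its answers as approximate:

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return the <loc> URLs in a sitemap that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    root = ET.fromstring(sitemap_xml.strip())
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in locs if not parser.can_fetch(user_agent, url)]


# Made-up sample data for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /plugins/
"""

SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/about/</loc></url>
  <url><loc>https://www.example.com/plugins/cart/</loc></url>
</urlset>
"""

print(blocked_sitemap_urls(SITEMAP, ROBOTS))
# ['https://www.example.com/plugins/cart/']
```

Each URL it returns is one that the sitemap lists but robots.txt tells crawlers not to fetch.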

Being able to remove directories from the sitemap, and/or to manually remove specific links that have no post ID, category, tag, or taxonomy, is still needed by some users.

CBR900cc commented 10 months ago

Hello, I really need this feature. Can anyone tell me whether it has been implemented? I haven't seen anything about it in the plugin.

josevarghese commented 10 months ago

@CBR900cc, if you don't want a page to appear in the sitemap, you can set the page to noindex by following the steps in our help center. If all pages of a CPT (custom post type) or taxonomy should be removed from the sitemap, go to Yoast SEO > Settings, select the CPT or taxonomy, and toggle off the "Show XYZ in search results?" option.

Otherwise, you can use the filter mentioned on our Yoast development portal.

CBR900cc commented 10 months ago

@josevarghese In my case, as in @OdinWynd's, I need to remove the www.website.com/plugins/ directory and all its children. I haven't found any filter, function, or option to do that. It is not possible to apply "noindex" to a folder.

josevarghese commented 10 months ago

Hi @CBR900cc,

If a page appears in the Yoast SEO sitemap, you can noindex it by editing the page, or by setting its whole CPT/taxonomy to noindex. As for noindexing the directory named plugins, we recommend creating a Disallow: rule for that specific directory in the robots.txt file.

We use GitHub exclusively for well-documented bug reports or feature requests. We have the following support channels (if you need more help on this):

Thanks for understanding.